mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 23:04:29 +00:00

3297 Commits

Author SHA1 Message Date
olveyra
b39cb22d83 don't discard slot when empty; save it in another dict so it can be recycled if needed again.
This fix avoids continuously creating new slots in certain cases, a bug that prevents download_delay and max_concurrent_requests from working properly.

The problem arises when the slot for a given domain becomes empty before further requests for that domain have been created by the spider. This is typical when the spider creates requests one by one, or when it makes requests to multiple domains and one or more of them are created at a rate slow enough that the slot is empty each time a response is fetched.

The effect is that a new slot is created for each request under such conditions, so download_delay and max_concurrent_requests do not take effect (applying them depends on an already existing slot for that domain).
2012-04-02 20:34:57 +00:00
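The fix described in this commit can be sketched as follows. This is a minimal, hypothetical model (the class and method names are illustrative, not Scrapy's actual downloader code): a drained slot is parked in a second dict and reused if the domain comes back, so its delay/concurrency state is not lost.

```python
class Slot:
    """Per-domain download state (delay and concurrency limits)."""
    def __init__(self, delay, concurrency):
        self.delay = delay              # download_delay for this domain
        self.concurrency = concurrency  # max_concurrent_requests for this domain
        self.active = set()             # requests currently in flight

class Downloader:
    def __init__(self, delay=1.0, concurrency=8):
        self.slots = {}     # domain -> Slot, for domains with in-flight requests
        self.inactive = {}  # drained slots parked here for recycling
        self.delay = delay
        self.concurrency = concurrency

    def get_slot(self, domain):
        if domain not in self.slots:
            # recycle a previously drained slot instead of creating a new
            # one, so its state survives idle periods between requests
            self.slots[domain] = (self.inactive.pop(domain, None)
                                  or Slot(self.delay, self.concurrency))
        return self.slots[domain]

    def release(self, domain, request):
        slot = self.slots[domain]
        slot.active.discard(request)
        if not slot.active:
            # slot drained: park it instead of discarding it
            self.inactive[domain] = self.slots.pop(domain)
```

With the old behavior (discarding the slot), the second `get_slot` call for the same domain would build a fresh `Slot`; here it returns the recycled one.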
Pablo Hoffman
e9184def35 make selector re() method use re.UNICODE flag to compile regexes 2012-04-01 00:41:03 -03:00
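As an illustration of the flag this commit adds: on Python 2, `\w` in a `str` pattern only matches ASCII word characters, so `re.UNICODE` is needed for non-ASCII text; on Python 3 this behavior is already the default for `str` patterns, so the flag is a no-op there.

```python
import re

# With re.UNICODE, \w matches full Unicode word characters,
# not just [a-zA-Z0-9_]
pattern = re.compile(r'\w+', re.UNICODE)
print(pattern.findall('café niño'))  # ['café', 'niño']
```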
Pablo Hoffman
27018fced7 changed default user agent to Scrapy/0.15 (+http://scrapy.org) and removed no longer needed BOT_VERSION setting 2012-03-23 13:45:21 -03:00
Pablo Hoffman
731c569b5c fixed test-scrapyd.sh script after changes on insophia website 2012-03-22 16:38:28 -03:00
Pablo Hoffman
8933e2f2be added REFERER_ENABLED setting, to control referer middleware 2012-03-22 16:35:14 -03:00
Pablo Hoffman
eed34e88cd Merge pull request #103 from jsyeo/patch-1
fixed minor mistake in Request objects documentation
2012-03-20 19:49:31 -07:00
Jason Yeo
da826aa13d fixed minor mistake in Request objects documentation 2012-03-21 10:25:41 +08:00
Pablo Hoffman
175c70ad44 fixed minor defect in link extractors documentation 2012-03-20 22:56:45 -03:00
Pablo Hoffman
056a7c53d0 added artwork files properly now 2012-03-20 10:46:45 -03:00
Pablo Hoffman
aef70e8394 removed wrongly added artwork files 2012-03-20 10:45:48 -03:00
Pablo Hoffman
bcd8520f8d added sep directory with Scrapy Enhancement Proposal imported from old Trac site 2012-03-20 10:15:00 -03:00
Pablo Hoffman
c0141d154e added artwork directory (data taken from old Trac) 2012-03-20 10:14:11 -03:00
Pablo Hoffman
35fb01156e removed some obsolete remaining code related to sqlite support in scrapy 2012-03-16 11:55:55 -03:00
Pablo Hoffman
838e1dcce9 updated FormRequest tests to use HtmlResponse instead of Response, as it makes more sense 2012-03-15 11:47:02 -03:00
Pablo Hoffman
b6ae266546 Removed (very old and possibly broken) backwards compatibility support for Twisted 2.5 2012-03-15 00:28:24 -03:00
Pablo Hoffman
9fddc73ed8 removed backwards compatibility code for old scrapy versions 2012-03-06 05:42:09 -02:00
Pablo Hoffman
9a508d4638 Removed deprecated setting: CLOSESPIDER_ITEMPASSED 2012-03-06 05:26:57 -02:00
Pablo Hoffman
8b83177655 Added CLOSESPIDER_ERRORCOUNT to scrapy/default_settings.py 2012-03-06 05:26:57 -02:00
Pablo Hoffman
9006227358 bumped required python-w3lib version in debian/control 2012-03-05 20:25:38 -02:00
Daniel Graña
2909a60e95 test that default start_request return value type is a generator. refs #98 2012-03-05 17:53:20 -02:00
Pablo Hoffman
45685ea6cd Restored scrapy.utils.py26 module for backwards compatibility, with a deprecation message. This is needed because the module was used a lot by users and the change causes too much trouble 2012-03-05 17:15:49 -02:00
Daniel Graña
cc6e297062 Merge pull request #98 from kalessin/start_requests
This will break any spider that extends `start_requests` and expects a `list` as the return value.

On the other side:

* [Docs](http://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spider.BaseSpider.start_requests) say that the return value is an **iterable**, not a list
* Scrapy core already supports consuming the start_requests generator on demand, so we can avoid problems like #47
* it allows extensions to change the starting requests on the `spider_opened` signal
2012-03-05 08:51:22 -08:00
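The change merged above can be sketched like this. `Request` here is a minimal stand-in for `scrapy.http.Request`, so the snippet is self-contained:

```python
import types

class Request:
    """Stand-in for scrapy.http.Request."""
    def __init__(self, url):
        self.url = url

class MySpider:
    start_urls = ['http://example.com/1', 'http://example.com/2']

    def start_requests(self):
        # a generator: each Request is built only when the engine pulls
        # it, so code hooked on the spider_opened signal can still
        # modify start_urls before any request exists
        for url in self.start_urls:
            yield Request(url)

spider = MySpider()
reqs = spider.start_requests()
print(isinstance(reqs, types.GeneratorType))  # True
```

Because the generator is consumed on demand, a spider with a huge `start_urls` also no longer materializes every request up front.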
Martin Olveyra
f6179a927e replace list with generator also in the start_requests method of Sitemap spider
2012-03-05 14:25:12 -02:00
Martin Olveyra
cc7fc33833 change start_requests to return a generator instead of a list, in order
to allow modifying start_urls when triggered by the spider_opened signal
2012-03-05 12:49:17 -02:00
Pablo Hoffman
e521da2e2f Dropped support for Python 2.5. See: http://blog.scrapy.org/scrapy-dropping-support-for-python-25 2012-03-01 08:18:12 -02:00
Pablo Hoffman
8eb0b11f8a removed unused import 2012-02-29 17:40:30 -02:00
Pablo Hoffman
5c329b6514 Merge pull request #97 from scrapy/w3lib_encoding
Ported scrapy to use w3lib.encoding
2012-02-29 01:45:59 -08:00
Pablo Hoffman
de3a3b68dc bumped required w3lib version to 1.1, after refactoring encoding detection to use the new w3lib.encoding module 2012-02-29 07:44:22 -02:00
Pablo Hoffman
2b16ebdc11 added minor clarification on cookiejar request meta key usage 2012-02-29 07:19:01 -02:00
Pablo Hoffman
61df6b4691 Merge pull request #51 from lostsnow/master
scrapyd: support binding to a specific IP address
2012-02-28 23:49:56 -08:00
lostsnow
5afe4f50c1 scrapyd: support binding to a specific IP address 2012-02-29 13:47:40 +08:00
Daniel Graña
798169805a Adapt response encoding detection to pass test cases 2012-02-28 14:32:55 -02:00
Pablo Hoffman
81abb45000 fixed bug in new cookiejar documentation 2012-02-28 11:08:25 -02:00
Pablo Hoffman
26c8004125 added documentation for the new cookiejar Request.meta key 2012-02-27 19:58:58 -02:00
Pablo Hoffman
44d6da82fd Merge pull request #96 from kalessin/cookiesmultijar
allow working with multiple cookie jars on the same spider
2012-02-27 13:48:43 -08:00
olveyra
c093ac5ec6 allow working with multiple cookie jars on the same spider 2012-02-27 18:03:48 +00:00
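The multi-jar feature added here works by tagging each request with a `cookiejar` key in `Request.meta`; requests sharing the same value share a jar, so a spider can run several independent "sessions" against the same site. A hedged sketch (`Request` below is a stand-in for `scrapy.http.Request`, and `requests_with_sessions` is a hypothetical helper):

```python
class Request:
    """Stand-in for scrapy.http.Request."""
    def __init__(self, url, meta=None):
        self.url = url
        self.meta = meta or {}

def requests_with_sessions(url, n_sessions):
    # one request per session; the cookies middleware keeps a separate
    # cookie jar per distinct 'cookiejar' meta value
    return [Request(url, meta={'cookiejar': i}) for i in range(n_sessions)]

reqs = requests_with_sessions('http://example.com/login', 3)
print([r.meta['cookiejar'] for r in reqs])  # [0, 1, 2]
```

Follow-up requests must carry the same `cookiejar` value forward in their own `meta`, since the key is not inherited automatically.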
Pablo Hoffman
4ed1a03521 Merge pull request #95 from scrapy/openmobilealliance-mimetype
Handle the standard mimetype defined by the Open Mobile Alliance as HTML
2012-02-24 10:28:32 -08:00
Daniel Graña
049f315ff4 Handle the standard mimetype defined by the Open Mobile Alliance as HTML 2012-02-24 16:16:35 -02:00
Pablo Hoffman
b1f011d740 use netloc instead of hostname in url_is_from_any_domain(). closes #50 2012-02-24 02:09:02 -02:00
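The distinction behind this commit: `urlparse`'s `hostname` lowercases the host and strips the port, while `netloc` keeps the `host:port` form, so matching on netloc lets domain patterns that include a port work too. A simplified stdlib-only sketch (an assumption, not Scrapy's actual `url_is_from_any_domain` implementation):

```python
from urllib.parse import urlparse

def url_is_from_any_domain(url, domains):
    # netloc keeps "host:port", unlike urlparse(...).hostname
    host = urlparse(url).netloc.lower()
    return any(host == d.lower() or host.endswith('.' + d.lower())
               for d in domains)

print(url_is_from_any_domain('http://www.example.com/page', ['example.com']))      # True
print(url_is_from_any_domain('http://example.com:8080/', ['example.com:8080']))    # True
```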
Daniel Graña
08d2c2b9ee Merge branch 'GH92-image-buf-threading' 2012-02-23 19:25:28 -02:00
Daniel Graña
2dbf2a38a2 move buffer pointing to start of file before computing checksum. refs #92 2012-02-23 19:23:37 -02:00
Pablo Hoffman
e0de5f3eab Merge pull request #93 from dangra/GH92-image-buf-threading
compute image checksum before persisting images
2012-02-23 13:21:10 -08:00
Daniel Graña
3286ce4f42 Compute image checksum before persisting images. closes #92
Avoids a threading issue when accessing the buffer
2012-02-23 19:17:21 -02:00
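The buffer rewind from the related commit above matters because after an image is written into an in-memory buffer, the file pointer sits at the end; hashing without seeking back would checksum an empty read. A minimal stdlib illustration of the pattern (function name is illustrative):

```python
import hashlib
from io import BytesIO

def image_checksum(buf):
    buf.seek(0)  # move buffer pointer to start of file before reading
    return hashlib.md5(buf.read()).hexdigest()

buf = BytesIO()
buf.write(b'fake image bytes')  # pointer is now at the end of the buffer
print(image_checksum(buf))      # hashes the full content, not an empty read
```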
Pablo Hoffman
52483c55cd Merge pull request #94 from dangra/mediapipeline-cache-failures
remove as much information as possible from cached failure
2012-02-23 13:11:29 -08:00
Daniel Graña
5c73a0b1c1 remove leaking references in cached failures 2012-02-23 19:08:36 -02:00
Pablo Hoffman
e312a88582 MemoryUsage: use resident memory size (instead of virtual) for tracking memory usage 2012-02-23 17:42:02 -02:00
Pablo Hoffman
7fe7c3f3b1 MemoryUsage extension: close the spiders (instead of stopping the engine) when the limit is exceeded, providing a descriptive reason for the close. Also fixed default value of MEMUSAGE_ENABLED setting to match the documentation. 2012-02-23 17:05:06 -02:00
Pablo Hoffman
c476681c06 ported code to use w3lib.encoding (work in progress, many tests still failing) 2012-02-21 21:31:19 -02:00
Pablo Hoffman
6769b92493 Merge branch 'master' of github.com:scrapy/scrapy 2012-02-19 05:59:30 -02:00
Pablo Hoffman
0939106872 fixed bug in MemoryUsage extension: get_engine_status() takes exactly 1 argument (0 given) 2012-02-19 05:59:21 -02:00