scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 19:03:49 +00:00

Author	SHA1	Message	Date
Shane Evans	88dbe2ae87	fix error messages due to fetching pages during shutdown process. This version keeps the faster approach of not processing request callbacks when the engine is shutting down	2011-05-20 14:35:37 +01:00
Pablo Hoffman	3897e33612	fixed stupid bug in scheduler introduced in previous change	2011-05-20 03:52:41 -03:00
Pablo Hoffman	70b0e42ca6	removed unused imports	2011-05-20 03:26:07 -03:00
Pablo Hoffman	d72d3f4607	stack trace dump extension: also dump engine status, and support triggering it with SIGQUIT, besides SIGUSR2	2011-05-20 03:25:00 -03:00
Pablo Hoffman	6069b0e5b2	Fixed 100% CPU loop that occurred in some cases where Scrapy was shutting down	2011-05-20 03:21:36 -03:00
Pablo Hoffman	951ba507f9	Removed support for default values in Scrapy items, which have proven confusing in the past	2011-05-19 21:42:46 -03:00
Pablo Hoffman	503f302010	removed remaining references to scheduler middleware from doc, as it will be removed on next release	2011-05-18 19:48:48 -03:00
Pablo Hoffman	3fd17432cf	fixed outdated documentation	2011-05-18 14:46:20 -03:00
Pablo Hoffman	9016e7e993	added role to link to scrapy source code (not yet used)	2011-05-18 14:43:34 -03:00
Pablo Hoffman	a98e9e054b	minor fix to spider closed count stat	2011-05-18 12:45:19 -03:00
Pablo Hoffman	cd85c12c33	Some Link extractor improvements: * added support for ignoring common file extensions that are not followed if they occur in links * fixed link extractor documentation issues * slightly improved performance of applying filters * added link to link extractors doc from documentation index	2011-05-18 12:32:34 -03:00
Pablo Hoffman	495152bd50	disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it	2011-05-18 11:04:48 -03:00
Pablo Hoffman	accb6ed830	dump stats to log by default (ie. change default value of STATS_DUMP to True)	2011-05-17 22:42:05 -03:00
Pablo Hoffman	315457c2ef	added support for -a option to runspider command (like it works with crawl command)	2011-05-17 22:07:49 -03:00
Pablo Hoffman	ab6a4d053f	minor code improvement	2011-05-16 09:56:32 -03:00
Pablo Hoffman	d29eccba56	AutoThrottle: added missing line to connect spider_closed handler	2011-05-16 09:42:44 -03:00
Pablo Hoffman	403dc536e2	improved documentation of AutoThrottle extension	2011-05-15 06:07:26 -03:00
Pablo Hoffman	2b933a4a8c	added AutoThrottle extension (still under testing, not yet enabled by default)	2011-05-15 05:39:58 -03:00
Pablo Hoffman	bd8d7f5cf4	collect download latencies in 'download_latency' request/response meta key	2011-05-15 05:24:01 -03:00
Pablo Hoffman	668dfcabf3	send the response_received signal from the engine, after tying it with the corresponding request	2011-05-15 05:20:14 -03:00
Pablo Hoffman	f9aa819b06	scraper: minor performance improvement by using collections.deque() as in downloader (see previous commit)	2011-05-14 21:50:14 -03:00
Pablo Hoffman	079de67719	downloader: minor performance improvement by using collections.deque() to avoid the list.pop(0) call which is O(n)	2011-05-14 21:47:25 -03:00
Pablo Hoffman	7e62a0a1a1	Downloader: Added support for dynamically adjusting download delay and maximum concurrent requests	2011-05-14 21:35:46 -03:00
Pablo Hoffman	bac46ba438	make sure Request.method is always str	2011-05-02 01:11:19 -03:00
Pablo Hoffman	afa23688c6	fixed bug in scrapy.http.Headers: values weren't being encoded to str when passed as lists	2011-05-01 19:39:13 -03:00
Pablo Hoffman	7f97259ba7	added w3lib to requirements, in installation guide	2011-05-01 11:14:57 -03:00
Pablo Hoffman	718428c0ab	debian/control: added python-setuptools to Recommends, because it's needed by 'scrapy deploy' command	2011-05-01 11:00:02 -03:00
Pablo Hoffman	d08281a44f	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-04-30 01:35:43 -03:00
Pablo Hoffman	4a83167698	fixed small doc typo	2011-04-30 01:35:30 -03:00
Pablo Hoffman	cf572bb642	removed experimental examples	2011-04-28 18:07:23 -03:00
Pablo Hoffman	bb2b67c862	updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project	2011-04-28 09:31:57 -03:00
Pablo Hoffman	bf73002428	removed googledir example, replaced by dirbot project on github. updated docs accordingly	2011-04-28 02:28:39 -03:00
Pablo Hoffman	b12dd76bb8	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-04-25 09:31:18 -03:00
Pablo Hoffman	678f08bc1b	added warning about using 'parse' as callback in crawl spider rules	2011-04-25 09:30:42 -03:00
Pablo Hoffman	18d303b5f1	ported internal scrapy.utils imports to w3lib	2011-04-19 01:33:52 -03:00
Pablo Hoffman	fcc8d73840	Removed scrapy.contrib.ibl module (and submodules). They have been moved to a new library "scrapely". See https://github.com/scrapy/scrapely	2011-04-19 01:04:22 -03:00
Pablo Hoffman	ebcbb9f453	debian: added python-w3lib package to dependencies	2011-04-19 00:55:08 -03:00
Pablo Hoffman	b10f4fae35	Moved several functions from scrapy.utils.{http,markup,multipart,response,url} (and their tests) to a new library called 'w3lib'. Scrapy will now depend on w3lib.	2011-04-18 22:37:19 -03:00
Pablo Hoffman	ad496eb3b6	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-04-14 12:36:27 -03:00
Pablo Hoffman	ecb4f44cbc	Added clarification on how to work with local settings and scrapy deploy	2011-04-14 12:36:09 -03:00
Pablo Hoffman	6f262a198c	Added IOError to the list of exceptions to retry in the RetryMiddleware	2011-04-12 18:12:36 -03:00
Pablo Hoffman	7c49e8679c	fixed typo	2011-04-07 02:04:42 -03:00
Pablo Hoffman	3ee2c94e93	Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it	2011-04-06 14:54:48 -03:00
Pablo Hoffman	8a5c08a6bc	added join_multivalued parameter to CsvItemExporter	2011-03-24 13:15:52 -03:00
Pablo Hoffman	84dee1f77f	removed unused function	2011-03-24 09:03:57 -03:00
Pablo Hoffman	3954e600ca	added DBM storage backend for HTTP cache	2011-03-23 21:32:02 -03:00
Pablo Hoffman	60f6a9b054	moved scrapyd python module to scrapy debian package. left scrapyd package only for installing service (upstart script) and scrapy user	2011-03-23 15:45:40 -03:00
Shane Evans	407f7f2d65	fix minor error in IBL tests and post-processing nested annotations	2011-03-11 21:58:47 +00:00
Shane Evans	bc6eee71e3	refactor IBL extraction to allow processing parsed data --HG-- extra : rebase_source : 1a0ec4322702288f6e996d384d45a36deede3868	2011-03-11 20:02:29 +00:00
Pablo Hoffman	9591413d9d	added Jochen Maes to AUTHORS	2011-03-09 14:24:17 -02:00


2822 Commits