scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 17:24:29 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	37830da1f6	fixed wrong code in test	2011-06-10 18:27:39 -03:00
Pablo Hoffman	c4a607fc78	Raise ValueError if url has no scheme in Request constructor	2011-06-10 18:22:36 -03:00
Pablo Hoffman	88e33ad0ad	Simplified Request/Response __repr__ to be the same as __str__. This improves legibility and shouldn't affect any functionality, since we never use __repr__ for reconstructing a response AFAIK. Also fixes #318	2011-06-09 00:15:53 -03:00
Pablo Hoffman	07df0edf74	scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324	2011-06-08 14:17:04 -03:00
Pablo Hoffman	7643f14c88	fixed bug handling truncated gzipped responses. closes #319	2011-06-06 18:25:14 -03:00
Pablo Hoffman	48509b036a	fixed some tests accidentally broken in previous commit	2011-06-06 16:11:43 -03:00
Pablo Hoffman	f793515565	make --headers output of fetch command resemble curl format, and also show request headers	2011-06-06 15:21:50 -03:00
Pablo Hoffman	03751749a8	Scheduler refactoring which introduces the following changes: * dropped deferred stored along with requests in scheduler queues, which will add the ability to support persistent schedulers in the future * moved duplicates filter into the scheduler itself, using the same dupe filtering class as before (DUPEFILTER_CLASS setting) * removed scheduler middleware component to simplify, as it was only used for duplicates filtering and that is now done in the scheduler itself * adapted media pipeline to work with new scheduler * cleanup old docstrings	2011-06-06 03:16:56 -03:00
Pablo Hoffman	474cba512c	simplified MemoryDebugger extension to use stats for dumping memory debugging info	2011-06-06 03:13:28 -03:00
Pablo Hoffman	5fbc32c015	call stats collector engine_stopped() after the engine is closed (to make sure all data from extensions has been collected), and added that method to documented api	2011-06-06 03:12:40 -03:00
Pablo Hoffman	35b52fcdf0	removed deprecated stat 'envinfo/request_depth_limit'. we should instead support dumping settings, for these cases	2011-06-06 01:02:58 -03:00
Pablo Hoffman	9d9c8877da	added 'scrapy edit' command	2011-06-05 22:02:56 -03:00
Pablo Hoffman	ffbc9295f6	simplified DownloaderStats middleware	2011-06-05 20:03:09 -03:00
Pablo Hoffman	3d823d6f45	simplified CoreStats extension	2011-06-05 19:57:38 -03:00
Pablo Hoffman	61cc95df7c	removed crawlspider v2 tests	2011-06-03 18:26:17 -03:00
Pablo Hoffman	03ae481cad	removed experimental crawlspider v2	2011-06-03 18:23:23 -03:00
Pablo Hoffman	5bf733b6f6	Changed default representation of items to pretty-printed dicts. This improves default logging by making log more readable in the default case, for both Scraped and Dropped lines. Projects can still customize how items are represented by overriding the item's __str__ method, as usual.	2011-06-03 01:13:01 -03:00
Pablo Hoffman	1bc2339bb8	Merged item passed and item scraped concepts, as they have often proved confusing in the past. This means: * original item_scraped signal was removed * original item_passed signal was renamed to item_scraped * old log lines "Scraped Item..." removed * old log lines "Passed Item..." renamed to "Scraped Item..."	2011-06-03 01:13:00 -03:00
Pablo Hoffman	e6091df551	fixed doc typo	2011-05-30 09:04:31 -03:00
Pablo Hoffman	1d98fc8fb5	added spider_error signal	2011-05-29 22:38:17 -03:00
Pablo Hoffman	13d8066788	removed undocumented (and untested) extension: SpiderCloseDelay	2011-05-27 11:52:33 -03:00
Pablo Hoffman	6c369c50ca	removed support for spider.dont_throttle attribute	2011-05-27 09:09:28 -03:00
Pablo Hoffman	2fa0f75f2d	added COOKIES_ENABLED setting to support disabling the cookies middleware	2011-05-27 00:35:34 -03:00
Pablo Hoffman	756bf0cc06	register AutoThrottle extension by default, and made AUTOTHROTTLE_ENABLED disabled by default	2011-05-27 00:22:13 -03:00
Pablo Hoffman	dcc28b7186	added setting: AUTOTHROTTLE_ENABLED	2011-05-22 18:31:36 -03:00
Pablo Hoffman	110cd05296	added Spider.dont_throttle attribute to disable AutoThrottle extension per spider	2011-05-22 18:26:38 -03:00
Shane Evans	88dbe2ae87	fix error messages due to fetching pages during shutdown process. This version keeps the faster approach of not processing request callbacks when engine is shutting down	2011-05-20 14:35:37 +01:00
Pablo Hoffman	3897e33612	fixed stupid bug in scheduler introduced in previous change	2011-05-20 03:52:41 -03:00
Pablo Hoffman	70b0e42ca6	removed unused imports	2011-05-20 03:26:07 -03:00
Pablo Hoffman	d72d3f4607	stack trace dump extension: also dump engine status, and support triggering it with SIGQUIT, besides SIGUSR2	2011-05-20 03:25:00 -03:00
Pablo Hoffman	6069b0e5b2	Fixed 100% CPU loop that occurred in some cases where Scrapy was shutting down	2011-05-20 03:21:36 -03:00
Pablo Hoffman	951ba507f9	Removed support for default values in Scrapy items, which have proven confusing in the past	2011-05-19 21:42:46 -03:00
Pablo Hoffman	503f302010	removed remaining references to scheduler middleware from doc, as it will be removed on next release	2011-05-18 19:48:48 -03:00
Pablo Hoffman	3fd17432cf	fixed outdated documentation	2011-05-18 14:46:20 -03:00
Pablo Hoffman	9016e7e993	added role to link to scrapy source code (not yet used)	2011-05-18 14:43:34 -03:00
Pablo Hoffman	a98e9e054b	minor fix to spider closed count stat	2011-05-18 12:45:19 -03:00
Pablo Hoffman	cd85c12c33	Some Link extractor improvements: * added support for ignoring common file extensions that are not followed if they occur in links * fixed link extractor documentation issues * slightly improved performance of applying filters * added link to link extractors doc from documentation index	2011-05-18 12:32:34 -03:00
Pablo Hoffman	495152bd50	disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it	2011-05-18 11:04:48 -03:00
Pablo Hoffman	accb6ed830	dump stats to log by default (ie. change default value of STATS_DUMP to True)	2011-05-17 22:42:05 -03:00
Pablo Hoffman	315457c2ef	added support for -a option to runspider command (like it works with crawl command)	2011-05-17 22:07:49 -03:00
Pablo Hoffman	ab6a4d053f	minor code improvement	2011-05-16 09:56:32 -03:00
Pablo Hoffman	d29eccba56	AutoThrottle: added missing line to connect spider_closed handler	2011-05-16 09:42:44 -03:00
Pablo Hoffman	403dc536e2	improved documentation of AutoThrottle extension	2011-05-15 06:07:26 -03:00
Pablo Hoffman	2b933a4a8c	added AutoThrottle extension (still under testing, not yet enabled by default)	2011-05-15 05:39:58 -03:00
Pablo Hoffman	bd8d7f5cf4	collect download latencies in 'download_latency' request/response meta key	2011-05-15 05:24:01 -03:00
Pablo Hoffman	668dfcabf3	send the response_received signal from the engine, after tying it with the corresponding request	2011-05-15 05:20:14 -03:00
Pablo Hoffman	f9aa819b06	scraper: minor performance improvement by using collections.deque() as in downloader (see previous commit)	2011-05-14 21:50:14 -03:00
Pablo Hoffman	079de67719	downloader: minor performance improvement by using collections.deque() to avoid the list.pop(0) call which is O(n)	2011-05-14 21:47:25 -03:00
Pablo Hoffman	7e62a0a1a1	Downloader: Added support for dynamically adjusting download delay and maximum concurrent requests	2011-05-14 21:35:46 -03:00
Pablo Hoffman	bac46ba438	make sure Request.method is always str	2011-05-02 01:11:19 -03:00

2798 Commits