mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 19:03:49 +00:00

2822 Commits

Author SHA1 Message Date
Shane Evans
88dbe2ae87 fix error messages due to fetching pages during shutdown process
This version keeps the faster approach of not processing request callbacks when engine is shutting down
2011-05-20 14:35:37 +01:00
Pablo Hoffman
3897e33612 fixed stupid bug in scheduler introduced in previous change 2011-05-20 03:52:41 -03:00
Pablo Hoffman
70b0e42ca6 removed unused imports 2011-05-20 03:26:07 -03:00
Pablo Hoffman
d72d3f4607 stack trace dump extension: also dump engine status, and support triggering it with SIGQUIT, besides SIGUSR2 2011-05-20 03:25:00 -03:00
Pablo Hoffman
6069b0e5b2 Fixed 100% cpu loop that occurred in some cases where Scrapy was shutting down 2011-05-20 03:21:36 -03:00
Pablo Hoffman
951ba507f9 Removed support for default values in Scrapy items, which have proven confusing in the past 2011-05-19 21:42:46 -03:00
Pablo Hoffman
503f302010 removed remaining references to scheduler middleware from doc, as it will be removed on next release 2011-05-18 19:48:48 -03:00
Pablo Hoffman
3fd17432cf fixed outdated documentation 2011-05-18 14:46:20 -03:00
Pablo Hoffman
9016e7e993 added role to link to scrapy source code (not yet used) 2011-05-18 14:43:34 -03:00
Pablo Hoffman
a98e9e054b minor fix to spider closed count stat 2011-05-18 12:45:19 -03:00
Pablo Hoffman
cd85c12c33 Some Link extractor improvements:
* added support for ignoring common file extensions that are not followed if
  they occur in links
* fixed link extractor documentation issues
* slightly improved performance of applying filters
* added link to link extractors doc from documentation index
2011-05-18 12:32:34 -03:00
Pablo Hoffman
495152bd50 disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it 2011-05-18 11:04:48 -03:00
Pablo Hoffman
accb6ed830 dump stats to log by default (i.e. change default value of STATS_DUMP to True) 2011-05-17 22:42:05 -03:00
Pablo Hoffman
315457c2ef added support for -a option to runspider command (like it works with crawl command) 2011-05-17 22:07:49 -03:00
Pablo Hoffman
ab6a4d053f minor code improvement 2011-05-16 09:56:32 -03:00
Pablo Hoffman
d29eccba56 AutoThrottle: added missing line to connect spider_closed handler 2011-05-16 09:42:44 -03:00
Pablo Hoffman
403dc536e2 improved documentation of AutoThrottle extension 2011-05-15 06:07:26 -03:00
Pablo Hoffman
2b933a4a8c added AutoThrottle extension (still under testing, not yet enabled by default) 2011-05-15 05:39:58 -03:00
Pablo Hoffman
bd8d7f5cf4 collect download latencies in 'download_latency' request/response meta key 2011-05-15 05:24:01 -03:00
Pablo Hoffman
668dfcabf3 send the response_received signal from the engine, after tying it with the corresponding request 2011-05-15 05:20:14 -03:00
Pablo Hoffman
f9aa819b06 scraper: minor performance improvement by using collections.deque() as in downloader (see previous commit) 2011-05-14 21:50:14 -03:00
Pablo Hoffman
079de67719 downloader: minor performance improvement by using collections.deque() to avoid the list.pop(0) call which is O(n) 2011-05-14 21:47:25 -03:00
Pablo Hoffman
7e62a0a1a1 Downloader: Added support for dynamically adjusting download delay and maximum concurrent requests 2011-05-14 21:35:46 -03:00
Pablo Hoffman
bac46ba438 make sure Request.method is always str 2011-05-02 01:11:19 -03:00
Pablo Hoffman
afa23688c6 fixed bug in scrapy.http.Headers: values weren't being encoded to str when passed as lists 2011-05-01 19:39:13 -03:00
Pablo Hoffman
7f97259ba7 added w3lib to requirements, in installation guide 2011-05-01 11:14:57 -03:00
Pablo Hoffman
718428c0ab debian/control: added python-setuptools to Recommends, because it's needed by 'scrapy deploy' command 2011-05-01 11:00:02 -03:00
Pablo Hoffman
d08281a44f Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-04-30 01:35:43 -03:00
Pablo Hoffman
4a83167698 fixed small doc typo 2011-04-30 01:35:30 -03:00
Pablo Hoffman
cf572bb642 removed experimental examples 2011-04-28 18:07:23 -03:00
Pablo Hoffman
bb2b67c862 updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project 2011-04-28 09:31:57 -03:00
Pablo Hoffman
bf73002428 removed googledir example, replaced by dirbot project on github. updated docs accordingly 2011-04-28 02:28:39 -03:00
Pablo Hoffman
b12dd76bb8 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-04-25 09:31:18 -03:00
Pablo Hoffman
678f08bc1b added warning about using 'parse' as callback in crawl spider rules 2011-04-25 09:30:42 -03:00
Pablo Hoffman
18d303b5f1 ported internal scrapy.utils imports to w3lib 2011-04-19 01:33:52 -03:00
Pablo Hoffman
fcc8d73840 Removed scrapy.contrib.ibl module (and submodules). They have been moved to a new library "scrapely". See https://github.com/scrapy/scrapely 2011-04-19 01:04:22 -03:00
Pablo Hoffman
ebcbb9f453 debian: added python-w3lib package to dependencies 2011-04-19 00:55:08 -03:00
Pablo Hoffman
b10f4fae35 Moved several functions from scrapy.utils.{http,markup,multipart,response,url} (and their tests) to a new library called 'w3lib'. Scrapy will now depend on w3lib. 2011-04-18 22:37:19 -03:00
Pablo Hoffman
ad496eb3b6 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-04-14 12:36:27 -03:00
Pablo Hoffman
ecb4f44cbc Added clarification on how to work with local settings and scrapy deploy 2011-04-14 12:36:09 -03:00
Pablo Hoffman
6f262a198c Added IOError to the list of exceptions to retry in the RetryMiddleware 2011-04-12 18:12:36 -03:00
Pablo Hoffman
7c49e8679c fixed typo 2011-04-07 02:04:42 -03:00
Pablo Hoffman
3ee2c94e93 Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it 2011-04-06 14:54:48 -03:00
Pablo Hoffman
8a5c08a6bc added join_multivalued parameter to CsvItemExporter 2011-03-24 13:15:52 -03:00
Pablo Hoffman
84dee1f77f removed unused function 2011-03-24 09:03:57 -03:00
Pablo Hoffman
3954e600ca added DBM storage backend for HTTP cache 2011-03-23 21:32:02 -03:00
Pablo Hoffman
60f6a9b054 moved scrapyd python module to scrapy debian package. left scrapyd package only for installing service (upstart script) and scrapy user 2011-03-23 15:45:40 -03:00
Shane Evans
407f7f2d65 fix minor error in IBL tests and post-processing nested annotations 2011-03-11 21:58:47 +00:00
Shane Evans
bc6eee71e3 refactor IBL extraction to allow processing parsed data
--HG--
extra : rebase_source : 1a0ec4322702288f6e996d384d45a36deede3868
2011-03-11 20:02:29 +00:00
Pablo Hoffman
9591413d9d added Jochen Maes to AUTHORS 2011-03-09 14:24:17 -02:00
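
Note on commits 079de67719 and f9aa819b06: both mention replacing list.pop(0), which is O(n), with collections.deque(). The sketch below is only an illustration of that general technique, not the code from those commits; the function names drain_with_list and drain_with_deque are hypothetical and do not come from Scrapy.

```python
from collections import deque


def drain_with_list(items):
    """Consume a FIFO queue backed by a plain list (hypothetical helper)."""
    queue = list(items)
    out = []
    while queue:
        # list.pop(0) shifts every remaining element left: O(n) per call.
        out.append(queue.pop(0))
    return out


def drain_with_deque(items):
    """Consume the same FIFO queue backed by collections.deque (hypothetical helper)."""
    queue = deque(items)
    out = []
    while queue:
        # deque.popleft() removes from the left end in O(1).
        out.append(queue.popleft())
    return out


if __name__ == "__main__":
    # Both helpers yield items in the same first-in, first-out order.
    print(drain_with_list(range(5)) == drain_with_deque(range(5)))
```

Both helpers return the items in the same order; only the cost of removing from the front differs, which matters when the queue grows large.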