Pablo Hoffman
d72d3f4607
stack trace dump extension: also dump engine status, and support triggering it with SIGQUIT in addition to SIGUSR2
2011-05-20 03:25:00 -03:00
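The extension's internals aren't shown in the log, but the signal wiring it describes looks roughly like this minimal sketch (the handler name and body are illustrative only):

    import signal
    import traceback

    def dump_stacks(signum, frame):
        # print the current stack trace; the real extension also
        # dumps engine status alongside it
        traceback.print_stack(frame)

    # the dump can now be triggered by either signal
    signal.signal(signal.SIGQUIT, dump_stacks)
    signal.signal(signal.SIGUSR2, dump_stacks)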
Pablo Hoffman
6069b0e5b2
Fixed 100% CPU loop that occurred in some cases when Scrapy was shutting down
2011-05-20 03:21:36 -03:00
Pablo Hoffman
951ba507f9
Removed support for default values in Scrapy items, which have proven confusing in the past
2011-05-19 21:42:46 -03:00
Pablo Hoffman
503f302010
removed remaining references to scheduler middleware from the docs, as it will be removed in the next release
2011-05-18 19:48:48 -03:00
Pablo Hoffman
3fd17432cf
fixed outdated documentation
2011-05-18 14:46:20 -03:00
Pablo Hoffman
9016e7e993
added role to link to scrapy source code (not yet used)
2011-05-18 14:43:34 -03:00
Pablo Hoffman
a98e9e054b
minor fix to spider closed count stat
2011-05-18 12:45:19 -03:00
Pablo Hoffman
cd85c12c33
Some Link extractor improvements:
* added support for ignoring common file extensions that are not followed if they occur in links (see the sketch after this entry)
* fixed link extractor documentation issues
* slightly improved performance of applying filters
* added link to link extractors doc from documentation index
2011-05-18 12:32:34 -03:00
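A hedged usage sketch of the extension filter, assuming the era's SgmlLinkExtractor and a deny_extensions parameter (the parameter name is an assumption; omitting it would fall back to a default list of common file extensions):

    from scrapy.http import HtmlResponse
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    response = HtmlResponse(url='http://example.com',
                            body='<a href="/doc.pdf">doc</a><a href="/page">page</a>')
    lx = SgmlLinkExtractor(deny_extensions=['pdf', 'zip'])
    links = lx.extract_links(response)  # only the /page link survives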
Pablo Hoffman
495152bd50
disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it
2011-05-18 11:04:48 -03:00
Pablo Hoffman
accb6ed830
dump stats to the log by default (i.e. changed the default value of STATS_DUMP to True)
2011-05-17 22:42:05 -03:00
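Both of the two commits above are plain settings toggles; a minimal settings.py sketch:

    # settings.py
    STATS_DUMP = True            # now the default: dump stats to the log
    DEPTH_STATS_VERBOSE = False  # set to True to re-enable verbose depth stats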
Pablo Hoffman
315457c2ef
added support for the -a option in the runspider command (just as it works with the crawl command)
2011-05-17 22:07:49 -03:00
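Arguments passed with -a become keyword arguments of the spider's constructor; a brief sketch (the spider and argument names are made up for illustration):

    # run with: scrapy runspider myspider.py -a category=electronics
    from scrapy.spider import BaseSpider

    class MySpider(BaseSpider):
        name = 'myspider'

        def __init__(self, category=None, **kwargs):
            super(MySpider, self).__init__(**kwargs)
            self.category = category  # value of the -a option, or None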
Pablo Hoffman
ab6a4d053f
minor code improvement
2011-05-16 09:56:32 -03:00
Pablo Hoffman
d29eccba56
AutoThrottle: added missing line to connect spider_closed handler
2011-05-16 09:42:44 -03:00
Pablo Hoffman
403dc536e2
improved documentation of AutoThrottle extension
2011-05-15 06:07:26 -03:00
Pablo Hoffman
2b933a4a8c
added AutoThrottle extension (still under testing, not yet enabled by default)
2011-05-15 05:39:58 -03:00
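Since the extension is off by default, trying it means switching it on in settings; a sketch using the names under which AutoThrottle was later documented (they may differ in this first version):

    # settings.py
    AUTOTHROTTLE_ENABLED = True
    AUTOTHROTTLE_START_DELAY = 5.0  # initial download delay, in seconds
    AUTOTHROTTLE_MAX_DELAY = 60.0   # ceiling for the adjusted delay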
Pablo Hoffman
bd8d7f5cf4
collect download latencies in 'download_latency' request/response meta key
2011-05-15 05:24:01 -03:00
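The collected value is readable from any callback through the response's meta dict; a minimal sketch:

    def parse(self, response):
        # seconds elapsed between sending the request and
        # receiving the response, as measured by the downloader
        latency = response.meta['download_latency']
        self.log("fetched %s in %.2fs" % (response.url, latency))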
Pablo Hoffman
668dfcabf3
send the response_received signal from the engine, after tying it to the corresponding request
2011-05-15 05:20:14 -03:00
Pablo Hoffman
f9aa819b06
scraper: minor performance improvement by using collections.deque() as in downloader (see previous commit)
2011-05-14 21:50:14 -03:00
Pablo Hoffman
079de67719
downloader: minor performance improvement by using collections.deque() to avoid the list.pop(0) call which is O(n)
2011-05-14 21:47:25 -03:00
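The gist of the change in these two commits: list.pop(0) shifts every remaining element, while collections.deque pops from the left in constant time. A standalone sketch:

    from collections import deque

    queue = deque()
    queue.append('request1')     # enqueue at the right, O(1)
    queue.append('request2')
    item = queue.popleft()       # dequeue from the left, O(1)

    # the replaced pattern was O(n) per call, because the whole
    # list is shifted one slot to the left:
    #   item = pending.pop(0)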
Pablo Hoffman
7e62a0a1a1
Downloader: Added support for dynamically adjusting download delay and maximum concurrent requests
2011-05-14 21:35:46 -03:00
Pablo Hoffman
bac46ba438
make sure Request.method is always str
2011-05-02 01:11:19 -03:00
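A sketch of the kind of coercion this implies; the actual code in Request may differ:

    def _normalize_method(method):
        # coerce unicode (or any other type) to a plain str and
        # upper-case it, so Request.method is always e.g. 'GET'
        return str(method).upper()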
Pablo Hoffman
afa23688c6
fixed bug in scrapy.http.Headers: values weren't being encoded to str when passed as lists
2011-05-01 19:39:13 -03:00
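The bug affected multi-valued headers; a hedged sketch of the fixed behavior (the helper below is illustrative, not Scrapy's actual code):

    def _normalize_value(value):
        # encode each item when a header value comes in as a list,
        # instead of handling only the scalar case
        if isinstance(value, list):
            return [str(v) for v in value]
        return [str(value)]

    # so both forms now store str values:
    #   Headers({'Accept': 'text/html'})
    #   Headers({'Accept': ['text/html', 'text/plain']})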
Pablo Hoffman
7f97259ba7
added w3lib to the requirements in the installation guide
2011-05-01 11:14:57 -03:00
Pablo Hoffman
718428c0ab
debian/control: added python-setuptools to Recommends, because it's needed by the 'scrapy deploy' command
2011-05-01 11:00:02 -03:00
Pablo Hoffman
d08281a44f
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-04-30 01:35:43 -03:00
Pablo Hoffman
4a83167698
fixed small doc typo
2011-04-30 01:35:30 -03:00
Pablo Hoffman
cf572bb642
removed experimental examples
2011-04-28 18:07:23 -03:00
Pablo Hoffman
bb2b67c862
updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project
2011-04-28 09:31:57 -03:00
Pablo Hoffman
bf73002428
removed googledir example, replaced by dirbot project on github. updated docs accordingly
2011-04-28 02:28:39 -03:00
Pablo Hoffman
b12dd76bb8
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-04-25 09:31:18 -03:00
Pablo Hoffman
678f08bc1b
added warning about using 'parse' as callback in crawl spider rules
2011-04-25 09:30:42 -03:00
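The reason for the warning: CrawlSpider implements parse() itself to drive the rules, so overriding it silently breaks crawling. A sketch with a safely named callback (module paths as they were in this era):

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    class MySpider(CrawlSpider):
        name = 'example'
        start_urls = ['http://www.example.com']

        # use a name like 'parse_item', never 'parse', for rule callbacks
        rules = [Rule(SgmlLinkExtractor(allow=['/items/']),
                      callback='parse_item')]

        def parse_item(self, response):
            self.log("visited %s" % response.url)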
Pablo Hoffman
18d303b5f1
ported internal scrapy.utils imports to w3lib
2011-04-19 01:33:52 -03:00
Pablo Hoffman
fcc8d73840
Removed scrapy.contrib.ibl module (and submodules). They have been moved to a new library "scrapely". See https://github.com/scrapy/scrapely
2011-04-19 01:04:22 -03:00
Pablo Hoffman
ebcbb9f453
debian: added python-w3lib package to dependencies
2011-04-19 00:55:08 -03:00
Pablo Hoffman
b10f4fae35
Moved several functions from scrapy.utils.{http,markup,multipart,response,url} (and their tests) to a new library called 'w3lib'. Scrapy will now depend on w3lib.
2011-04-18 22:37:19 -03:00
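After the move, the relocated helpers are imported from w3lib instead of scrapy.utils; for example (safe_url_string is part of w3lib's public API):

    # before: from scrapy.utils.url import safe_url_string
    from w3lib.url import safe_url_string

    url = safe_url_string(u'http://example.com/some page')
    # -> 'http://example.com/some%20page'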
Pablo Hoffman
ad496eb3b6
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-04-14 12:36:27 -03:00
Pablo Hoffman
ecb4f44cbc
Added clarification on how to work with local settings and scrapy deploy
2011-04-14 12:36:09 -03:00
Pablo Hoffman
6f262a198c
Added IOError to the list of exceptions to retry in the RetryMiddleware
2011-04-12 18:12:36 -03:00
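A hedged sketch of the middleware's exception list after the change (the attribute name and the exact set of Twisted errors are assumptions based on the RetryMiddleware of this era):

    from twisted.internet.error import (TimeoutError, ConnectionRefusedError,
                                        ConnectionDone, ConnectError,
                                        ConnectionLost)

    # exceptions that cause a request to be retried; IOError is the addition
    EXCEPTIONS_TO_RETRY = (TimeoutError, ConnectionRefusedError, ConnectionDone,
                           ConnectError, ConnectionLost, IOError)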
Pablo Hoffman
7c49e8679c
fixed typo
2011-04-07 02:04:42 -03:00
Pablo Hoffman
3ee2c94e93
Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it
2011-04-06 14:54:48 -03:00
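With the setting enabled, the middleware logs the Cookie headers sent and the Set-Cookie headers received for each request:

    # settings.py
    COOKIES_DEBUG = True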
Pablo Hoffman
8a5c08a6bc
added join_multivalued parameter to CsvItemExporter
2011-03-24 13:15:52 -03:00
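A usage sketch for the new parameter, which sets the character used to join list-valued fields into a single CSV cell:

    from scrapy.contrib.exporter import CsvItemExporter

    f = open('items.csv', 'wb')
    # a list field like ['a', 'b'] is written to one cell as "a|b"
    exporter = CsvItemExporter(f, join_multivalued='|')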
Pablo Hoffman
84dee1f77f
removed unused function
2011-03-24 09:03:57 -03:00
Pablo Hoffman
3954e600ca
added DBM storage backend for HTTP cache
2011-03-23 21:32:02 -03:00
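Selecting the new backend goes through the HTTP cache settings; a sketch assuming the storage class path of this era:

    # settings.py
    HTTPCACHE_ENABLED = True
    HTTPCACHE_STORAGE = 'scrapy.contrib.httpcache.DbmCacheStorage'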
Pablo Hoffman
60f6a9b054
moved the scrapyd python module into the scrapy debian package; the scrapyd package is now only for installing the service (upstart script) and the scrapy user
2011-03-23 15:45:40 -03:00
Shane Evans
407f7f2d65
fix minor error in IBL tests and post-processing of nested annotations
2011-03-11 21:58:47 +00:00
Shane Evans
bc6eee71e3
refactor IBL extraction to allow processing parsed data
2011-03-11 20:02:29 +00:00
Pablo Hoffman
9591413d9d
added Jochen Maes to AUTHORS
2011-03-09 14:24:17 -02:00
Jochen Maes
47a7f154ab
Add listjobs.json to Scrapyd API
You can use listjobs.json with project=<projectname> to get the jobs that are currently running for that project.
It returns a list of jobs with spider name and job id.
Signed-off-by: Jochen Maes <jochen.maes@sejo.be>
---
scrapyd/webservice.py | 9 +++++++++
scrapyd/website.py | 1 +
2 files changed, 10 insertions(+), 0 deletions(-)
2011-03-09 14:22:10 -02:00
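A quick way to exercise the new endpoint, assuming scrapyd's default port 6800 and a hypothetical project name:

    import urllib

    # returns JSON describing the currently running jobs of 'myproject'
    data = urllib.urlopen(
        'http://localhost:6800/listjobs.json?project=myproject').read()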
Pablo Hoffman
99033d91c3
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-03-09 12:41:26 -02:00
Pablo Hoffman
36431a1439
Silenced confusing sqlite3.ProgrammingError exception. For more info see: http://twistedmatrix.com/trac/ticket/4040
2011-03-09 12:39:24 -02:00