scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 23:44:01 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	db5cae7c03	SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330	2011-06-23 18:18:29 -03:00
Pablo Hoffman	57c43fdce6	added SitemapSpider, with tests and doc	2011-06-15 11:54:34 -03:00
Pablo Hoffman	91dc46539f	added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)	2011-06-14 00:50:05 -03:00
Pablo Hoffman	841e9913db	renamed CLOSESPIDER_ITEMPASSED setting to CLOSESPIDER_ITEMCOUNT, to follow the refactoring done in r2630	2011-06-13 16:58:51 -03:00
Pablo Hoffman	474cba512c	simplified MemoryDebugger extension to use stats for dumping memory debugging info	2011-06-06 03:13:28 -03:00
Pablo Hoffman	5fbc32c015	call stats collector engine_stopped() after the engine is closed (to make sure all data from extensions has been collected), and added that method to documented api	2011-06-06 03:12:40 -03:00
Pablo Hoffman	9d9c8877da	added 'scrapy edit' command	2011-06-05 22:02:56 -03:00
Pablo Hoffman	1bc2339bb8	Merged item passed and item scraped concepts, as they have often proved confusing in the past. This means: * original item_scraped signal was removed * original item_passed signal was renamed to item_scraped * old log lines "Scraped Item..." removed * old log lines "Passed Item..." renamed to "Scraped Item..."	2011-06-03 01:13:00 -03:00
Pablo Hoffman	e6091df551	fixed doc typo	2011-05-30 09:04:31 -03:00
Pablo Hoffman	1d98fc8fb5	added spider_error signal	2011-05-29 22:38:17 -03:00
Pablo Hoffman	2fa0f75f2d	added COOKIES_ENABLED setting to support disabling the cookies middleware	2011-05-27 00:35:34 -03:00
Pablo Hoffman	d72d3f4607	stack trace dump extension: also dump engine status, and support triggering it with SIGQUIT, besides SIGUSR2	2011-05-20 03:25:00 -03:00
Pablo Hoffman	951ba507f9	Removed support for default values in Scrapy items, which have proven confusing in the past	2011-05-19 21:42:46 -03:00
Pablo Hoffman	503f302010	removed remaining references to scheduler middleware from doc, as it will be removed on next release	2011-05-18 19:48:48 -03:00
Pablo Hoffman	3fd17432cf	fixed outdated documentation	2011-05-18 14:46:20 -03:00
Pablo Hoffman	cd85c12c33	Some Link extractor improvements: * added support for ignoring common file extensions that are not followed if they occur in links * fixed link extractor documentation issues * slightly improved performance of applying filters * added link to link extractors doc from documentation index	2011-05-18 12:32:34 -03:00
Pablo Hoffman	495152bd50	disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it	2011-05-18 11:04:48 -03:00
Pablo Hoffman	accb6ed830	dump stats to log by default (i.e. change default value of STATS_DUMP to True)	2011-05-17 22:42:05 -03:00
Pablo Hoffman	b12dd76bb8	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-04-25 09:31:18 -03:00
Pablo Hoffman	678f08bc1b	added warning about using 'parse' as callback in crawl spider rules	2011-04-25 09:30:42 -03:00
Pablo Hoffman	ad496eb3b6	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-04-14 12:36:27 -03:00
Pablo Hoffman	ecb4f44cbc	Added clarification on how to work with local settings and scrapy deploy	2011-04-14 12:36:09 -03:00
Pablo Hoffman	3ee2c94e93	Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it	2011-04-06 14:54:48 -03:00
Pablo Hoffman	8a5c08a6bc	added join_multivalued parameter to CsvItemExporter	2011-03-24 13:15:52 -03:00
Pablo Hoffman	3954e600ca	added DBM storage backend for HTTP cache	2011-03-23 21:32:02 -03:00
Pablo Hoffman	cfd11df539	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-24 15:28:57 -02:00
Pablo Hoffman	8f7e163b04	Fixed wrong method name in downloader middleware documentation	2011-02-24 15:26:32 -02:00
Pablo Hoffman	c91f0d9ea1	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-04 13:39:54 -02:00
Pablo Hoffman	c5499ead73	Clarified behaviour when multiple rules match the same link in CrawlSpider	2011-02-04 13:39:12 -02:00
Pablo Hoffman	d7f193cbea	bumped version to 0.13 in documentation	2011-01-02 17:29:43 -02:00
Pablo Hoffman	b56e933be9	bumped version to 0.12 in documentation	2011-01-02 17:28:33 -02:00
Pablo Hoffman	fa644f7a5e	Some simplifications to Scrapyd architecture and internals: - launcher no longer knows about egg storage - removed get_spider_list_from_eggifile() file and replaced by simpler get_spider_list() which doesn't receive an egg file as argument - changed "egg runner" name to just "runner" to reflect the fact that it doesn't necessarily run eggs (though it does in the default case) --HG-- rename : scrapyd/eggrunner.py => scrapyd/runner.py	2010-12-27 16:22:32 -02:00
Pablo Hoffman	544308d6d0	updated ubuntu repos doc, in preparation for the 0.11 release	2010-12-21 11:02:56 -02:00
Pablo Hoffman	002abf204f	Updated item_passed signal to send passed item in 'item' argument, instead of 'output' argument, keeping backwards compatibility for the 'output' argument. Closes #273	2010-12-13 14:05:47 -02:00
Pablo Hoffman	f984d438a0	updated docs to use scrapy version on aptitude install lines	2010-12-13 14:02:42 -02:00
Pablo Hoffman	119fd20e91	Added verbose option to 'version' command. Closes #298	2010-12-13 00:32:44 -02:00
Pablo Hoffman	6a1b69c93f	renamed command 'scrapyd' to 'server', and deprecated 'runserver' and 'queue' commands --HG-- rename : scrapy/commands/scrapyd.py => scrapy/commands/server.py	2010-11-30 20:23:27 -02:00
Pablo Hoffman	df54ed0041	Some Scrapyd enhancements: * added minimal web ui * return unique id per job (spider scheduled) * store one log per spider run (job) and rotate them, keeping the last N logs (where N is configurable through settings)	2010-11-30 02:26:31 -02:00
Pablo Hoffman	bbffa59497	Some changes to Scrapyd: * Always start one process per spider * Added max_proc_per_cpu option (defaults to 4) * Return the number of spiders (instead of a list of them) in schedule.json	2010-11-29 17:19:05 -02:00
Pablo Hoffman	2557777c39	Updated doc referring to HTTP cache middleware	2010-11-24 13:27:44 -02:00
Pablo Hoffman	91a7c25797	* Made Response.meta attribute map to Request.meta attribute. Closes #290 * Record redirected URLs in redirect middleware. Closes #291	2010-11-18 12:51:54 -02:00
Pablo Hoffman	d988ca1ec2	Some changes to scrapy deploy command: * changed deploy section names to [deploy:target] * project is now passed through a -p|--project option * version can now be set in the target configuration * switched meaning of -l and -L options * updated documentation accordingly	2010-11-08 17:01:06 -02:00
Pablo Hoffman	0f69e7a191	Some changes to HTTP Cache middleware: * made it use the project data storage by default (closes #279) * added HTTPCACHE_ENABLED setting (False by default) to enable it * made HTTPCACHE_DIR = 'httpcache' by default (inside the project data storage) * simplified HTTPCACHE_EXPIRATION_SECS semantics: zero means don't expire, dropped support for negative numbers * other minor doc improvements	2010-11-01 02:38:15 -02:00
Pablo Hoffman	3c94c6cb9b	fixed sphinx doc id	2010-11-01 02:31:20 -02:00
dfdeshom	130276605b	Bind the web server and telnet server to a configurable interface (WEBSERVICE_HOST). The default is to bind to all interfaces. Also add documentation for WEBSERVICE_HOST and TELNETCONSOLE_HOST.	2010-11-01 00:59:04 -02:00
Pablo Hoffman	b76c5c597f	* Added support for project data storage (closes #276) * Documented project file structure * Moved default location of SQLite database to project data storage dir (closes #277)	2010-10-31 03:25:37 -02:00
Pablo Hoffman	dfa6745e91	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-30 16:05:53 -02:00
Pablo Hoffman	a0d9b43031	fixed typo in scrapyd doc	2010-10-30 16:05:32 -02:00
Pablo Hoffman	d67152ab0f	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-30 01:56:12 -02:00
Pablo Hoffman	75451cbe84	scrapyd doc: fixed delversion.json example	2010-10-30 01:56:00 -02:00

295 Commits