scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 01:55:10 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	bb2b67c862	updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project	2011-04-28 09:31:57 -03:00
Pablo Hoffman	bf73002428	removed googledir example, replaced by dirbot project on github. updated docs accordingly	2011-04-28 02:28:39 -03:00
Pablo Hoffman	b12dd76bb8	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-04-25 09:31:18 -03:00
Pablo Hoffman	678f08bc1b	added warning about using 'parse' as callback in crawl spider rules	2011-04-25 09:30:42 -03:00
Pablo Hoffman	ad496eb3b6	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-04-14 12:36:27 -03:00
Pablo Hoffman	ecb4f44cbc	Added clarification on how to work with local settings and scrapy deploy	2011-04-14 12:36:09 -03:00
Pablo Hoffman	3ee2c94e93	Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it	2011-04-06 14:54:48 -03:00
Pablo Hoffman	8a5c08a6bc	added join_multivalued parameter to CsvItemExporter	2011-03-24 13:15:52 -03:00
Pablo Hoffman	3954e600ca	added DBM storage backend for HTTP cache	2011-03-23 21:32:02 -03:00
Pablo Hoffman	cfd11df539	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-24 15:28:57 -02:00
Pablo Hoffman	8f7e163b04	Fixed wrong method name in downloader middleware documentation	2011-02-24 15:26:32 -02:00
Daniel Grana	c55355642c	fix FAQ typos reported by marlun_ at #scrapy IRC channel	2011-02-16 08:57:42 -02:00
Pablo Hoffman	1fb55bdaf0	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-15 07:25:12 -02:00
Pablo Hoffman	16d9a33951	added FAQ entry about working with big data feeds	2011-02-15 07:24:52 -02:00
Pablo Hoffman	936353d5f1	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-09 11:20:46 -02:00
Pablo Hoffman	181d1c09ae	Fixed typo and code indentation in the doc. Closes #307 and #308	2011-02-09 11:19:46 -02:00
Pablo Hoffman	c91f0d9ea1	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-04 13:39:54 -02:00
Pablo Hoffman	c5499ead73	Clarified behaviour when multiple rules match the same link in CrawlSpider	2011-02-04 13:39:12 -02:00
Pablo Hoffman	d7f193cbea	bumped version to 0.13 in documentation	2011-01-02 17:29:43 -02:00
Pablo Hoffman	b56e933be9	bumped version to 0.12 in documentation	2011-01-02 17:28:33 -02:00
Pablo Hoffman	5879389ad0	Bumped version to 0.12	2011-01-02 16:16:40 -02:00
Pablo Hoffman	fa644f7a5e	Some simplifications to Scrapyd architecture and internals: - launcher no longer knows about egg storage - removed get_spider_list_from_eggifile() file and replaced by simpler get_spider_list() which doesn't receive an egg file as argument - changed "egg runner" name to just "runner" to reflect the fact that it doesn't necessarily run eggs (though it does in the default case) --HG-- rename : scrapyd/eggrunner.py => scrapyd/runner.py	2010-12-27 16:22:32 -02:00
Pablo Hoffman	633ebc4c43	minor indentation improvement	2010-12-23 13:04:49 -02:00
Pablo Hoffman	db07a9a938	Added notice to documentation, pointing dev to stable versions and vice versa	2010-12-23 13:03:40 -02:00
Pablo Hoffman	544308d6d0	updated ubuntu repos doc, in preparation for the 0.11 release	2010-12-21 11:02:56 -02:00
Pablo Hoffman	002abf204f	Updated item_passed signal to send passed item in 'item' argument, instead of 'output' argument, keeping backwards compatibility for the 'output' argument. Closes #273	2010-12-13 14:05:47 -02:00
Pablo Hoffman	f984d438a0	updated docs to use scrapy version on aptitude install lines	2010-12-13 14:02:42 -02:00
Pablo Hoffman	119fd20e91	Added verbose option to 'version' command. Closes #298	2010-12-13 00:32:44 -02:00
Pablo Hoffman	6a1b69c93f	renamed command 'scrapyd' to 'server', and deprecated 'runserver' and 'queue' commands --HG-- rename : scrapy/commands/scrapyd.py => scrapy/commands/server.py	2010-11-30 20:23:27 -02:00
Pablo Hoffman	df54ed0041	Some Scrapyd enhancements: * added minimal web ui * return unique id per job (spider scheduled) * store one log per spider run (job) and rotate them, keeping the last N logs (where N is configurable through settings)	2010-11-30 02:26:31 -02:00
Pablo Hoffman	bbffa59497	Some changes to Scrapyd: * Always start one process per spider * Added max_proc_per_cpu option (defaults to 4) * Return the number of spiders (instead of a list of them) in schedule.json	2010-11-29 17:19:05 -02:00
Pablo Hoffman	2557777c39	Updated doc referring to HTTP cache middleware	2010-11-24 13:27:44 -02:00
Pablo Hoffman	426b6fa100	docs/intro/install.rst: added -U flag to easy_install command	2010-11-22 13:50:19 -02:00
Pablo Hoffman	91a7c25797	* Made Response.meta attribute map to Request.meta attribute. Closes #290 * Record redirected URLs in redirect middleware. Closes #291	2010-11-18 12:51:54 -02:00
Pablo Hoffman	ac007802d6	Simplified installation guide, including lxml as alternative dependency to libxml2. Closes #280	2010-11-17 21:32:23 -02:00
Pablo Hoffman	5a5364d0c1	Updated documentation to point out that simplejson is now required if using Python 2.5, and to recommend switching to Python 2.6	2010-11-16 03:31:04 -02:00
Pablo Hoffman	d988ca1ec2	Some changes to scrapy deploy command: * changed deploy section names to [deploy:target] * project is now passed through a -p|--project option * version can now be set in the target configuration * switched meaning of -l and -L options * updated documentation accordingly	2010-11-08 17:01:06 -02:00
Pablo Hoffman	0f69e7a191	Some changes to HTTP Cache middleware: * made it use the project data storage by default (closes #279) * added HTTPCACHE_ENABLED setting (False by default) to enable it * made HTTPCACHE_DIR = 'httpcache' by default (inside the project data storage) * simplified HTTPCACHE_EXPIRATION_SECS semantics: zero means don't expire, dropped support for negative numbers * other minor doc improvements	2010-11-01 02:38:15 -02:00
Pablo Hoffman	3c94c6cb9b	fixed sphinx doc id	2010-11-01 02:31:20 -02:00
dfdeshom	130276605b	Bind the web server and telnet server to a configurable interface (WEBSERVICE_HOST). The default is to bind to all interfaces. Also add documentation for WEBSERVICE_HOST and TELNETCONSOLE_HOST.	2010-11-01 00:59:04 -02:00
Pablo Hoffman	b76c5c597f	* Added support for project data storage (closes #276 ) * Documented project file structure * Moved default location of SQLite database to project data storage dir (closes #277)	2010-10-31 03:25:37 -02:00
Pablo Hoffman	dfa6745e91	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-30 16:05:53 -02:00
Pablo Hoffman	a0d9b43031	fixed typo in scrapyd doc	2010-10-30 16:05:32 -02:00
Pablo Hoffman	d67152ab0f	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-30 01:56:12 -02:00
Pablo Hoffman	75451cbe84	scrapyd doc: fixed delversion.json example	2010-10-30 01:56:00 -02:00
Pablo Hoffman	a59bfb539d	* Added lxml backend for XPath selectors. Closes #147 * Added new setting (SELECTORS_BACKEND) to choose which backend to use * Deprecated the extract_unquoted() function from selectors * Made libxml2 optional by adding a dummy selector backend. Closes #260 --HG-- rename : scrapy/tests/test_selector.py => scrapy/tests/test_selector_libxml2.py	2010-10-25 14:47:10 -02:00
Pablo Hoffman	6c921896a5	Expanded documentation on deploy command and versions. Refs #261	2010-10-19 00:11:45 -02:00
Pablo Hoffman	1d567cdce6	Added new 'deploy' command. Closes #261	2010-10-18 22:38:46 -02:00
Pablo Hoffman	7d8f922df9	Added documentation for CLOSESPIDER_ERRORCOUNT setting. Refs #254	2010-10-18 22:36:30 -02:00
Pablo Hoffman	c96f17c43d	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-18 03:21:21 -02:00

655 Commits