scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 23:43:59 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	5879389ad0	Bumped version to 0.12	2011-01-02 16:16:40 -02:00
Pablo Hoffman	fa644f7a5e	Some simplifications to Scrapyd architecture and internals: - launcher no longer knows about egg storage - removed get_spider_list_from_eggifile() file and replaced by simpler get_spider_list() which doesn't receive an egg file as argument - changed "egg runner" name to just "runner" to reflect the fact that it doesn't necessarily run eggs (though it does in the default case) --HG-- rename : scrapyd/eggrunner.py => scrapyd/runner.py	2010-12-27 16:22:32 -02:00
Pablo Hoffman	633ebc4c43	minor indentation improvement	2010-12-23 13:04:49 -02:00
Pablo Hoffman	db07a9a938	Added notice to documentation, pointing dev to stable versions and vice versa	2010-12-23 13:03:40 -02:00
Pablo Hoffman	544308d6d0	updated ubuntu repos doc, in preparation for the 0.11 release	2010-12-21 11:02:56 -02:00
Pablo Hoffman	002abf204f	Updated item_passed signal to send passed item in 'item' argument, instead of 'output' argument, keeping backwards compatibility for the 'output' argument. Closes #273	2010-12-13 14:05:47 -02:00
Pablo Hoffman	f984d438a0	updated docs to use scrapy version on aptitude install lines	2010-12-13 14:02:42 -02:00
Pablo Hoffman	119fd20e91	Added verbose option to 'version' command. Closes #298	2010-12-13 00:32:44 -02:00
Pablo Hoffman	6a1b69c93f	renamed command 'scrapyd' to 'server', and deprecated 'runserver' and 'queue' commands --HG-- rename : scrapy/commands/scrapyd.py => scrapy/commands/server.py	2010-11-30 20:23:27 -02:00
Pablo Hoffman	df54ed0041	Some Scrapyd enhancements: * added minimal web ui * return unique id per job (spider scheduled) * store one log per spider run (job) and rotate them, keeping the last N logs (where N is configurable through settings)	2010-11-30 02:26:31 -02:00
Pablo Hoffman	bbffa59497	Some changes to Scrapyd: * Always start one process per spider * Added max_proc_per_cpu option (defaults to 4) * Return the number of spiders (instead of a list of them) in schedule.json	2010-11-29 17:19:05 -02:00
Pablo Hoffman	2557777c39	Updated doc referring to HTTP cache middleware	2010-11-24 13:27:44 -02:00
Pablo Hoffman	426b6fa100	docs/intro/install.rst: added -U flag to easy_install command	2010-11-22 13:50:19 -02:00
Pablo Hoffman	91a7c25797	* Made Response.meta attribute map to Request.meta attribute. Closes #290 * Record redirected URLs in redirect middleware. Closes #291	2010-11-18 12:51:54 -02:00
Pablo Hoffman	ac007802d6	Simplified installation guide, including lxml as alternative dependency to libxml2. Closes #280	2010-11-17 21:32:23 -02:00
Pablo Hoffman	5a5364d0c1	Updated documentation to point out that simplejson is now required if using Python 2.5, and to recommend switching to Python 2.6	2010-11-16 03:31:04 -02:00
Pablo Hoffman	d988ca1ec2	Some changes to scrapy deploy command: * changed deploy section names to [deploy:target] * project is now passed through a -p|--project option * version can now be set in the target configuration * switched meaning of -l and -L options * updated documentation accordingly	2010-11-08 17:01:06 -02:00
Pablo Hoffman	0f69e7a191	Some changes to HTTP Cache middleware: * made it use the project data storage by default (closes #279) * added HTTPCACHE_ENABLED setting (False by default) to enable it * made HTTPCACHE_DIR = 'httpcache' by default (inside the project data storage) * simplified HTTPCACHE_EXPIRATION_SECS semantics: zero means don't expire, dropped support for negative numbers * other minor doc improvements	2010-11-01 02:38:15 -02:00
Pablo Hoffman	3c94c6cb9b	fixed sphinx doc id	2010-11-01 02:31:20 -02:00
dfdeshom	130276605b	Bind the web server and telnet server to a configurable interface (WEBSERVICE_HOST). The default is to bind to all interfaces. Also add documentation for WEBSERVICE_HOST and TELNETCONSOLE_HOST.	2010-11-01 00:59:04 -02:00
Pablo Hoffman	b76c5c597f	* Added support for project data storage (closes #276) * Documented project file structure * Moved default location of SQLite database to project data storage dir (closes #277)	2010-10-31 03:25:37 -02:00
Pablo Hoffman	dfa6745e91	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-30 16:05:53 -02:00
Pablo Hoffman	a0d9b43031	fixed typo in scrapyd doc	2010-10-30 16:05:32 -02:00
Pablo Hoffman	d67152ab0f	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-30 01:56:12 -02:00
Pablo Hoffman	75451cbe84	scrapyd doc: fixed delversion.json example	2010-10-30 01:56:00 -02:00
Pablo Hoffman	a59bfb539d	* Added lxml backend for XPath selectors. Closes #147 * Added new setting (SELECTORS_BACKEND) to choose which backend to use * Deprecated the extract_unquoted() function from selectors * Made libxml2 optional by adding a dummy selector backend. Closes #260 --HG-- rename : scrapy/tests/test_selector.py => scrapy/tests/test_selector_libxml2.py	2010-10-25 14:47:10 -02:00
Pablo Hoffman	6c921896a5	Expanded documentation on deploy command and versions. Refs #261	2010-10-19 00:11:45 -02:00
Pablo Hoffman	1d567cdce6	Added new 'deploy' command. Closes #261	2010-10-18 22:38:46 -02:00
Pablo Hoffman	7d8f922df9	Added documentation for CLOSESPIDER_ERRORCOUNT setting. Refs #254	2010-10-18 22:36:30 -02:00
Pablo Hoffman	c96f17c43d	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-18 03:21:21 -02:00
Pablo Hoffman	98662e53ea	Formatting fix in Scrapyd doc	2010-10-17 03:20:23 -02:00
Pablo Hoffman	a3d85da96f	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-16 19:54:24 -02:00
Pablo Hoffman	5f65c26080	Some minor improvements to feature list in Scrapy at a Glance documentation page	2010-10-16 19:02:08 -02:00
Pablo Hoffman	d5c8caf07b	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-10 20:31:38 -02:00
Pablo Hoffman	b4fbc6c5fa	Updated Scrapy Tutorial to reference feed exports, instead of a custom-written pipeline, and extended item pipeline documentation to include a JSON writer.	2010-10-10 20:31:05 -02:00
Pablo Hoffman	aa4142e4ba	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-10-07 18:23:48 -02:00
Pablo Hoffman	f4accb6c7f	Updated dmoz xpaths of Scrapy tutorial	2010-10-07 18:22:01 -02:00
Pablo Hoffman	7826869cb2	Added missing colon	2010-09-28 16:44:53 -03:00
Martin Santos	0bf9e4627c	added support to the CloseSpider extension for closing the spider after N pages have been crawled, using the CLOSESPIDER_PAGECOUNT setting. closes #253	2010-09-28 16:29:37 -03:00
Pablo Hoffman	279dcc245f	Fixed role name in Sphinx doc	2010-09-26 01:01:06 -03:00
Pablo Hoffman	9599bde3e9	Removed RequestLimitMiddleware	2010-09-22 16:09:13 -03:00
Pablo Hoffman	ed4aec187f	Ported code to use new unified access to spider settings, keeping backwards compatibility for old spider attributes. Refs #245	2010-09-22 16:09:13 -03:00
Pablo Hoffman	b6c2b55e5b	Split settings classes from settings singleton. Closes #244 --HG-- rename : scrapy/conf/__init__.py => scrapy/conf.py rename : scrapy/conf/default_settings.py => scrapy/settings/default_settings.py rename : scrapy/tests/test_conf.py => scrapy/tests/test_settings.py	2010-09-22 15:47:33 -03:00
Pablo Hoffman	2ebfa7e68d	Removed unneeded code (since autodoc is not used in Sphinx doc)	2010-09-22 10:52:02 -03:00
Shuaib	9288f622f9	Added formname parameter for FormRequest.from_response	2010-09-20 08:33:24 -03:00
Pablo Hoffman	bf467fc37a	Check 'dont_merge_cookies' membership in request.meta, instead of getting its value	2010-09-10 15:29:15 -03:00
Pablo Hoffman	7d14a52234	Reference dont_merge_cookies in list of special Request.meta keys	2010-09-09 21:54:26 -03:00
Pablo Hoffman	7f21a6384f	Documented handle_httpstatus_list request.meta key	2010-09-09 21:50:40 -03:00
Pablo Hoffman	f1c943543a	Added dont_retry request.meta key to make RetryMiddleware ignore requests. Closes #234	2010-09-09 21:43:44 -03:00
Pablo Hoffman	9f01e3e79e	Added dont_redirect request.meta key to make RedirectMiddleware ignore requests. Closes #233	2010-09-09 21:37:35 -03:00

485 Commits