Pablo Hoffman
6a1b69c93f
renamed command 'scrapyd' to 'server', and deprecated 'runserver' and 'queue' commands
...
--HG--
rename : scrapy/commands/scrapyd.py => scrapy/commands/server.py
2010-11-30 20:23:27 -02:00
Pablo Hoffman
df54ed0041
Some Scrapyd enhancements:
...
* added minimal web ui
* return unique id per job (spider scheduled)
* store one log per spider run (job) and rotate them, keeping the last N logs (where N is configurable through settings)
2010-11-30 02:26:31 -02:00
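The "one log per job, keep the last N" behaviour from df54ed0041 can be illustrated with a minimal sketch. The class name, the `<job_id>.log` naming and the `keep_last` parameter are hypothetical and are not Scrapyd's actual code. The sketch also only tracks logs created by this process.

```python
import collections
import os


class LogRotator:
    """Hypothetical helper: one log file per job, only the newest keep_last kept."""

    def __init__(self, log_dir, keep_last):
        self.log_dir = log_dir
        self.keep_last = keep_last          # the configurable N (assumed name)
        self._logs = collections.deque()    # log paths, oldest first
        os.makedirs(log_dir, exist_ok=True)

    def new_log(self, job_id):
        # Assumed naming scheme: <job_id>.log inside log_dir.
        path = os.path.join(self.log_dir, "%s.log" % job_id)
        open(path, "w").close()  # the spider run would write its log here
        self._logs.append(path)
        # Delete the oldest logs until at most keep_last remain.
        while len(self._logs) > self.keep_last:
            os.remove(self._logs.popleft())
        return path
```

With `keep_last=2` and five jobs, only the two newest log files remain on disk.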
Pablo Hoffman
bbffa59497
Some changes to Scrapyd:
...
* Always start one process per spider
* Added max_proc_per_cpu option (defaults to 4)
* Return the number of spiders (instead of a list of them) in schedule.json
2010-11-29 17:19:05 -02:00
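The `max_proc_per_cpu` option from bbffa59497 suggests a simple cap on concurrent spider processes. The sketch below assumes the cap is just CPU count times `max_proc_per_cpu`. The helper names and the `cpus` override are hypothetical.

```python
import multiprocessing


def max_processes(max_proc_per_cpu=4, cpus=None):
    """Assumed cap on concurrent spider processes: CPUs * max_proc_per_cpu."""
    if cpus is None:
        cpus = multiprocessing.cpu_count()
    return cpus * max_proc_per_cpu


def slots_available(running, max_proc_per_cpu=4, cpus=None):
    """How many more one-spider processes could be started right now."""
    return max(0, max_processes(max_proc_per_cpu, cpus) - running)
```

On a 2-CPU machine with the default of 4, the cap is 8 processes. With 6 already running, 2 more spiders could start.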
Pablo Hoffman
2557777c39
Updated doc referring to HTTP cache middleware
2010-11-24 13:27:44 -02:00
Pablo Hoffman
91a7c25797
* Made Response.meta attribute map to Request.meta attribute. Closes #290
...
* Record redirected URLs in redirect middleware. Closes #291
2010-11-18 12:51:54 -02:00
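One way to make `Response.meta` "map to" `Request.meta` (91a7c25797) is a read-only property that returns the originating request's dict, so both objects share it. The classes below are minimal stand-ins written for illustration, not Scrapy's implementation.

```python
class Request:
    """Minimal stand-in request holding a meta dict."""

    def __init__(self, url, meta=None):
        self.url = url
        self.meta = dict(meta or {})


class Response:
    """Minimal stand-in response whose meta delegates to its request."""

    def __init__(self, url, request=None):
        self.url = url
        self.request = request

    @property
    def meta(self):
        # Return the request's own dict, so changes are visible on both objects.
        if self.request is None:
            raise AttributeError("Response.meta requires an associated request")
        return self.request.meta
```

Writing `response.meta["key"] = value` in a callback then updates `request.meta` too, because both names refer to the same dict.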
Pablo Hoffman
d988ca1ec2
Some changes to scrapy deploy command:
...
* changed deploy section names to [deploy:target]
* project is now passed through a -p|--project option
* version can now be set in the target configuration
* switched meaning of -l and -L options
* updated documentation accordingly
2010-11-08 17:01:06 -02:00
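The `[deploy:target]` section naming from d988ca1ec2 can be read with the standard `configparser` module. This is a sketch only. The `url` and `version` option names in the example config are assumptions, and the real deploy command may read its configuration differently.

```python
import configparser


def deploy_targets(cfg_text):
    """Return {target_name: options} for every [deploy:<target>] section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    targets = {}
    for section in parser.sections():
        if section.startswith("deploy:"):
            name = section[len("deploy:"):]
            targets[name] = dict(parser.items(section))
    return targets
```

For example, a config containing `[deploy:production]` (with a `version = 1.0` line) and `[deploy:local]` yields two targets. Only `production` carries a version.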
Pablo Hoffman
0f69e7a191
Some changes to HTTP Cache middleware:
...
* made it use the project data storage by default (closes #279)
* added HTTPCACHE_ENABLED setting (False by default) to enable it
* made HTTPCACHE_DIR = 'httpcache' by default (inside the project data storage)
* simplified HTTPCACHE_EXPIRATION_SECS semantics: zero means don't expire,
dropped support for negative numbers
* other minor doc improvements
2010-11-01 02:38:15 -02:00
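The simplified `HTTPCACHE_EXPIRATION_SECS` semantics in 0f69e7a191 can be expressed as a small predicate. Zero means "never expire". Raising on negative values and using a strict `>` at the boundary are choices made for this sketch, not confirmed middleware behaviour.

```python
import time


def is_expired(cached_at, expiration_secs, now=None):
    """Return True if a response cached at cached_at (epoch seconds) is stale."""
    if expiration_secs < 0:
        # Assumption: negative values are treated as a configuration error.
        raise ValueError("HTTPCACHE_EXPIRATION_SECS must be >= 0")
    if expiration_secs == 0:
        return False  # zero means cached responses never expire
    if now is None:
        now = time.time()
    return now - cached_at > expiration_secs
```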
Pablo Hoffman
3c94c6cb9b
fixed sphinx doc id
2010-11-01 02:31:20 -02:00
dfdeshom
130276605b
Bind the web server and telnet server to a configurable interface (WEBSERVICE_HOST). The default is to bind to all interfaces. Also add documentation for WEBSERVICE_HOST and TELNETCONSOLE_HOST.
2010-11-01 00:59:04 -02:00
Pablo Hoffman
b76c5c597f
* Added support for project data storage (closes #276)
...
* Documented project file structure
* Moved default location of SQLite database to project data storage dir (closes #277)
2010-10-31 03:25:37 -02:00
Pablo Hoffman
dfa6745e91
Automated merge with http://hg.scrapy.org/scrapy-0.10
2010-10-30 16:05:53 -02:00
Pablo Hoffman
a0d9b43031
fixed typo in scrapyd doc
2010-10-30 16:05:32 -02:00
Pablo Hoffman
d67152ab0f
Automated merge with http://hg.scrapy.org/scrapy-0.10
2010-10-30 01:56:12 -02:00
Pablo Hoffman
75451cbe84
scrapyd doc: fixed delversion.json example
2010-10-30 01:56:00 -02:00
Pablo Hoffman
a59bfb539d
* Added lxml backend for XPath selectors. Closes #147
...
* Added new setting (SELECTORS_BACKEND) to choose which backend to use
* Deprecated the extract_unquoted() function from selectors
* Made libxml2 optional by adding a dummy selector backend. Closes #260
--HG--
rename : scrapy/tests/test_selector.py => scrapy/tests/test_selector_libxml2.py
2010-10-25 14:47:10 -02:00
Pablo Hoffman
6c921896a5
Expanded documentation on deploy command and versions. Refs #261
2010-10-19 00:11:45 -02:00
Pablo Hoffman
1d567cdce6
Added new 'deploy' command. Closes #261
2010-10-18 22:38:46 -02:00
Pablo Hoffman
7d8f922df9
Added documentation for CLOSESPIDER_ERRORCOUNT setting. Refs #254
2010-10-18 22:36:30 -02:00
Pablo Hoffman
c96f17c43d
Automated merge with http://hg.scrapy.org/scrapy-0.10
2010-10-18 03:21:21 -02:00
Pablo Hoffman
98662e53ea
Formatting fix in Scrapyd doc
2010-10-17 03:20:23 -02:00
Pablo Hoffman
d5c8caf07b
Automated merge with http://hg.scrapy.org/scrapy-0.10
2010-10-10 20:31:38 -02:00
Pablo Hoffman
b4fbc6c5fa
Updated Scrapy Tutorial to reference feed exports instead of a custom-written pipeline, and extended item pipeline documentation to include a JSON writer.
2010-10-10 20:31:05 -02:00
Pablo Hoffman
7826869cb2
Added missing colon
2010-09-28 16:44:53 -03:00
Martin Santos
0bf9e4627c
added support to the CloseSpider extension for closing the spider after N pages have been crawled, using the CLOSESPIDER_PAGECOUNT setting. closes #253
2010-09-28 16:29:37 -03:00
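The page-count rule added in 0bf9e4627c comes down to a counter that asks for the spider to be closed once N pages have been crawled. The sketch below is a hypothetical stand-in. The real extension hooks into Scrapy signals, and the `"closespider_pagecount"` reason string and "0 disables the limit" rule are assumptions.

```python
class PageCountCloser:
    """Hypothetical stand-in for the CLOSESPIDER_PAGECOUNT logic."""

    def __init__(self, pagecount, close_spider):
        self.pagecount = pagecount        # value of CLOSESPIDER_PAGECOUNT; 0 disables (assumed)
        self.close_spider = close_spider  # callable taking a reason string
        self.crawled = 0
        self.closing = False

    def response_received(self):
        """Call once per crawled page."""
        self.crawled += 1
        if self.pagecount and self.crawled >= self.pagecount and not self.closing:
            self.closing = True  # request the close only once
            self.close_spider("closespider_pagecount")  # assumed reason string
```

Responses still in flight after the limit is hit keep incrementing the counter, but the close is requested a single time.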
Pablo Hoffman
279dcc245f
Fixed role name in Sphinx doc
2010-09-26 01:01:06 -03:00
Pablo Hoffman
9599bde3e9
Removed RequestLimitMiddleware
2010-09-22 16:09:13 -03:00
Pablo Hoffman
ed4aec187f
Ported code to use new unified access to spider settings, keeping backwards compatibility for old spider attributes. Refs #245
2010-09-22 16:09:13 -03:00
Pablo Hoffman
b6c2b55e5b
Split settings classes from settings singleton. Closes #244
...
--HG--
rename : scrapy/conf/__init__.py => scrapy/conf.py
rename : scrapy/conf/default_settings.py => scrapy/settings/default_settings.py
rename : scrapy/tests/test_conf.py => scrapy/tests/test_settings.py
2010-09-22 15:47:33 -03:00
Shuaib
9288f622f9
Added formname parameter for FormRequest.from_response
2010-09-20 08:33:24 -03:00
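The `formname` parameter from 9288f622f9 selects the form to submit by its `name` attribute. A stdlib-only sketch of that selection step follows. `select_form` is a hypothetical helper, not `FormRequest.from_response` itself. Raising on an unknown name is a choice of this sketch. The real method may instead fall back to `formnumber`.

```python
from html.parser import HTMLParser


class _FormCollector(HTMLParser):
    """Collect the attributes of every <form> tag in document order."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms.append(dict(attrs))


def select_form(html, formname=None, formnumber=0):
    """Pick a form by name attribute if given, else by position (hypothetical helper)."""
    collector = _FormCollector()
    collector.feed(html)
    if not collector.forms:
        raise ValueError("no <form> element found")
    if formname is not None:
        for form in collector.forms:
            if form.get("name") == formname:
                return form
        raise ValueError("no form named %r" % formname)
    return collector.forms[formnumber]
```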
Pablo Hoffman
bf467fc37a
Check 'dont_merge_cookies' membership in request.meta, instead of getting its value
2010-09-10 15:29:15 -03:00
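The difference described in bf467fc37a is between testing a key's presence and testing its value. A minimal sketch of the membership check follows. The helper name is hypothetical.

```python
def should_merge_cookies(meta):
    """Merge cookies unless 'dont_merge_cookies' is present in meta at all.

    Membership test: the key's value is ignored, so even
    {"dont_merge_cookies": False} disables merging.
    """
    return "dont_merge_cookies" not in meta
```

A value check (`meta.get("dont_merge_cookies")`) would instead treat `{"dont_merge_cookies": False}` as "merge".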
Pablo Hoffman
7d14a52234
Reference dont_merge_cookies in list of special Request.meta keys
2010-09-09 21:54:26 -03:00
Pablo Hoffman
7f21a6384f
Documented handle_httpstatus_list request.meta key
2010-09-09 21:50:40 -03:00
Pablo Hoffman
f1c943543a
Added dont_retry request.meta key to make RetryMiddleware ignore requests. Closes #234
2010-09-09 21:43:44 -03:00
Pablo Hoffman
9f01e3e79e
Added dont_redirect request.meta key to make RedirectMiddleware ignore requests. Closes #233
2010-09-09 21:37:35 -03:00
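The `dont_retry` (f1c943543a) and `dont_redirect` (9f01e3e79e) meta keys let individual requests opt out of RetryMiddleware and RedirectMiddleware. The functions below are hypothetical stand-ins for those decisions. They use a truthiness check on the meta value, which is an assumption. The redirect status codes listed are a simplification.

```python
def should_retry(meta, failed, retries, max_retries):
    """Hypothetical retry decision that honours the dont_retry meta key."""
    if meta.get("dont_retry"):
        return False
    return failed and retries < max_retries


def redirect_target(meta, status, headers):
    """Return the URL to follow, or None when dont_redirect is set or no redirect applies."""
    if meta.get("dont_redirect"):
        return None
    if status in (301, 302, 303, 307) and "Location" in headers:
        return headers["Location"]
    return None
```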
Pablo Hoffman
7da79b90fe
Make url/body attributes of Request/Response objects read-only - use replace() to change them. Deprecation warning left for backwards compatibility.
2010-09-08 00:15:11 -03:00
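Read-only `url`/`body` attributes with a `replace()` method (7da79b90fe) follow a common immutable-object pattern: properties without setters, plus a method that builds a copy with selected fields overridden. The class below is a minimal stand-in, not Scrapy's `Request`. It omits the deprecation-warning path the commit mentions. It also copies callback/errback, in the spirit of c1aab2f58e.

```python
class Request:
    """Minimal stand-in: url/body are read-only; use replace() to change them."""

    def __init__(self, url, body=b"", callback=None, errback=None, meta=None):
        self._url = url
        self._body = body
        self.callback = callback
        self.errback = errback
        self.meta = dict(meta or {})

    @property
    def url(self):
        return self._url  # no setter, so assignment raises AttributeError

    @property
    def body(self):
        return self._body

    def replace(self, **kwargs):
        """Return a new Request, copying every attribute not overridden."""
        for name in ("url", "body", "callback", "errback", "meta"):
            kwargs.setdefault(name, getattr(self, name))
        return type(self)(**kwargs)
```

`request.replace(url=new_url)` returns a new object and leaves the original request unchanged.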
Pablo Hoffman
c1aab2f58e
Copy callback/errback attributes when copying Requests
2010-09-08 00:15:09 -03:00
Pablo Hoffman
e9ebebb230
Removed UrlFilterMiddleware from scrapy.contrib - see this snippet for an alternative: http://snippets.scrapy.org/snippets/12/
2010-09-07 17:51:02 -03:00
Daniel Grana
12b04b068f
make download_timeout configurable per request. closes #229
...
--HG--
extra : rebase_source : e57dfd4aeb98d48b04fc4d0c6469e9a85e4b33a8
2010-09-07 13:01:40 -03:00
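Making `download_timeout` configurable per request (12b04b068f) typically means a per-request value overrides a global default. The sketch assumes the override lives in `request.meta` under the key `download_timeout` and that the default is 180 seconds. Both are assumptions for illustration.

```python
DEFAULT_DOWNLOAD_TIMEOUT = 180  # assumed project-wide default, in seconds


def download_timeout(meta, default=DEFAULT_DOWNLOAD_TIMEOUT):
    """Per-request timeout: an (assumed) download_timeout meta key overrides the default."""
    return meta.get("download_timeout", default)
```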
Pablo Hoffman
9158e9d682
Some changes to Scrapyd to support multiple configuration files, to make it easier to deploy Scrapyd applications. Also documented 'egg_runner' and 'application' options
...
--HG--
rename : debian/scrapyd.cfg => debian/000-default
rename : scrapyd/default_scrapyd.cfg => scrapyd/default_scrapyd.conf
2010-09-07 09:17:25 -03:00
Daniel Grana
3414bf13ee
remove request_uploaded signal and move response_received and response_downloaded to downloader manager. closes #228
...
--HG--
extra : rebase_source : 4af0d2a01b34de8a21048bb7f4a66bfc484b3b8f
2010-09-06 23:23:14 -03:00
Pablo Hoffman
766f2d910d
Renamed Request Handlers to Download Handlers
2010-09-05 19:35:53 -03:00
Pablo Hoffman
a5cf71cb06
Updated Ubuntu package signing key location
2010-09-05 19:04:15 -03:00
Pablo Hoffman
6bf52fb50e
Make telnet console and web service try a range of ports for binding, instead of just one. Closes #226
2010-09-05 06:48:08 -03:00
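Trying a range of ports instead of a single one (6bf52fb50e) can be sketched with plain sockets. The code binds to the first free port in an inclusive `(first, last)` pair and skips ports that are already in use. Representing the range as a pair is an assumption. Scrapy itself listens through Twisted rather than raw sockets.

```python
import errno
import socket


def listen_tcp_in_range(host, portrange):
    """Bind a listening TCP socket to the first free port in an inclusive (first, last) range."""
    first, last = portrange
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as exc:
            sock.close()
            if exc.errno == errno.EADDRINUSE:
                continue  # port taken, try the next one
            raise
        sock.listen(5)
        return sock
    raise OSError(errno.EADDRINUSE, "no free port in range %d-%d" % (first, last))
```

Two consecutive calls with the same range return sockets on different ports. This is the property that lets two processes run a telnet console or web service side by side.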
Pablo Hoffman
14e985b076
Updated Command line tool documentation
2010-09-05 05:29:58 -03:00
Pablo Hoffman
1190f97944
Updated settings documentation
2010-09-05 04:58:14 -03:00
Pablo Hoffman
ebdb733e95
Updated some old messages in Scrapy shell doc
2010-09-05 04:45:43 -03:00
Pablo Hoffman
bf34094e5a
Added versionadded:: notice to new documentation topics
2010-09-04 03:30:45 -03:00
Daniel Grana
9f4b1e47a4
damn, really fix httpcache docs
2010-09-04 03:26:41 -03:00
Daniel Grana
7ad901640b
fix httpcache docs
2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469
Make ignored schemes configurable in HttpCacheMiddleware. closes #224
...
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
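Configurable ignored schemes for HttpCacheMiddleware (1abaa79469) reduce to a scheme check before caching. In this sketch, the helper name and the default of ignoring only `file` URLs are assumptions.

```python
from urllib.parse import urlparse


def is_cacheable(url, ignore_schemes=("file",)):
    """Return False for URLs whose scheme is listed in ignore_schemes (assumed default: file)."""
    return urlparse(url).scheme not in ignore_schemes
```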