mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 15:43:48 +00:00

544 Commits

Author SHA1 Message Date
Pablo Hoffman
4f28ffcb2c removed no longer needed dependency on simplejson 2012-04-10 16:01:36 -03:00
Pablo Hoffman
6e8edbd72e switched default selectors backend to lxml 2012-04-10 15:52:14 -03:00
Pablo Hoffman
27018fced7 changed default user agent to Scrapy/0.15 (+http://scrapy.org) and removed no longer needed BOT_VERSION setting 2012-03-23 13:45:21 -03:00
Pablo Hoffman
8933e2f2be added REFERER_ENABLED setting, to control referer middleware 2012-03-22 16:35:14 -03:00
Jason Yeo
da826aa13d fixed minor mistake in Request objects documentation 2012-03-21 10:25:41 +08:00
Pablo Hoffman
175c70ad44 fixed minor defect in link extractors documentation 2012-03-20 22:56:45 -03:00
Pablo Hoffman
35fb01156e removed some obsolete remaining code related to sqlite support in scrapy 2012-03-16 11:55:55 -03:00
Pablo Hoffman
b6ae266546 Removed (very old and possibly broken) backwards compatibility support for Twisted 2.5 2012-03-15 00:28:24 -03:00
Pablo Hoffman
e521da2e2f Dropped support for Python 2.5. See: http://blog.scrapy.org/scrapy-dropping-support-for-python-25 2012-03-01 08:18:12 -02:00
Pablo Hoffman
2b16ebdc11 added minor clarification on cookiejar request meta key usage 2012-02-29 07:19:01 -02:00
lostsnow
5afe4f50c1 scrapyd: support bind to a specific ip address 2012-02-29 13:47:40 +08:00
Pablo Hoffman
81abb45000 fixed bug in new cookiejar documentation 2012-02-28 11:08:25 -02:00
Pablo Hoffman
26c8004125 added documentation for the new cookiejar Request.meta key 2012-02-27 19:58:58 -02:00
Pablo Hoffman
7fe7c3f3b1 MemoryUsage extension: close the spiders (instead of stopping the engine) when the limit is exceeded, providing a descriptive reason for the close. Also fixed default value of MEMUSAGE_ENABLED setting to match the documentation. 2012-02-23 17:05:06 -02:00
Pablo Hoffman
7b8942a648 updated StackTraceDump extension doc 2012-02-16 15:14:17 -02:00
Pablo Hoffman
ea77342b55 updated versioning doc according to recent changes 2012-01-05 11:50:28 -02:00
Pablo Hoffman
0b0bce7f3c scrapyd: added cancel.json and listjobs.json api methods to documentation 2012-01-05 11:23:25 -02:00
Pablo Hoffman
8f42633a94 scrapyd: added clarification about how to disable items feeds generation 2012-01-05 11:20:50 -02:00
Pablo Hoffman
dbda33efa6 scrapyd: added support for storing items by default
Items are stored the same way as logs, in jsonlines format.

Also renamed logs_to_keep setting to jobs_to_keep.
2012-01-03 23:08:54 -02:00
Pablo Hoffman
0be421fbf0 fixed reference to tutorial directory 2011-12-23 18:57:11 -02:00
Pablo Hoffman
41fd3c4f6c doc: removed duplicated callback argument from Request.replace() 2011-12-23 15:55:46 -02:00
Pablo Hoffman
0eeff76227 fixed formatting of scrapyd doc 2011-12-20 03:18:37 -02:00
Daniel Graña
bcb31988f2 change tutorial to follow changes on dmoz site 2011-12-14 13:03:31 -02:00
Pablo Hoffman
992af8d38f ubuntu repos: added support for oneiric release 2011-10-25 14:26:38 -02:00
Pablo Hoffman
c38c49d56a fixed PickleItemExporter bug, added unittest, and added pickle to supported feed exports formats 2011-10-25 02:36:51 -02:00
Pablo Hoffman
8bdf288428 made scrapyd doc more version agnostic 2011-10-23 05:29:54 -02:00
Pablo Hoffman
ade5efdc61 added -o option to scrapy crawl, a convenient shortcut for using feed exports 2011-10-22 20:53:49 -02:00
Pablo Hoffman
431441cb52 updated documentation to remove references to old issue tracker and mercurial repos 2011-09-25 13:06:24 -03:00
Pablo Hoffman
ce03ccd4ec updated documentation about DEPTH_PRIORITY and DFO/BFO crawls 2011-09-23 13:22:25 -03:00
Julien Duponchelle
b7c436343a scrapy deploy support git version 2011-09-21 22:17:08 +02:00
Pablo Hoffman
ab1c9cfc56 removed documentation header notifying about other documentation versions, as that's provided by readthedocs already 2011-09-14 02:39:32 -03:00
Daniel Grana
5f1b1c05f8 Do not filter requests with dont_filter attribute set in OffsiteMiddleware 2011-09-08 15:18:10 -03:00
Pablo Hoffman
bff3d31469 scrapyd: updated schedule.json response format 2011-09-04 09:29:24 -03:00
Pablo Hoffman
a1dbc62b45 removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead) 2011-09-02 18:27:39 -03:00
Pablo Hoffman
40f7075f11 added initial documentation about suspend and resume crawls 2011-09-02 13:12:27 -03:00
Pablo Hoffman
27dd68a690 added SpiderState extension 2011-09-02 13:06:59 -03:00
Pablo Hoffman
6a31ab667d minor fix to doc 2011-09-01 15:08:23 -03:00
Pablo Hoffman
d98b058c21 no longer recommend using lambdas in the doc, as they're not friendly with scheduler persistence 2011-09-01 15:06:49 -03:00
Pablo Hoffman
76af0cdd44 updated documentation and code to use -s instead of --set option 2011-09-01 14:35:37 -03:00
Pablo Hoffman
98b68ca89d scrapyd: documented support for passing setting to spiders in schedule.json 2011-08-27 01:31:12 -03:00
Pablo Hoffman
5c6b0631e2 minor doc fix 2011-08-19 11:42:03 -03:00
Pablo Hoffman
9d97e73a24 fixed priority handling on the new scheduler so that it's backwards compatible (i.e. bigger priorities are higher). also fixed a few documentation bugs related to requests priority 2011-08-19 08:26:41 -03:00
Pablo Hoffman
a3697421c0 some minor updates to documentation 2011-08-11 09:19:59 -03:00
Pablo Hoffman
5da6ffb57b Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-08-11 09:11:19 -03:00
Pablo Hoffman
bc2d2183e9 fixed import in doc 2011-08-11 09:11:08 -03:00
Pablo Hoffman
19e6da59d8 added new downloader middleware: ChunkedTransferMiddleware 2011-08-09 03:03:25 -03:00
Pablo Hoffman
984be35461 Some telnet console changes:
* renamed manager alias to crawler
* added aliases: spider, slot
* fixed est() function
2011-08-08 15:01:08 -03:00
Pablo Hoffman
f7c0aeccc6 added note about engine_started signal 2011-08-07 03:57:09 -03:00
Pablo Hoffman
9f60c27612 added setting to support disabling DNS cache: DNSCACHE_ENABLED 2011-08-05 20:41:59 -03:00
Pablo Hoffman
cb95d7a5af added marshal to formats supported by feed exports 2011-08-03 16:16:48 -03:00