Daniel Graña
ccde3317d7
Merge pull request #816 from Curita/api-cleanup
GSoC API cleanup
2014-09-01 21:55:36 -03:00
Daniel Graña
a9292cfab7
jsonrpc webservice moved to https://github.com/scrapy/scrapy-jsonrpc repository
2014-08-15 23:28:13 -03:00
Julia Medina
d7038b2a13
SpiderManager interface cleanup
2014-08-12 14:02:55 -03:00
Julia Medina
8a3a6236d9
Settings topic updated
2014-06-10 11:26:50 -03:00
Alexander Chekunkov
fa5a67729b
DOWNLOADER setting
2014-06-02 13:11:02 +03:00
Alexander Chekunkov
1fba64d34e
DOWNLOADER setting
2014-06-02 13:05:22 +03:00
Alexander Chekunkov
4aa6179af7
added short RFPDupeFilter.request_fingerprint interface description
2014-04-26 15:46:53 +03:00
Alexander Chekunkov
baaa077725
added note about RFPDupeFilter.request_fingerprint overriding to the settings documentation
2014-04-26 15:43:12 +03:00
Julia Medina
b9e2aad874
Doc for disabling download handler
2014-03-12 23:21:33 -03:00
Pablo Hoffman
6d8b7b29ef
remove unused setting: DOWNLOADER_DEBUG
2014-02-27 12:16:05 -02:00
Paul Tremberth
41765ca18d
DupeFilter: add setting for verbose logging + stats counter for filtered requests
2014-02-17 13:42:42 +01:00
Rolando Espinoza
28f946b05f
DOC Use pipelines module name instead of pipeline following default project files.
2014-02-15 11:01:26 -04:00
Mikhail Korobov
9a999daa2a
DOWNLOAD_DELAY docs clarification:
* delay is enforced per website, not per spider;
* document download_delay attribute (it was previously documented only in FAQ about 999 error codes);
* document how CONCURRENT_REQUESTS_PER_IP affects download delays.
2013-12-28 06:30:34 +06:00
Pablo Hoffman
e8ee449a2a
Merge pull request #432 from darkrho/crawl-url
Removed URL reference in crawl command and .tld suffix in docs for spider names
2013-10-21 09:40:58 -07:00
Rolando Espinoza La fuente
34543c2b2e
DOCS removed .tld suffix for spider names for the sake of consistency.
2013-10-19 23:03:20 -04:00
Pablo Hoffman
12280c2a95
fix sphinx references in doc
2013-09-25 15:13:17 -03:00
Pablo Hoffman
fc388f4636
Make ITEM_PIPELINE setting a dict
This is for consistency with how spider and downloader middlewares are
defined. ITEM_PIPELINE_BASE was also added and both remain empty.
Backwards compatibility is kept (with a warning) with list-based
ITEM_PIPELINES.
2013-09-23 17:50:43 -03:00
Pablo Hoffman
22edc44c6c
doc: remove links to diveintopython.org, which is no longer available. closes #246
2013-02-14 11:09:40 -02:00
Chris Tilden
aae6aed4fb
fixes spelling errors in documentation
2013-01-22 14:52:18 -08:00
Daniel Graña
076ba40404
update DOWNLOADER_MIDDLEWARES_BASE setting documentation
2013-01-08 10:50:27 -02:00
Pablo Hoffman
7a7c5d1334
removed reference to global scrapy stats from settings doc
2012-11-03 17:05:01 -02:00
Pablo Hoffman
1f89eb59fe
fixed doc reference to topics-contracts
2012-10-09 16:02:12 -02:00
Pablo Hoffman
c380910b40
Merge pull request #167 from alexcepoi/sep-017
Spider contracts (SEP-017)
2012-09-28 13:57:07 -07:00
Pablo Hoffman
b46b5a6ef0
Documented AutoThrottle extension and added to extensions available by
default. Also deprecated concurrency and delay settings, in favour of
using the standard Scrapy ones.
2012-09-20 18:52:57 -03:00
Pablo Hoffman
c7f8219901
- removed scrapy.conf singleton from scrapy.log, scrapy.responsetypes,
scrapy.http.response.text, scrapy.selector
- fixed bug with scrapy.conf.settings backwards compatibility support
- added facility to notify (and provide some guidelines) about deprecated/obsolete settings
2012-09-19 03:03:34 -03:00
Alex Cepoi
bf8dc61fb7
SEP-017 contracts: pretty-printing and docs
2012-09-10 23:17:27 +02:00
Pablo Hoffman
babfc6e79b
Updated documentation after singleton removal changes.
Also removed some unused code and made some minor additional
refactoring.
2012-08-28 18:35:57 -03:00
Pablo Hoffman
27018fced7
changed default user agent to Scrapy/0.15 (+ http://scrapy.org ) and removed no longer needed BOT_VERSION setting
2012-03-23 13:45:21 -03:00
Pablo Hoffman
35fb01156e
removed some obsolete remaining code related to sqlite support in scrapy
2012-03-16 11:55:55 -03:00
Pablo Hoffman
ce03ccd4ec
updated documentation about DEPTH_PRIORITY and DFO/BFO crawls
2011-09-23 13:22:25 -03:00
Pablo Hoffman
a1dbc62b45
removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead)
2011-09-02 18:27:39 -03:00
Pablo Hoffman
27dd68a690
added SpiderState extension
2011-09-02 13:06:59 -03:00
Pablo Hoffman
76af0cdd44
updated documentation and code to use -s instead of --set option
2011-09-01 14:35:37 -03:00
Pablo Hoffman
9d97e73a24
fixed priority handling on the new scheduler so that it's backwards compatible (ie. bigger priorities are higher). also fixed a few documentation bugs related to requests priority
2011-08-19 08:26:41 -03:00
Pablo Hoffman
a3697421c0
some minor updates to documentation
2011-08-11 09:19:59 -03:00
Pablo Hoffman
19e6da59d8
added new downloader middleware: ChunkedTransferMiddleware
2011-08-09 03:03:25 -03:00
Pablo Hoffman
9f60c27612
added setting to support disabling DNS cache: DNSCACHE_ENABLED
2011-08-05 20:41:59 -03:00
Pablo Hoffman
549725215e
Initial support for a persistent scheduler, to support pausing and resuming
crawls.
* requests are serialized (using marshal by default) and stored on disk, using
one queue per priority
* request priorities must be integers now
* breadth-first and depth-first crawling orders can now be configured
through a new DEPTH_PRIORITY setting (see doc). Backwards compatibility with
SCHEDULER_ORDER was kept.
* requests that can't be serialized (for example, non serializable callbacks)
are always kept in memory queues
* adapted crawl spider to work with persistent scheduler
2011-08-02 11:57:55 -03:00
Pablo Hoffman
ce7a787970
Big downloader refactoring to support real concurrency limits per domain/ip,
instead of global limits per spider which were a bit useless.
This removes the setting CONCURRENT_REQUESTS_PER_SPIDER and adds three new
settings:
* CONCURRENT_REQUESTS
* CONCURRENT_REQUESTS_PER_DOMAIN
* CONCURRENT_REQUESTS_PER_IP (overrides per domain)
The AutoThrottle extension had to be disabled, but will be ported and
re-enabled soon.
2011-07-27 13:38:09 -03:00
Pablo Hoffman
91dc46539f
added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)
2011-06-14 00:50:05 -03:00
Pablo Hoffman
9d9c8877da
added 'scrapy edit' command
2011-06-05 22:02:56 -03:00
Pablo Hoffman
2fa0f75f2d
added COOKIES_ENABLED setting to support disabling the cookies middleware
2011-05-27 00:35:34 -03:00
Pablo Hoffman
503f302010
removed remaining references to scheduler middleware from doc, as it will be removed on next release
2011-05-18 19:48:48 -03:00
Pablo Hoffman
3fd17432cf
fixed outdated documentation
2011-05-18 14:46:20 -03:00
Pablo Hoffman
495152bd50
disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it
2011-05-18 11:04:48 -03:00
Pablo Hoffman
accb6ed830
dump stats to log by default (ie. change default value of STATS_DUMP to True)
2011-05-17 22:42:05 -03:00
Pablo Hoffman
b76c5c597f
* Added support for project data storage (closes #276)
* Documented project file structure
* Moved default location of SQLite database to project data storage dir (closes #277)
2010-10-31 03:25:37 -02:00
Pablo Hoffman
9599bde3e9
Removed RequestLimitMiddleware
2010-09-22 16:09:13 -03:00
Pablo Hoffman
ed4aec187f
Ported code to use new unified access to spider settings, keeping backwards compatibility for old spider attributes. Refs #245
2010-09-22 16:09:13 -03:00
Pablo Hoffman
b6c2b55e5b
Split settings classes from settings singleton. Closes #244
--HG--
rename : scrapy/conf/__init__.py => scrapy/conf.py
rename : scrapy/conf/default_settings.py => scrapy/settings/default_settings.py
rename : scrapy/tests/test_conf.py => scrapy/tests/test_settings.py
2010-09-22 15:47:33 -03:00