Pablo Hoffman
c7f8219901
- removed scrapy.conf singleton from scrapy.log, scrapy.responsetypes,
scrapy.http.response.text, scrapy.selector
- fixed bug with scrapy.conf.settings backwards compatibility support
- added facility to notify (and provide some guidelines) about deprecated/obsolete settings
2012-09-19 03:03:34 -03:00
Alex Cepoi
bf8dc61fb7
SEP-017 contracts: pretty-printing and docs
2012-09-10 23:17:27 +02:00
Pablo Hoffman
babfc6e79b
Updated documentation after singleton removal changes.
Also removed some unused code and made some minor additional
refactoring.
2012-08-28 18:35:57 -03:00
Pablo Hoffman
27018fced7
changed default user agent to Scrapy/0.15 (+http://scrapy.org) and removed no longer needed BOT_VERSION setting
2012-03-23 13:45:21 -03:00
Pablo Hoffman
35fb01156e
removed some obsolete remaining code related to sqlite support in scrapy
2012-03-16 11:55:55 -03:00
Pablo Hoffman
ce03ccd4ec
updated documentation about DEPTH_PRIORITY and DFO/BFO crawls
2011-09-23 13:22:25 -03:00
Pablo Hoffman
a1dbc62b45
removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead)
2011-09-02 18:27:39 -03:00
Pablo Hoffman
27dd68a690
added SpiderState extension
2011-09-02 13:06:59 -03:00
Pablo Hoffman
76af0cdd44
updated documentation and code to use -s instead of --set option
2011-09-01 14:35:37 -03:00
Pablo Hoffman
9d97e73a24
fixed priority handling on the new scheduler so that it's backwards compatible (ie. bigger priorities are higher). also fixed a few documentation bugs related to requests priority
2011-08-19 08:26:41 -03:00
Pablo Hoffman
a3697421c0
some minor updates to documentation
2011-08-11 09:19:59 -03:00
Pablo Hoffman
19e6da59d8
added new downloader middleware: ChunkedTransferMiddleware
2011-08-09 03:03:25 -03:00
Pablo Hoffman
9f60c27612
added setting to support disabling DNS cache: DNSCACHE_ENABLED
2011-08-05 20:41:59 -03:00
Pablo Hoffman
549725215e
Initial support for a persistent scheduler, to support pausing and resuming
crawls.
* requests are serialized (using marshal by default) and stored on disk, using
one queue per priority
* request priorities must be integers now
* breadth-first and depth-first crawling orders can now be configured
through a new DEPTH_PRIORITY setting (see doc). backwards compatibility with
SCHEDULER_ORDER was kept.
* requests that can't be serialized (for example, non serializable callbacks)
are always kept in memory queues
* adapted crawl spider to work with persistent scheduler
2011-08-02 11:57:55 -03:00
Pablo Hoffman
ce7a787970
Big downloader refactoring to support real concurrency limits per domain/ip,
instead of global limits per spider which were a bit useless.
This removes the setting CONCURRENT_REQUESTS_PER_SPIDER and adds three new
settings:
* CONCURRENT_REQUESTS
* CONCURRENT_REQUESTS_PER_DOMAIN
* CONCURRENT_REQUESTS_PER_IP (overrides per domain)
The AutoThrottle extension had to be disabled, but will be ported and
re-enabled soon.
2011-07-27 13:38:09 -03:00
Pablo Hoffman
91dc46539f
added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)
2011-06-14 00:50:05 -03:00
Pablo Hoffman
9d9c8877da
added 'scrapy edit' command
2011-06-05 22:02:56 -03:00
Pablo Hoffman
2fa0f75f2d
added COOKIES_ENABLED setting to support disabling the cookies middleware
2011-05-27 00:35:34 -03:00
Pablo Hoffman
503f302010
removed remaining references to scheduler middleware from doc, as it will be removed on next release
2011-05-18 19:48:48 -03:00
Pablo Hoffman
3fd17432cf
fixed outdated documentation
2011-05-18 14:46:20 -03:00
Pablo Hoffman
495152bd50
disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it
2011-05-18 11:04:48 -03:00
Pablo Hoffman
accb6ed830
dump stats to log by default (ie. change default value of STATS_DUMP to True)
2011-05-17 22:42:05 -03:00
Pablo Hoffman
b76c5c597f
* Added support for project data storage (closes #276)
* Documented project file structure
* Moved default location of SQLite database to project data storage dir (closes #277)
2010-10-31 03:25:37 -02:00
Pablo Hoffman
9599bde3e9
Removed RequestLimitMiddleware
2010-09-22 16:09:13 -03:00
Pablo Hoffman
ed4aec187f
Ported code to use new unified access to spider settings, keeping backwards compatibility for old spider attributes. Refs #245
2010-09-22 16:09:13 -03:00
Pablo Hoffman
b6c2b55e5b
Split settings classes from settings singleton. Closes #244
--HG--
rename : scrapy/conf/__init__.py => scrapy/conf.py
rename : scrapy/conf/default_settings.py => scrapy/settings/default_settings.py
rename : scrapy/tests/test_conf.py => scrapy/tests/test_settings.py
2010-09-22 15:47:33 -03:00
Pablo Hoffman
766f2d910d
Renamed Request Handlers to Download Handlers
2010-09-05 19:35:53 -03:00
Pablo Hoffman
6bf52fb50e
Make telnet console and web service try a range of ports for binding, instead of just one. Closes #226
2010-09-05 06:48:08 -03:00
Pablo Hoffman
14e985b076
Updated Command line tool documentation
2010-09-05 05:29:58 -03:00
Pablo Hoffman
1190f97944
Updated settings documentation
2010-09-05 04:58:14 -03:00
Pablo Hoffman
053d45e79f
Split stats collector classes from stats collection facility (#204)
* moved scrapy.stats.collector.__init__ module to scrapy.statscol
* moved scrapy.stats.collector.simpledb module to scrapy.contrib.statscol
* moved signals from scrapy.stats.signals to scrapy.signals
* moved scrapy/stats/__init__.py to scrapy/stats.py
* updated documentation and tests accordingly
--HG--
rename : scrapy/stats/collector/simpledb.py => scrapy/contrib/statscol.py
rename : scrapy/stats/__init__.py => scrapy/stats.py
rename : scrapy/stats/collector/__init__.py => scrapy/statscol.py
2010-08-22 01:24:07 -03:00
Pablo Hoffman
9aefa242d5
Applied documentation patch provided by Lucian Ursu (closes #207)
2010-08-21 01:26:35 -03:00
Pablo Hoffman
94ead94bf6
Improved documentation of Scrapy command-line tool
--HG--
rename : docs/topics/cmdline.rst => docs/topics/commands.rst
2010-08-19 00:04:52 -03:00
Pablo Hoffman
34554da201
Deprecated scrapy-ctl.py command in favour of simpler "scrapy" command. Closes #199. Also updated documentation accordingly and added convenient scrapy.bat script for running from Windows.
--HG--
rename : debian/scrapy-ctl.1 => debian/scrapy.1
rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst
2010-08-18 19:48:32 -03:00
Pablo Hoffman
a71521bfba
Default per-command settings are now specified in the default_settings attribute of the command object. Closes #201
2010-08-17 18:30:13 -03:00
Pablo Hoffman
e741a807d2
Added new Feed exports extension with documentation and storage tests. Closes #197.
Also deprecated File export pipeline (to be removed in Scrapy 0.11).
Still need to add tests for FeedExport main extension code.
2010-08-17 14:27:48 -03:00
Pablo Hoffman
1df2c17b78
updated old documentation references
2010-08-12 20:45:11 -03:00
Pablo Hoffman
bd16d1cd48
Added SMTP-AUTH support to scrapy.mail (closes #149)
2010-06-13 17:14:46 -03:00
Pablo Hoffman
6a33d6c4d0
* Added Scrapy Web Service with documentation and tests.
* Marked Web Console as deprecated.
* Removed Web Console documentation to discourage its use.
2010-06-09 13:46:22 -03:00
Pablo Hoffman
031eb1e5ed
removed no longer used SpiderScheduler (obsoleted by ExecutionQueue)
2010-05-28 17:27:15 -03:00
Ismael Carnales
a71dc295af
Some mail improvements and tests.
* Add mail_sent signal and use it in MailSender
* Add MAIL_DEBUG setting to not send mails when testing
* Add MailSender tests
2010-05-28 16:51:47 -03:00
Daniel Grana
68a875edb0
update ENCODING_ALIASES setting default value in settings documentation topic
2010-04-07 10:54:54 -03:00
Pablo Hoffman
2299deda66
updated wrong link in doc
2010-03-26 14:02:33 -03:00
Pablo Hoffman
1330697c3d
Some improvements to Response encoding support:
* added encoding aliases, configurable through a new ENCODING_ALIASES setting
* Response.encoding now returns the real encoding detected for the body
* simplified TextResponse API by removing body_encoding() and
headers_encoding() methods
* Response.encoding now tries to infer the encoding from the body always (it
was done before only on HtmlResponse and TextResponse)
* removed scrapy.utils.encoding.add_encoding_alias() function
* updated implementation of scrapy.utils.response function to reflect these API
changes
* updated documentation to reflect API changes
2010-03-25 15:47:10 -03:00
Pablo Hoffman
9ddcd1095d
sort setting alphabetically
2010-03-25 11:45:06 -03:00
Pablo Hoffman
4fa833c849
Added LOG_ENCODING setting
2010-03-24 12:13:38 -03:00
Pablo Hoffman
d12cd22d5e
switched default scheduler order to DFO, which consumes less memory by default
2010-03-04 10:15:58 -02:00
Pablo Hoffman
c1f8198639
Added RANDOMIZE_DOWNLOAD_DELAY setting
2010-02-19 21:53:18 -02:00
Pablo Hoffman
57d60eae39
sort settings doc alphabetically by setting name
2010-01-31 18:11:13 -02:00
Pablo Hoffman
08eeaf98a2
fixed description of LOG_STDOUT setting
2010-01-13 15:51:08 -02:00