scrapy/topics at 549725215e79fbd3c7b3590c2a47fc9c0ad30a1b - scrapy - H0llyW00dzZ's Gitea Source Tree

BackTrackZ/scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 02:44:22 +00:00

Pablo Hoffman 549725215e Initial support for a persistent scheduler, to support pausing and resuming crawls.

* requests are serialized (using marshal by default) and stored on disk, using
  one queue per priority
* request priorities must be integers now
* breadth-first and depth-first crawling orders can now be configured
  through a new DEPTH_PRIORITY setting (see doc). backwards compatibility with
  SCHEDULER_ORDER was kept.
* requests that can't be serialized (for example, non-serializable callbacks)
  are always kept in memory queues
* adapted crawl spider to work with persistent scheduler

2011-08-02 11:57:55 -03:00
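
The queueing behaviour described in the commit message above can be illustrated with a minimal sketch. This is not Scrapy's implementation: the class name PriorityDiskQueues, the one-file-per-request layout, the LIFO order within a priority, and the rule that a higher integer pops first are all assumptions made for illustration. Only the use of marshal, the integer priorities, the per-priority grouping, and the in-memory fallback come from the commit message.

```python
import marshal
import os
import tempfile
from collections import defaultdict, deque


class PriorityDiskQueues:
    """Hypothetical sketch: per-priority on-disk queues with a memory fallback."""

    def __init__(self, directory):
        self.directory = directory
        self.memory = defaultdict(deque)     # priority -> unserializable requests
        self.disk_counts = defaultdict(int)  # priority -> number of files on disk

    def _path(self, priority, index):
        # Assumed layout: one small file per stored request, grouped by priority.
        return os.path.join(self.directory, "p%d-%d.q" % (priority, index))

    def push(self, request, priority):
        if not isinstance(priority, int):
            raise TypeError("request priorities must be integers")
        try:
            data = marshal.dumps(request)
        except ValueError:
            # marshal cannot serialize this object (for example, a function
            # used as a callback), so it stays in a memory queue.
            self.memory[priority].append(request)
            return
        with open(self._path(priority, self.disk_counts[priority]), "wb") as f:
            f.write(data)
        self.disk_counts[priority] += 1

    def pop(self):
        # Assumption: the highest integer priority is served first.
        candidates = [p for p, q in self.memory.items() if q]
        candidates += [p for p, n in self.disk_counts.items() if n]
        if not candidates:
            return None
        priority = max(candidates)
        if self.memory[priority]:
            return self.memory[priority].pop()
        self.disk_counts[priority] -= 1
        path = self._path(priority, self.disk_counts[priority])
        with open(path, "rb") as f:
            request = marshal.loads(f.read())
        os.remove(path)
        return request


if __name__ == "__main__":
    queues = PriorityDiskQueues(tempfile.mkdtemp())
    queues.push({"url": "http://example.com/a"}, 0)  # stored on disk
    queues.push(lambda response: response, 5)         # kept in memory
    print(queues.pop(), queues.pop())
```

Because the serializable requests live in files rather than in process memory, they would survive a restart in a fuller implementation that also persisted the per-priority counters; the sketch omits that step.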

..

removed remaining references to scheduler middleware from doc, as it will be removed on next release

2011-05-18 19:48:48 -03:00

architecture.rst

removed remaining references to scheduler middleware from doc, as it will be removed on next release

2011-05-18 19:48:48 -03:00

commands.rst

More core changes:

2011-07-15 15:18:39 -03:00

downloader-middleware.rst

redirect mw: added REDIRECT_ENABLED setting and documented the other settings

2011-07-13 14:18:15 -03:00

email.rst

removed (somewhat hacky) MAIL_DEBUG setting

2010-08-22 22:42:00 -03:00

exceptions.rst

added CloseSpider exception, to manually close spiders

2011-07-12 14:24:10 -03:00

exporters.rst

added join_multivalued parameter to CsvItemExporter

2011-03-24 13:15:52 -03:00

extensions.rst

added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)

2011-06-14 00:50:05 -03:00

feed-exports.rst

Added versionadded:: notice to new documentation topics

2010-09-04 03:30:45 -03:00

firebug.rst

Replaced default spider manager (TwistedPluginSpiderManger) with a simpler one that doesn't depend on Twisted Plugins infrastructure.

2010-07-30 17:30:32 -03:00

firefox.rst

#154: Language fixes to the documentation

2010-04-18 23:39:54 -03:00

images.rst

Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217

2010-08-31 16:03:08 -03:00

item-pipeline.rst

Updated Scrapy Tutorial to reference feed exports instead of a custom-written pipeline, and extended item pipeline documentation to include a JSON writer.

2010-10-10 20:31:05 -02:00

items.rst

Removed support for default values in Scrapy items, which have proven confusing in the past

2011-05-19 21:42:46 -03:00

leaks.rst

Applied documentation patch provided by Lucian Ursu (closes #207)

2010-08-21 01:26:35 -03:00

link-extractors.rst

Some Link extractor improvements:

2011-05-18 12:32:34 -03:00

loaders.rst

Applied documentation patch provided by Lucian Ursu (closes #207)

2010-08-21 01:26:35 -03:00

logging.rst

Applied documentation patch provided by Lucian Ursu (closes #207)

2010-08-21 01:26:35 -03:00

request-response.rst

Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it

2011-04-06 14:54:48 -03:00

scrapyd.rst

Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12

2011-04-14 12:36:27 -03:00

selectors.rst

* Added lxml backend for XPath selectors. Closes #147

2010-10-25 14:47:10 -02:00

settings.rst

Initial support for a persistent scheduler, to support pausing and resuming

2011-08-02 11:57:55 -03:00

shell.rst

Updated some old messages in Scrapy shell doc

2010-09-05 04:45:43 -03:00

signals.rst

Merged item passed and item scraped concepts, as they have often proved

2011-06-03 01:13:00 -03:00

spider-middleware.rst

Initial support for a persistent scheduler, to support pausing and resuming

2011-08-02 11:57:55 -03:00

spiders.rst

SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330

2011-06-23 18:18:29 -03:00

stats.rst

removed SimpledbStatsCollector from scrapy code, it was moved to https://github.com/scrapinghub/scaws

2011-07-20 10:38:16 -03:00

telnetconsole.rst

downloader: renamed SpiderInfo to Slot, for consistency with engine and scraper names

2011-07-22 02:06:10 -03:00

ubuntu.rst

Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12

2011-07-13 18:44:54 -03:00

webservice.rst

Bind the web server and telnet server to a configurable interface (WEBSERVICE_HOST). The default is to bind to all interfaces. Also add documentation for WEBSERVICE_HOST and TELNETCONSOLE_HOST.

2010-11-01 00:59:04 -02:00