.. _intro-overview:

==================
Scrapy at a glance
==================

Scrapy is an application framework for crawling web sites and extracting
structured data which can be used for a wide range of useful applications, like
data mining, information processing or historical archival.

Even though Scrapy was originally designed for `screen scraping`_ (more
precisely, `web scraping`_), it can also be used to extract data using APIs
(such as `Amazon Associates Web Services`_) or as a general purpose web
crawler.

.. _screen scraping: http://en.wikipedia.org/wiki/Screen_scraping
.. _web scraping: http://en.wikipedia.org/wiki/Web_scraping
.. _Amazon Associates Web Services: http://aws.amazon.com/associates/

The purpose of this document is to introduce you to the concepts behind Scrapy
so you can get an idea of how it works and decide if Scrapy is what you need.

When you're ready to start a project, you can :ref:`start with the tutorial
<intro-tutorial>`.

Pick a website
==============

So you need to extract some information from a website, but the website doesn't
provide any API or mechanism to access that info from a computer program.
Scrapy can help you extract that information. Let's say we want to extract
information about all torrent files added today on the `mininova`_ torrent
site.

.. _mininova: http://www.mininova.org

The list of all torrents added today can be found on this page:
http://www.mininova.org/today

Write a Spider to extract the Items
===================================

Now we'll write a Spider which defines the start URL
(http://www.mininova.org/today), the rules for following links and the rules
for extracting the data from pages.

If we take a look at that page content we'll see that all torrent URLs are like
http://www.mininova.org/tor/NUMBER where ``NUMBER`` is an integer. We'll use
that to construct the regular expression for the links to follow: ``/tor/\d+``.
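If you want to sanity-check that pattern before wiring it into a spider, here is
a minimal sketch using Python's ``re`` module (the example URLs are just for
illustration)::

    import re

    # the same pattern we'll pass to the link extractor below
    pattern = re.compile(r"/tor/\d+")

    # a torrent detail URL matches, the listing page doesn't
    assert pattern.search("http://www.mininova.org/tor/2657665")
    assert pattern.search("http://www.mininova.org/today") is None
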
For extracting data we'll use `XPath`_ to select the part of the document where
the data is to be extracted. Let's take one of those torrent pages:
http://www.mininova.org/tor/2657665

.. _XPath: http://www.w3.org/TR/xpath

And look at the page HTML source to construct the XPath expressions to select
the data we want to extract: torrent name, description and size.

.. highlight:: html

By looking at the page HTML source we can see that the file name is contained
inside a ``<h1>`` tag::

    <h1>Home[2009][Eng]XviD-ovd</h1>

.. highlight:: none

An XPath expression to extract the name could be::

    //h1/text()

.. highlight:: html

And the description is contained inside a ``<div>`` tag with ``id="description"``::

    <h2>Description:</h2>

    <div id="description">
    "HOME" - a documentary film by Yann Arthus-Bertrand
    <br/>
    <br/>
    ***
    <br/>
    <br/>
    "We are living in exceptional times. Scientists tell us that we have 10 years to change the way we live, avert the depletion of natural resources and the catastrophic evolution of the Earth's climate.

    ...

.. highlight:: none

An XPath expression to select the description could be::

    //div[@id='description']

.. highlight:: html

Finally, the file size is contained in the second ``<p>`` tag inside the ``<div>``
tag with ``id="specifications"``::

    <div id="specifications">

    <p>
    <strong>Category:</strong>
    <a href="/cat/4">Movies</a> > <a href="/sub/35">Documentary</a>
    </p>

    <p>
    <strong>Total size:</strong>
    699.79 megabyte</p>

.. highlight:: none

An XPath expression to select the file size could be::

    //div[@id='specifications']/p[2]/text()[2]

.. highlight:: python

For more information about XPath see the `XPath reference`_.

.. _XPath reference: http://www.w3.org/TR/xpath

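You can also try these expressions interactively before putting them in a
spider. Here is a minimal sketch using Scrapy's ``HtmlXPathSelector`` against a
made-up, trimmed-down stand-in for the real page::

    from scrapy.http import HtmlResponse
    from scrapy.selector import HtmlXPathSelector

    # hypothetical, trimmed-down stand-in for the real page body
    body = "<html><body><h1>Home[2009][Eng]XviD-ovd</h1></body></html>"
    response = HtmlResponse(url="http://www.mininova.org/tor/2657665", body=body)

    hxs = HtmlXPathSelector(response)
    hxs.select("//h1/text()").extract()  # -> [u'Home[2009][Eng]XviD-ovd']
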
Finally, here's the spider code::

    class MininovaSpider(CrawlSpider):

        name = 'mininova.org'
        allowed_domains = ['mininova.org']
        start_urls = ['http://www.mininova.org/today']
        rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')]

        def parse_torrent(self, response):
            x = HtmlXPathSelector(response)

            torrent = TorrentItem()
            torrent['url'] = response.url
            torrent['name'] = x.select("//h1/text()").extract()
            torrent['description'] = x.select("//div[@id='description']").extract()
            torrent['size'] = x.select("//div[@id='specifications']/p[2]/text()[2]").extract()
            return torrent

For brevity's sake, we intentionally left out the import statements and the
``TorrentItem`` class definition (a sketch of both is shown below).

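As a rough sketch (the exact import paths depend on your Scrapy version; these
are the ``scrapy.contrib`` paths used by the Scrapy releases this example
targets), the missing pieces would look something like this::

    from scrapy.item import Item, Field
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import HtmlXPathSelector

    class TorrentItem(Item):
        # one Field per attribute the spider fills in
        url = Field()
        name = Field()
        description = Field()
        size = Field()
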
Write a pipeline to store the items extracted
=============================================

Now let's write an :ref:`topics-item-pipeline` that serializes and stores the
extracted item into a file using `pickle`_::

    import pickle

    class StoreItemPipeline(object):

        def process_item(self, item, spider):
            torrent_id = item['url'].split('/')[-1]
            f = open("torrent-%s.pickle" % torrent_id, "wb")
            pickle.dump(item, f)
            f.close()
            return item

.. _pickle: http://docs.python.org/library/pickle.html

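To have Scrapy actually run the pipeline, it also needs to be enabled in your
project settings. A minimal sketch, assuming a hypothetical project package
named ``myproject``::

    # settings.py of your Scrapy project ('myproject' is a made-up name)
    ITEM_PIPELINES = ['myproject.pipelines.StoreItemPipeline']
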
What else?
==========

You've seen how to extract and store items from a website using Scrapy, but
this is just the surface. Scrapy provides a lot of powerful features for making
scraping easy and efficient, such as:

* Built-in support for :ref:`selecting and extracting <topics-selectors>` data
  from HTML and XML sources

* Built-in support for :ref:`generating feed exports <topics-feed-exports>` in
  multiple formats (JSON, CSV, XML) and storing them in multiple backends (FTP,
  S3, filesystem)

* A media pipeline for :ref:`automatically downloading images <topics-images>`
  (or any other media) associated with the scraped items

* Support for :ref:`extending Scrapy <extending-scrapy>` by plugging in your
  own functionality using middlewares, extensions, and pipelines

* A wide range of built-in middlewares and extensions for handling compression,
  caching, cookies, authentication, user-agent spoofing, robots.txt,
  statistics, crawl depth restriction, etc.

* An :ref:`interactive scraping shell console <topics-shell>`, very useful for
  writing and debugging your spiders

* A built-in :ref:`Web service <topics-webservice>` for monitoring and
  controlling your bot

* A :ref:`Telnet console <topics-telnetconsole>` for full unrestricted access
  to a Python console inside your Scrapy process, to introspect and debug your
  crawler

* Built-in facilities for :ref:`logging <topics-logging>`, :ref:`collecting
  stats <topics-stats>`, and :ref:`sending email notifications <topics-email>`

What's next?
============
The next obvious steps are for you to `download Scrapy`_, read :ref:`the
tutorial <intro-tutorial>` and join `the community`_. Thanks for your
interest!

.. _download Scrapy: http://scrapy.org/download/
.. _the community: http://scrapy.org/community/