From 844b59b5b39aabb0d468ff349eead9d94bd69827 Mon Sep 17 00:00:00 2001
From: Ismael Carnales
Date: Mon, 9 Feb 2009 15:06:22 +0000
Subject: [PATCH] reduced introduction text in proposed doc

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40836
---
 scrapy/trunk/docs/proposed/introduction.rst | 69 +++++----------------
 1 file changed, 16 insertions(+), 53 deletions(-)

diff --git a/scrapy/trunk/docs/proposed/introduction.rst b/scrapy/trunk/docs/proposed/introduction.rst
index 602f8d3ad..7646c3646 100644
--- a/scrapy/trunk/docs/proposed/introduction.rst
+++ b/scrapy/trunk/docs/proposed/introduction.rst
@@ -12,75 +12,38 @@ Overview
    :height: 468
    :alt: Scrapy architecture
 
-.. _items:
-
-Items
------
-
-In Scrapy, Items are the placeholder to use for the scraped data. They are
-represented by a :class:`~scrapy.item.ScrapedItem` object, or any subclass
-instance, and store the information in instance attributes.
-
-.. _request-response:
-
 Requests and Responses
 ----------------------
 
-Scrapy uses :class:`~scrapy.http.Request` and :class:`~scrapy.http.Response`
-objects for crawling web sites.
+Scrapy uses *Requests* and *Responses* for crawling web sites.
 
-Generally, :class:`~scrapy.http.Request` objects are generated in the
-:ref:`Spiders ` (although they can be generated in any component of
-the framework), then they pass across the system until they reach the
-Downloader, which actually executes the request and returns a
-:class:`~scrapy.http.Response` object to the :class:`Request's callback
-function `.
-
-.. _overview-spiders:
+Generally, *Requests* are generated in the Spiders and pass across the system
+until they reach the *Downloader*, which executes the *Request* and returns a
+*Response* which goes back to the Spider that generated the *Request*.
 
 Spiders
 -------
 
-Spiders are user written classes which define how a certain site (or domain)
-will be scraped; including how to crawl the site and how to scrape :ref:`Items
-` from their pages.
+Spiders are user written classes to scrape information from a domain (or group
+of domains).
 
-All Spiders must be descendant of :class:`~scrapy.spider.BaseSpider` or any
-subclass of it, in :ref:`ref-spiders` you can see a list of available Spiders
-in Scrapy.
-.. _selectors:
+They define an initial set of URLs (or Requests) to download, how to crawl the
+domain and how to scrape *Items* from their pages.
 
-Selectors
----------
+Items
+-----
 
-Selectors are the recommended tool to extract information from documents. They
-retrieve information from the :ref:`Response ` body using
-`XPath `_, a language for finding information in a
-XML document navigating trough its elements and attributes.
+Items are the placeholder to use for the scraped data. They are represented by a
+simple Python class.
 
-Scrapy defines a class :class:`~scrapy.xpath.XPathSelector`, that comes in two
-flavours, :class:`~scrapy.xpath.HtmlXPatSelector` (for HTML) and
-:class:`~scrapy.xpath.XmlXPathSelector` (for XML). In order to use them you
-must instantiate the desired class with a :ref:`Response `
-object.
-
-You can see selectors as objects that represents nodes in the document
-structure. So, the first instantiated selectors are associated to the root
-node, or the entire document.
-
-.. _item-pipeline:
+After an Item has been scraped by a Spider, it is sent to the Item Pipeline for further proccesing.
 
 Item Pipeline
 -------------
 
-After an :ref:`Item ` has been scraped by a :ref:`Spider `, it
-is sent to the Item Pipeline which allows us to perform some actions over the
-:ref:`scrapped Items `.
-
 The Item Pipeline is a list of user written Python classes that implement a
-specific method , which is called sequentially for every element of the
-Pipeline.
+specific method, which is called sequentially for every element of the Pipeline.
 
 Each element receives the Scraped Item, do an action upon it (like validating,
-checking for duplicates, store the item), and then decide if the Item
-continues trough the Pipeline or the item is dropped.
+checking for duplicates, store the item), and then decide if the Item continues
+trough the Pipeline or the item is dropped.
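
Reviewer note: the Item Pipeline paragraph in this patch describes a sequential chain of classes, which might be easier to follow with a small illustration. The following is a minimal, framework-free sketch and not Scrapy's actual API: the patch does not name the "specific method", so the method name ``process``, the ``DropItem`` exception, the ``Item`` class and the ``run_pipeline`` helper are all hypothetical names chosen for this example.

```python
class DropItem(Exception):
    """Hypothetical signal that an item should not continue through the pipeline."""


class Item:
    """An Item as a simple Python class: scraped data lives in instance attributes."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


class ValidateTitle:
    """Pipeline element that drops items without a title."""

    def process(self, item):
        if not getattr(item, "title", ""):
            raise DropItem("missing title")
        return item


class DropDuplicates:
    """Pipeline element that drops items whose url was already seen."""

    def __init__(self):
        self.seen = set()

    def process(self, item):
        if item.url in self.seen:
            raise DropItem("duplicate url")
        self.seen.add(item.url)
        return item


class StoreItem:
    """Pipeline element that stores items (here: in an in-memory list)."""

    def __init__(self):
        self.stored = []

    def process(self, item):
        self.stored.append(item)
        return item


def run_pipeline(pipeline, item):
    """Pass one item through each element in order; return None if it is dropped."""
    for element in pipeline:
        try:
            item = element.process(item)
        except DropItem:
            return None
    return item
```

With a pipeline such as ``[ValidateTitle(), DropDuplicates(), StoreItem()]``, an item reaches the storing element only if every earlier element returned it; raising the hypothetical ``DropItem`` at any step stops the chain for that item.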