
some fixes and updates to Scrapy documentation

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40669
Pablo Hoffman 2009-01-07 03:59:39 +00:00
parent 0319373c61
commit 2c2dc61766
7 changed files with 19 additions and 18 deletions

View File

@@ -26,6 +26,6 @@ framework).
 Does Scrapy work with Python 3.0?
 ---------------------------------
-No, and there is no plan to port Scrapy to Python 3.0 yet. At the moment Scrapy
-requires Python 2.5 or 2.6.
+No, and there are no plans to port Scrapy to Python 3.0 yet. At the moment
+Scrapy requires Python 2.5 or 2.6.

View File

@@ -1,11 +1,9 @@
-.. Scrapy documentation master file, created by sphinx-quickstart on Mon Nov 24 12:02:52 2008.
-   You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
+.. _index:
 Scrapy |version| documentation
 ================================
-Welcome! This is the documentation for Scrapy |version|.
+Welcome! This is the documentation for Scrapy |version|, last updated on |today|.
 For more information visit the `Scrapy homepage <http://scrapy.org>`_.

View File

@@ -1,11 +1,11 @@
-.. _tutorial1:
-.. highlight:: sh
+.. _intro-tutorial1:
 ======================
 Creating a new project
 ======================
+.. highlight:: sh
 In this tutorial, we'll teach you how to scrape http://www.google.com/dirhp Google's web directory.
 We'll assume that Scrapy is already installed in your system, if not see :ref:`intro-install`.
@@ -29,4 +29,4 @@ As long as Scrapy is well installed and the path is set, this should create a di
 $ export PYTHONPATH=/path/to/your/project
-Now you can continue with the next part of the tutorial: :ref:`tutorial2`.
+Now you can continue with the next part of the tutorial: :ref:`intro-tutorial2`.

View File

@@ -1,4 +1,4 @@
-.. _tutorial2:
+.. _intro-tutorial2:
 ================
 Our first spider
@@ -121,5 +121,5 @@ You can try crawling with this little code, by running::
 ./scrapy-ctl crawl google.com
 and it will actually work, altough it won't do any parsing, since parse_category is not defined, and that's exactly what we're going to do in the next part of
-the tutorial: :ref:`tutorial3`.
+the tutorial: :ref:`intro-tutorial3`.
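
Since parse_category is only promised at this point, a sketch may help; the following is a hypothetical illustration, not the tutorial's actual code, using only names that appear elsewhere in this changeset (ScrapedItem and its *attribute* method; ``response.url`` is assumed)::

    from scrapy.item import ScrapedItem

    # A spider method like the callback this tutorial promises to define.
    def parse_category(self, response):
        item = ScrapedItem()
        # attribute(name, value) sets an item field, per the tutorial text;
        # 'url' is a placeholder field name.
        item.attribute('url', response.url)
        # Callbacks must hand results back inside a list (see the next part).
        return [item]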

View File

@@ -1,4 +1,4 @@
-.. _tutorial3:
+.. _intro-tutorial3:
 =================
 Scraping our data
@@ -71,4 +71,4 @@ are handled different than others (in fact, it *will* happen once you scrape mor
 The rest of the code is quite self-explanatory. The *attribute* method sets the item's attributes, and the items themselves are put into a list that we'll return to Scrapy's engine.
 One simple (although important) thing to remember here is that you must always return a list that contains either items, requests, or both, but always inside a list.
-So, we're almost done! Let's now check the last part of the tutorial: :ref:`tutorial4`
+So, we're almost done! Let's now check the last part of the tutorial: :ref:`intro-tutorial4`
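
A short sketch of that always-return-a-list rule; it assumes a Request class in scrapy.http (the module referenced in the pipeline docs below) taking a URL and a ``callback`` argument, and the field name and URL are placeholders::

    from scrapy.http import Request
    from scrapy.item import ScrapedItem

    def parse_category(self, response):
        item = ScrapedItem()
        item.attribute('name', 'example')  # attribute(name, value), as above
        # Follow another page as well: items and requests may be mixed,
        # but they must always come back inside a single list.
        follow = Request('http://www.google.com/dirhp',
                         callback=self.parse_category)
        return [item, follow]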

View File

@@ -1,4 +1,4 @@
-.. _tutorial4:
+.. _intro-tutorial4:
 =================
 Finishing the job

View File

@@ -21,10 +21,13 @@ Writing your own item pipeline
 Writing your own item pipeline is easy. Each item pipeline component is a
 single Python class that must define the following method:
-.. method:: process_item(request, spider)
+.. method:: process_item(domain, response, item)
-``request`` is a Request object
-``spider`` is a BaseSpider object
+``domain`` is a string with the domain of the spider which scraped the item
+``response`` is a :class:`scrapy.http.Response` with the response where the item was scraped
+``item`` is a :class:`scrapy.item.ScrapedItem` with the item scraped
 This method is called for every item pipeline component and must either return
 a ScrapedItem (or any descendant class) object or raise a :exception:`DropItem`
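
A minimal sketch of a component written against this new signature; the class name, the price check, and the scrapy.core.exceptions import path for DropItem are assumptions rather than part of this diff::

    from scrapy.core.exceptions import DropItem  # assumed location of DropItem

    class PriceFilterPipeline(object):
        """Hypothetical component: drop items scraped without a price."""

        def process_item(self, domain, response, item):
            # domain: the spider's domain string; response: the
            # scrapy.http.Response the item came from; item: a ScrapedItem.
            if getattr(item, 'price', None) is None:
                raise DropItem('Missing price in item from %s' % domain)
            # Returning the item passes it on to the next component.
            return item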