Mirror of https://github.com/scrapy/scrapy.git
some fixes and updates to Scrapy documentation
--HG-- extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40669
parent 0319373c61
commit 2c2dc61766
@@ -26,6 +26,6 @@ framework).
 Does Scrapy work with Python 3.0?
 ---------------------------------

-No, and there is no plan to port Scrapy to Python 3.0 yet. At the moment Scrapy
-requires Python 2.5 or 2.6.
+No, and there are no plans to port Scrapy to Python 3.0 yet. At the moment
+Scrapy requires Python 2.5 or 2.6.

@@ -1,11 +1,9 @@
-.. Scrapy documentation master file, created by sphinx-quickstart on Mon Nov 24 12:02:52 2008.
-   You can adapt this file completely to your liking, but it should at least
-   contain the root `toctree` directive.
+.. _index:

 Scrapy |version| documentation
 ================================

-Welcome! This is the documentation for Scrapy |version|.
+Welcome! This is the documentation for Scrapy |version|, last updated on |today|.

 For more information visit the `Scrapy homepage <http://scrapy.org>`_.

@@ -1,11 +1,11 @@
-.. _tutorial1:
+.. _intro-tutorial1:

-.. highlight:: sh
-
 ======================
 Creating a new project
 ======================

+.. highlight:: sh
+
 In this tutorial, we'll teach you how to scrape http://www.google.com/dirhp Google's web directory.

 We'll assume that Scrapy is already installed in your system, if not see :ref:`intro-install`.
@@ -29,4 +29,4 @@ As long as Scrapy is well installed and the path is set, this should create a di

    $ export PYTHONPATH=/path/to/your/project

-Now you can continue with the next part of the tutorial: :ref:`tutorial2`.
+Now you can continue with the next part of the tutorial: :ref:`intro-tutorial2`.
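Setting ``PYTHONPATH`` simply puts the project directory on Python's import path so the project modules can be imported. The same effect can be achieved from Python itself, as a rough illustration (not part of the tutorial, and the path is the same placeholder used above)::

    import sys

    # Same effect as `export PYTHONPATH=/path/to/your/project`:
    # put the project directory on the import path.
    sys.path.insert(0, '/path/to/your/project')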
@@ -1,4 +1,4 @@
-.. _tutorial2:
+.. _intro-tutorial2:

 ================
 Our first spider
@@ -121,5 +121,5 @@ You can try crawling with this little code, by running::

    ./scrapy-ctl crawl google.com

 and it will actually work, although it won't do any parsing, since parse_category is not defined, and that's exactly what we're going to do in the next part of
-the tutorial: :ref:`tutorial3`.
+the tutorial: :ref:`intro-tutorial3`.

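For reference, a rough sketch of what the missing ``parse_category`` callback could look like. Only what the tutorial pages describe is used here (``ScrapedItem``, its ``attribute`` method, and the rule that callbacks return a list); the rest is an assumption about this Scrapy version's API, not the actual tutorial code::

    from scrapy.item import ScrapedItem

    def parse_category(self, response):
        # Hedged sketch: a real callback would extract data from the response
        # with selectors; this only shows the callback contract.
        item = ScrapedItem()
        item.attribute('url', response.url)  # attribute() sets an item attribute
        return [item]  # callbacks must always return a list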
@@ -1,4 +1,4 @@
-.. _tutorial3:
+.. _intro-tutorial3:

 =================
 Scraping our data
@@ -71,4 +71,4 @@ are handled different than others (in fact, it *will* happen once you scrape mor
 The rest of the code is quite self-explanatory. The *attribute* method sets the item's attributes, and the items themselves are put into a list that we'll return to Scrapy's engine.
 One simple (although important) thing to remember here is that you must always return a list that contains either items, requests, or both, but always inside a list.

-So, we're almost done! Let's now check the last part of the tutorial: :ref:`tutorial4`
+So, we're almost done! Let's now check the last part of the tutorial: :ref:`intro-tutorial4`
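The "items, requests, or both, but always inside a list" rule can be illustrated with a small assumed example. ``Request`` and its ``callback`` argument do not appear in this diff and are only an assumption about the API of this Scrapy version; the URL is the one the tutorial scrapes::

    from scrapy.http import Request
    from scrapy.item import ScrapedItem

    def parse_category(self, response):
        item = ScrapedItem()
        item.attribute('url', response.url)   # set an attribute on the item
        follow = Request(url='http://www.google.com/dirhp',
                         callback=self.parse_category)
        # Items and requests can be mixed, but always inside a single list.
        return [item, follow]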
@@ -1,4 +1,4 @@
-.. _tutorial4:
+.. _intro-tutorial4:

 =================
 Finishing the job
@@ -21,10 +21,13 @@ Writing your own item pipeline
 Writing your own item pipeline is easy. Each item pipeline component is a
 single Python class that must define the following method:

-.. method:: process_item(request, spider)
+.. method:: process_item(domain, response, item)

-``request`` is a Request object
-``spider`` is a BaseSpider object
+``domain`` is a string with the domain of the spider which scraped the item
+
+``response`` is a :class:`scrapy.http.Response` with the response where the item was scraped
+
+``item`` is a :class:`scrapy.item.ScrapedItem` with the item scraped

 This method is called for every item pipeline component and must either return
 a ScrapedItem (or any descendant class) object or raise a :exception:`DropItem`
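A minimal sketch of a pipeline component using the new signature. The name check is invented purely for illustration, and the import path of ``DropItem`` is an assumption; only the signature, ``ScrapedItem``, and ``DropItem`` come from the documentation above::

    from scrapy.core.exceptions import DropItem  # assumed location of DropItem

    class RequireNamePipeline(object):
        """Drop scraped items that have no ``name`` attribute set."""

        def process_item(self, domain, response, item):
            if getattr(item, 'name', None):
                return item  # pass the item on to the next pipeline component
            raise DropItem("Missing name in item scraped from %s" % domain)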