
formatting changes and references to spiders added in the tutorial

Commit: 7e6c9c2e25 (parent 2526e11be0)
Author: Ismael Carnales
Date: 2009-01-30 19:14:16 +00:00


@@ -43,16 +43,16 @@ These are basically:
* ``dmoz/spiders/``: a directory where you'll later put your spiders.
* ``dmoz/templates/``: directory containing the spider's templates.

The use of these files will be clarified throughout the tutorial; now let's go
into spiders.

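Pictured as a tree, the layout described so far looks roughly like this
(assuming the project is named ``dmoz``; generated files such as
``__init__.py`` are omitted)::

    dmoz/
        spiders/      <- your spiders will live here
        templates/    <- the spider's templates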
Spiders
=======

Spiders are custom modules written by you, the user, to scrape information from
a certain domain. Their duty is to feed the Scrapy engine with URLs to
download, and then parse the downloaded contents in search of data or more
URLs to follow.

They are the heart of a Scrapy project, and where most of the action takes
place.

@@ -60,7 +60,6 @@ place.
To create our first spider, save this code in a file named ``dmoz_spider.py``
inside the ``dmoz/spiders`` folder::

    from scrapy.spider import BaseSpider

    class OpenDirectorySpider(BaseSpider):

@@ -82,15 +81,15 @@ inside ``dmoz/spiders`` folder::
When creating spiders, be sure not to give them the same name as the project,
or you won't be able to import modules from your project in your spider!

The first line imports the class :class:`scrapy.spider.BaseSpider`. For the
purpose of creating a working spider, you must subclass
:class:`scrapy.spider.BaseSpider`, and then define the three main, mandatory,
attributes:

* ``domain_name``: identifies the spider. It must be unique, that is, you can't
  set the same domain name for different spiders.

* ``start_urls``: a list of URLs where the spider will begin to crawl. So, the
  first pages downloaded will be those listed here. The subsequent URLs will be
  generated successively from data contained in the start URLs.
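As a sketch of how these attributes fit together, here is a minimal spider in
the shape the tutorial describes. ``BaseSpider`` is stubbed inline so the
snippet runs without Scrapy installed, and the start URL is an illustrative
placeholder rather than one taken from the tutorial:

```python
# Minimal stand-in for scrapy.spider.BaseSpider so this sketch is
# self-contained; in a real project you would instead write:
#     from scrapy.spider import BaseSpider
class BaseSpider:
    domain_name = None  # must be unique per spider
    start_urls = []     # crawling begins from these URLs


class OpenDirectorySpider(BaseSpider):
    # Unique identifier for the spider; no two spiders may share it
    domain_name = "dmoz.org"

    # The first pages downloaded; subsequent URLs are generated from
    # the data found in these pages (hypothetical example URL)
    start_urls = [
        "http://www.dmoz.org/Computers/",
    ]


# Early Scrapy versions expected a module-level spider instance like this;
# check the version of the docs you are following.
SPIDER = OpenDirectorySpider()
```

Note that the subclass only overrides class attributes; the engine reads
``domain_name`` and ``start_urls`` directly from the spider object.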