Mirror of https://github.com/scrapy/scrapy.git (synced 2025-02-26 15:04:37 +00:00)

Merge pull request #229 from christilden/master

fixes spelling errors in documentation

Commit: c40f947dc1
@@ -100,7 +100,7 @@ how you :ref:`configure the downloader middlewares

 .. method:: start()

-    Start the crawler. This calss :meth:`configure` if it hasn't been called yet.
+    Start the crawler. This calls :meth:`configure` if it hasn't been called yet.

 Settings API
 ============
@@ -126,7 +126,7 @@ And you can see all available commands with::

 There are two kinds of commands, those that only work from inside a Scrapy
 project (Project-specific commands) and those that also work without an active
 Scrapy project (Global commands), though they may behave slightly different
-when running from inside a project (as they would use the project overriden
+when running from inside a project (as they would use the project overridden
 settings).

 Global commands:
@@ -87,7 +87,7 @@ Scrapy Shell

 While the :command:`parse` command is very useful for checking behaviour of a
 spider, it is of little help to check what happens inside a callback, besides
-showing the reponse received and the output. How to debug the situation when
+showing the response received and the output. How to debug the situation when
 ``parse_details`` sometimes receives no item?

 Fortunately, the :command:`shell` is your bread and butter in this case (see
@@ -16,7 +16,7 @@ Using DjangoItem
 ================

 :class:`DjangoItem` works much like ModelForms in Django, you create a subclass
-and define its ``django_model`` atribute to ve a valid Django model. With this
+and define its ``django_model`` attribute to be a valid Django model. With this
 you will get an item with a field for each Django model field.

 In addition, you can define fields that aren't present in the model and even
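As a quick illustration of the sentence corrected above, a minimal :class:`DjangoItem` subclass might look like the sketch below. The ``Person`` model and its fields are hypothetical, and the import path assumes the Scrapy contrib layout of this era::

    from scrapy.contrib.djangoitem import DjangoItem
    from scrapy.item import Field

    from myapp.models import Person  # hypothetical Django model

    class PersonItem(DjangoItem):
        # Point the item at the Django model; one item field is created
        # per model field, plus any extra fields declared below.
        django_model = Person
        sex = Field()  # extra field not present in the model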
@@ -85,7 +85,7 @@ And we can override the fields of the model with your own::

        django_model = Person
        name = Field(default='No Name')

-This is usefull to provide properties to the field, like a default or any other
+This is useful to provide properties to the field, like a default or any other
 property that your project uses.

 DjangoItem caveats
@@ -108,7 +108,7 @@ single Python class that defines one or more of the following methods:

 :param request: the request that originated the response
 :type request: is a :class:`~scrapy.http.Request` object

-:param reponse: the response being processed
+:param response: the response being processed
 :type response: :class:`~scrapy.http.Response` object

 :param spider: the spider for which this response is intended
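The parameter fix above belongs to the downloader middleware ``process_response`` hook. A bare-bones sketch, with a hypothetical class name, could look like this::

    class LoggingDownloaderMiddleware(object):
        """Hypothetical middleware that only inspects responses."""

        def process_response(self, request, response, spider):
            # request: the request that originated the response
            # response: the response being processed
            # spider: the spider for which this response is intended
            spider.log("Got %s for %s" % (response.status, request.url))
            return response  # must return a Response or Request (or raise IgnoreRequest)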
@@ -563,7 +563,7 @@ HttpProxyMiddleware
 ``proxy`` meta value to :class:`~scrapy.http.Request` objects.

 Like the Python standard library modules `urllib`_ and `urllib2`_, it obeys
-the following enviroment variables:
+the following environment variables:

 * ``http_proxy``
 * ``https_proxy``
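For context, the proxy can come either from those environment variables or from the ``proxy`` meta key mentioned in the hunk; a short sketch (proxy address is made up)::

    import os
    from scrapy.http import Request

    # Option 1: let HttpProxyMiddleware pick the proxy up from the environment
    os.environ["http_proxy"] = "http://localhost:3128"

    # Option 2: set it explicitly per request through the proxy meta key
    request = Request("http://www.example.com",
                      meta={"proxy": "http://localhost:3128"})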
@@ -56,7 +56,7 @@ uses `Twisted non-blocking IO`_, like the rest of the framework.
 performed.
 :type smtphost: str

-:param smtppass: the SMTP pass for authetnication.
+:param smtppass: the SMTP pass for authentication.
 :type smtppass: str

 :param smtpport: the SMTP port to connect to
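The ``smtppass`` parameter above belongs to ``MailSender``; a hedged usage sketch with placeholder host and credentials::

    from scrapy.mail import MailSender

    mailer = MailSender(smtphost="mail.example.com", mailfrom="scrapy@example.com",
                        smtpuser="someuser", smtppass="somepass", smtpport=25)
    mailer.send(to=["someone@example.com"], subject="Scrapy report",
                body="The crawl has finished.")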
@@ -244,7 +244,7 @@ XmlItemExporter
     </item>
 </items>

-Unless overriden in the :meth:`serialize_field` method, multi-valued fields are
+Unless overridden in the :meth:`serialize_field` method, multi-valued fields are
 exported by serializing each value inside a ``<value>`` element. This is for
 convenience, as multi-valued fields are very common.
@@ -113,7 +113,7 @@ being created. These parameters are:
 * ``%(time)s`` - gets replaced by a timestamp when the feed is being created
 * ``%(name)s`` - gets replaced by the spider name

-Any other named parmeter gets replaced by the spider attribute of the same
+Any other named parameter gets replaced by the spider attribute of the same
 name. For example, ``%(site_id)s`` would get replaced by the ``spider.site_id``
 attribute the moment the feed is being created.
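As an illustration of those parameters, a feed URI in the project settings could use them like this (host, path and feed format are placeholders)::

    # settings.py -- one output file per spider run, named after the spider
    # and the timestamp at which the feed is created
    FEED_URI = "ftp://user:password@ftp.example.com/feeds/%(name)s/%(time)s.json"
    FEED_FORMAT = "jsonlines"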
@@ -273,7 +273,7 @@ Here are the methods that you should override in your custom Images Pipeline:
 .. method:: item_completed(results, items, info)

 The :meth:`ImagesPipeline.item_completed` method called when all image
-requests for a single item have completed (either finshed downloading, or
+requests for a single item have completed (either finished downloading, or
 failed for some reason).

 The :meth:`~item_completed` method must return the
@@ -62,7 +62,7 @@ Item pipeline example
 Price validation and dropping items with no prices
 --------------------------------------------------

-Let's take a look at the following hypothetic pipeline that adjusts the ``price``
+Let's take a look at the following hypothetical pipeline that adjusts the ``price``
 attribute for those items that do not include VAT (``price_excludes_vat``
 attribute), and drops those items which don't contain a price::
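The pipeline that sentence introduces is not part of this hunk; a sketch along those lines (the VAT factor and field names are assumptions for illustration) would be::

    from scrapy.exceptions import DropItem

    class PricePipeline(object):
        vat_factor = 1.15  # assumed VAT multiplier

        def process_item(self, item, spider):
            if item.get("price"):
                if item.get("price_excludes_vat"):
                    item["price"] = item["price"] * self.vat_factor
                return item
            raise DropItem("Missing price in %s" % item)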
@@ -61,7 +61,7 @@ certain field keys to configure that behaviour. You must refer to their
 documentation to see which metadata keys are used by each component.

 It's important to note that the :class:`Field` objects used to declare the item
-do not stay assigned as class attributes. Instead, they can be accesed through
+do not stay assigned as class attributes. Instead, they can be accessed through
 the :attr:`Item.fields` attribute.

 And that's all you need to know about declaring items.
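To make the corrected sentence concrete: the field metadata ends up in ``Item.fields`` rather than as class attributes. A small sketch with a made-up item (the printed reprs are indicative only)::

    from scrapy.item import Item, Field

    class Product(Item):
        name = Field()
        price = Field(serializer=str)

    # The Field objects are collected into Item.fields, not kept as attributes
    print Product.fields            # {'name': {}, 'price': {'serializer': <type 'str'>}}
    print Product.fields["price"]   # metadata dict for the price field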
@@ -137,7 +137,7 @@ Setting field values
     ...
 KeyError: 'Product does not support field: lala'

-Accesing all populated values
+Accessing all populated values
 -----------------------------

 To access all populated values, just use the typical `dict API`_::
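For example, with a minimal ``Product`` item assumed to have ``name`` and ``price`` fields::

    from scrapy.item import Item, Field

    class Product(Item):
        name = Field()
        price = Field()

    product = Product(name="Desk", price=1000)

    print product.keys()    # ['name', 'price'] -- only the populated fields
    print product.items()   # [('name', 'Desk'), ('price', 1000)]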
@@ -49,7 +49,7 @@ loading that attribute from the job directory, when the spider starts and
 stops.

 Here's an example of a callback that uses the spider state (other spider code
-is ommited for brevity)::
+is omitted for brevity)::

     def parse_item(self, response):
         # parse item here
@@ -85,7 +85,7 @@ SgmlLinkExtractor
 Defaults to ``('a', 'area')``.
 :type tags: str or list

-:param attrs: list of attrbitues which should be considered when looking
+:param attrs: list of attributes which should be considered when looking
 for links to extract (only for those tags specified in the ``tags``
 parameter). Defaults to ``('href',)``
 :type attrs: boolean
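A sketch of how those ``tags`` and ``attrs`` parameters are typically passed (the ``allow`` pattern is made up, ``response`` is assumed to be a ``Response`` object, and the import path assumes the contrib layout of this era)::

    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    # Only follow <a> and <area> tags, and only look at their href attribute
    link_extractor = SgmlLinkExtractor(tags=('a', 'area'), attrs=('href',),
                                       allow=(r'/category/\d+',))
    links = link_extractor.extract_links(response)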
@@ -61,7 +61,7 @@ In other words, data is being collected by extracting it from two XPath
 locations, using the :meth:`~XPathItemLoader.add_xpath` method. This is the
 data that will be assigned to the ``name`` field later.

-Afterwards, similar calls are used for ``price`` and ``stock`` fields, and
+Afterwords, similar calls are used for ``price`` and ``stock`` fields, and
 finally the ``last_update`` field is populated directly with a literal value
 (``today``) using a different method: :meth:`~ItemLoader.add_value`.
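The calls being described would look roughly like the sketch below, inside a spider callback. The XPath expressions are invented, and ``Product`` is the item class assumed to be defined elsewhere in these docs::

    from scrapy.contrib.loader import XPathItemLoader

    def parse(self, response):
        # Product is the item class defined elsewhere in the docs
        l = XPathItemLoader(item=Product(), response=response)
        l.add_xpath('name', '//div[@class="product_name"]/text()')
        l.add_xpath('price', '//p[@id="price"]/text()')
        l.add_xpath('stock', '//p[@id="stock"]/text()')
        l.add_value('last_update', 'today')  # a literal value
        return l.load_item()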
@@ -253,7 +253,7 @@ ItemLoader objects
 :attr:`default_item_class`.

 The item and the remaining keyword arguments are assigned to the Loader
-context (accesible through the :attr:`context` attribute).
+context (accessible through the :attr:`context` attribute).

 .. method:: get_value(value, \*processors, \**kwargs)
@@ -280,7 +280,7 @@ ItemLoader objects
 The value is first passed through :meth:`get_value` by giving the
 ``processors`` and ``kwargs``, and then passed through the
 :ref:`field input processor <topics-loaders-processors>` and its result
-appened to the data collected for that field. If the field already
+appended to the data collected for that field. If the field already
 contains collected data, the new data is added.

 The given ``field_name`` can be ``None``, in which case values for
@@ -63,13 +63,13 @@ scrapy.log module
 will be sent to standard error.
 :type logfile: str

-:param loglevel: the minimum logging level to log. Availables values are:
+:param loglevel: the minimum logging level to log. Available values are:
 :data:`CRITICAL`, :data:`ERROR`, :data:`WARNING`, :data:`INFO` and
 :data:`DEBUG`.

 :param logstdout: if ``True``, all standard output (and error) of your
 application will be logged instead. For example if you "print 'hello'"
-it will appear in the Scrapy log. If ommited, the :setting:`LOG_STDOUT`
+it will appear in the Scrapy log. If omitted, the :setting:`LOG_STDOUT`
 setting will be used.
 :type logstdout: boolean
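Those parameters belong to ``scrapy.log.start``; a minimal sketch of standalone use (the file name and messages are illustrative)::

    from scrapy import log

    # Log to a file at INFO level and capture stdout into the log as well
    log.start(logfile="scrapy.log", loglevel=log.INFO, logstdout=True)
    log.msg("This is an informational message")
    log.msg("Something went wrong", level=log.ERROR)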
@@ -152,7 +152,7 @@ Request objects
 recognized by Scrapy.

 This dict is `shallow copied`_ when the request is cloned using the
-``copy()`` or ``replace()`` methods, and can also be accesed, in your
+``copy()`` or ``replace()`` methods, and can also be accessed, in your
 spider, from the ``response.meta`` attribute.

 .. _shallow copied: http://docs.python.org/library/copy.html
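To illustrate the corrected sentence: a value placed in ``Request.meta`` can be read back from ``response.meta`` in the callback. A short sketch of two spider callbacks (URL and key names are made up)::

    from scrapy.http import Request

    def parse_page1(self, response):
        # Attach extra data to the request; the meta dict travels with it
        return Request("http://www.example.com/page2.html",
                       meta={"referer_title": "page 1 title"},
                       callback=self.parse_page2)

    def parse_page2(self, response):
        # The same dict (shallow copied on clones) is exposed on the response
        print response.meta["referer_title"]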
@@ -270,7 +270,7 @@ fields with form data from :class:`Response` objects.
 sometimes it can cause problems which could be hard to debug. For
 example, when working with forms that are filled and/or submitted using
 javascript, the default :meth:`from_response` behaviour may not be the
-most appropiate. To disable this behaviour you can set the
+most appropriate. To disable this behaviour you can set the
 ``dont_click`` argument to ``True``. Also, if you want to change the
 control clicked (instead of disabling it) you can also use the
 ``clickdata`` argument.
@@ -294,7 +294,7 @@ fields with form data from :class:`Response` objects.
 overridden by the one passed in this parameter.
 :type formdata: dict

-:param dont_click: If True, the form data will be sumbitted without
+:param dont_click: If True, the form data will be submitted without
 clicking in any element.
 :type dont_click: boolean
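A hedged sketch of the ``formdata`` and ``dont_click`` arguments in use inside a spider callback (field names and the ``after_login`` callback are made up)::

    from scrapy.http import FormRequest

    def parse_login_page(self, response):
        # Fill the form found in the response but do not simulate a click
        # on any submit control (useful when javascript handles submission)
        return FormRequest.from_response(response,
                                         formdata={"user": "john", "pass": "secret"},
                                         dont_click=True,
                                         callback=self.after_login)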
@@ -66,7 +66,7 @@ Or, if you want to start Scrapyd from inside a Scrapy project you can use the
 Installing Scrapyd
 ==================

-How to deploy Scrapyd on your servers depends on the platform your're using.
+How to deploy Scrapyd on your servers depends on the platform you're using.
 Scrapy comes with Ubuntu packages for Scrapyd ready for deploying it as a
 system service, to ease the installation and administration, but you can create
 packages for other distribution or operating systems (including Windows). If
@@ -303,7 +303,7 @@ Now, if you type ``scrapy deploy -l`` you'll see::
 See available projects
 ----------------------

-To see all available projets in a specific target use::
+To see all available projects in a specific target use::

     scrapy deploy -L scrapyd
@@ -459,7 +459,7 @@ Example request::

     $ curl http://localhost:6800/addversion.json -F project=myproject -F version=r23 -F egg=@myproject.egg

-Example reponse::
+Example response::

     {"status": "ok", "spiders": 3}
@@ -96,7 +96,7 @@ extensions and middlewares::

     if settings['LOG_ENABLED']:
         print "log is enabled!"

-In other words, settings can be accesed like a dict, but it's usually preferred
+In other words, settings can be accessed like a dict, but it's usually preferred
 to extract the setting in the format you need it to avoid type errors. In order
 to do that you'll have to use one of the methods provided the
 :class:`~scrapy.settings.Settings` API.
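A sketch of the typed accessors that sentence prefers, assuming a ``settings`` object already received by an extension or middleware::

    # Inside an extension or middleware that received the settings object:
    if settings.getbool("LOG_ENABLED"):
        print "log is enabled!"

    timeout = settings.getint("DOWNLOAD_TIMEOUT", 180)   # int with a default
    name = settings.get("BOT_NAME")                      # plain string access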
@@ -648,7 +648,7 @@ REDIRECT_MAX_TIMES

 Default: ``20``

-Defines the maximun times a request can be redirected. After this maximun the
+Defines the maximum times a request can be redirected. After this maximum the
 request's response is returned as is. We used Firefox default value for the
 same task.
@@ -77,7 +77,7 @@ single Python class that defines one or more of the following methods:
 direction for :meth:`process_spider_output` to process it, or
 :meth:`process_spider_exception` if it raised an exception.

-:param reponse: the response being processed
+:param response: the response being processed
 :type response: :class:`~scrapy.http.Response` object

 :param spider: the spider for which this response is intended
@@ -258,7 +258,7 @@ OffsiteMiddleware
 these messages for each new domain filtered. So, for example, if another
 request for ``www.othersite.com`` is filtered, no log message will be
 printed. But if a request for ``someothersite.com`` is filtered, a message
-will be printed (but only for the first request filtred).
+will be printed (but only for the first request filtered).

 If the spider doesn't define an
 :attr:`~scrapy.spider.BaseSpider.allowed_domains` attribute, or the
@@ -30,7 +30,7 @@ For spiders, the scraping cycle goes through something like this:
 response handled by the specified callback.

 3. In callback functions, you parse the page contents, typically using
-:ref:`topics-selectors` (but you can also use BeautifuSoup, lxml or whatever
+:ref:`topics-selectors` (but you can also use BeautifulSoup, lxml or whatever
 mechanism you prefer) and generate items with the parsed data.

 4. Finally, the items returned from the spider will be typically persisted to a
@@ -183,7 +183,7 @@ BaseSpider
 :class:`~scrapy.item.Item` objects.

 :param response: the response to parse
-:type reponse: :class:~scrapy.http.Response`
+:type response: :class:~scrapy.http.Response`

 .. method:: log(message, [level, component])
@@ -434,7 +434,7 @@ These spiders are pretty easy to use, let's have a look at one example::

     name = 'example.com'
     allowed_domains = ['example.com']
     start_urls = ['http://www.example.com/feed.xml']
-    iterator = 'iternodes' # This is actually unnecesary, since it's the default value
+    iterator = 'iternodes' # This is actually unnecessary, since it's the default value
     itertag = 'item'

     def parse_node(self, response, node):
@@ -6,7 +6,7 @@ Stats Collection

 Scrapy provides a convenient facility for collecting stats in the form of
 key/values, where values are often counters. The facility is called the Stats
-Collector, and can be accesed through the :attr:`~scrapy.crawler.Crawler.stats`
+Collector, and can be accessed through the :attr:`~scrapy.crawler.Crawler.stats`
 attribute of the :ref:`topics-api-crawler`, as illustrated by the examples in
 the :ref:`topics-stats-usecases` section below.
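A short sketch of reaching the Stats Collector through the crawler, as the corrected sentence describes (the extension class and stat key names are hypothetical)::

    class StatsRecordingExtension(object):
        """Hypothetical extension that records a few custom stats."""

        def __init__(self, stats):
            self.stats = stats  # the Crawler.stats attribute

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler.stats)

        def record(self):
            self.stats.set_value("custom/start_reason", "scheduled")
            self.stats.inc_value("custom/records_seen")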
@@ -100,7 +100,7 @@ DummyStatsCollector

 .. class:: DummyStatsCollector

-A Stats collector which does nothing but is very efficient (beacuse it does
+A Stats collector which does nothing but is very efficient (because it does
 nothing). This stats collector can be set via the :setting:`STATS_CLASS`
 setting, to disable stats collect in order to improve performance. However,
 the performance penalty of stats collection is usually marginal compared to
@@ -155,7 +155,7 @@ TELNETCONSOLE_PORT

 Default: ``[6023, 6073]``

-The port range to use for the etlnet console. If set to ``None`` or ``0``, a
+The port range to use for the telnet console. If set to ``None`` or ``0``, a
 dynamically assigned port is used.
@@ -28,7 +28,7 @@ The web service contains several resources, defined in the
 functionality. See :ref:`topics-webservice-resources-ref` for a list of
 resources available by default.

-Althought you can implement your own resources using any protocol, there are
+Although you can implement your own resources using any protocol, there are
 two kinds of resources bundled with Scrapy:

 * Simple JSON resources - which are read-only and just output JSON data
@@ -188,7 +188,7 @@ To write a web service resource you should subclass the :class:`JsonResource` or
 .. attribute:: ws_name

 The name by which the Scrapy web service will known this resource, and
-also the path wehere this resource will listen. For example, assuming
+also the path where this resource will listen. For example, assuming
 Scrapy web service is listening on http://localhost:6080/ and the
 ``ws_name`` is ``'resource1'`` the URL for that resource will be: