
Fixed minor grammar issues.

David Chen 2015-11-16 07:30:17 +08:00
parent 57f87b95d4
commit 0025d5a943
8 changed files with 11 additions and 11 deletions

View File

@@ -144,7 +144,7 @@ I get "Filtered offsite request" messages. How can I fix them?
 Those messages (logged with ``DEBUG`` level) don't necessarily mean there is a
 problem, so you may not need to fix them.
-Those message are thrown by the Offsite Spider Middleware, which is a spider
+Those messages are thrown by the Offsite Spider Middleware, which is a spider
 middleware (enabled by default) whose purpose is to filter out requests to
 domains outside the ones covered by the spider.
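The filtering described in the hunk above is driven by the spider's ``allowed_domains`` list. A minimal sketch, with the spider name, domain and URLs invented for illustration::

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"
        # The offsite spider middleware (enabled by default) drops requests to
        # hosts outside this list and logs the "Filtered offsite request" message.
        allowed_domains = ["example.com"]
        start_urls = ["http://example.com/"]

        def parse(self, response):
            for href in response.css("a::attr(href)").extract():
                # Links pointing off example.com are filtered, not downloaded.
                yield scrapy.Request(response.urljoin(href), callback=self.parse)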

View File

@ -34,7 +34,7 @@ These are some common properties often found in broad crawls:
As said above, Scrapy default settings are optimized for focused crawls, not As said above, Scrapy default settings are optimized for focused crawls, not
broad crawls. However, due to its asynchronous architecture, Scrapy is very broad crawls. However, due to its asynchronous architecture, Scrapy is very
well suited for performing fast broad crawls. This page summarize some things well suited for performing fast broad crawls. This page summarizes some things
you need to keep in mind when using Scrapy for doing broad crawls, along with you need to keep in mind when using Scrapy for doing broad crawls, along with
concrete suggestions of Scrapy settings to tune in order to achieve an concrete suggestions of Scrapy settings to tune in order to achieve an
efficient broad crawl. efficient broad crawl.
@@ -46,7 +46,7 @@ Concurrency is the number of requests that are processed in parallel. There is
 a global limit and a per-domain limit.
 The default global concurrency limit in Scrapy is not suitable for crawling
 many different domains in parallel, so you will want to increase it. How much
 to increase it will depend on how much CPU you crawler will have available. A
 good starting point is ``100``, but the best way to find out is by doing some
 trials and identifying at what concurrency your Scrapy process gets CPU
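As a rough illustration of the tuning the hunk above describes, a broad-crawl project might start from settings like these; the numbers are only a starting point to be adjusted against your own CPU and bandwidth limits::

    # settings.py
    CONCURRENT_REQUESTS = 100            # global limit; the default (16) is too low for broad crawls
    CONCURRENT_REQUESTS_PER_DOMAIN = 8   # keep the per-domain limit modest (8 is the default)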

View File

@@ -17,7 +17,7 @@ Extensions use the :ref:`Scrapy settings <topics-settings>` to manage their
 settings, just like any other Scrapy code.
 It is customary for extensions to prefix their settings with their own name, to
-avoid collision with existing (and future) extensions. For example, an
+avoid collision with existing (and future) extensions. For example, a
 hypothetic extension to handle `Google Sitemaps`_ would use settings like
 `GOOGLESITEMAP_ENABLED`, `GOOGLESITEMAP_DEPTH`, and so on.
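To make the naming convention concrete, here is a sketch of how such a hypothetical extension might read its prefixed settings; the class and setting names are illustrative, taken from the example in the hunk::

    from scrapy.exceptions import NotConfigured

    class GoogleSitemapExtension(object):

        def __init__(self, depth):
            self.depth = depth

        @classmethod
        def from_crawler(cls, crawler):
            # Settings are prefixed with the extension's own name so they do
            # not collide with settings of other (present or future) extensions.
            if not crawler.settings.getbool('GOOGLESITEMAP_ENABLED'):
                raise NotConfigured
            return cls(crawler.settings.getint('GOOGLESITEMAP_DEPTH', 3))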

View File

@@ -95,7 +95,7 @@ contain a price::
 Write items to a JSON file
 --------------------------
-The following pipeline stores all scraped items (from all spiders) into a a
+The following pipeline stores all scraped items (from all spiders) into a
 single ``items.jl`` file, containing one item per line serialized in JSON
 format::
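The pipeline this hunk refers to follows the ``format::`` marker in the source file and is not part of the diff; a sketch of what such a JSON-lines pipeline looks like (the class name is illustrative)::

    import json

    class JsonWriterPipeline(object):

        def open_spider(self, spider):
            self.file = open('items.jl', 'w')

        def close_spider(self, spider):
            self.file.close()

        def process_item(self, item, spider):
            # One JSON object per line, i.e. the JSON-lines (.jl) format.
            line = json.dumps(dict(item)) + "\n"
            self.file.write(line)
            return item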

View File

@@ -61,7 +61,7 @@ the example above.
 You can specify any kind of metadata for each field. There is no restriction on
 the values accepted by :class:`Field` objects. For this same
 reason, there is no reference list of all available metadata keys. Each key
-defined in :class:`Field` objects could be used by a different components, and
+defined in :class:`Field` objects could be used by a different component, and
 only those components know about it. You can also define and use any other
 :class:`Field` key in your project too, for your own needs. The main goal of
 :class:`Field` objects is to provide a way to define all field metadata in one
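As an illustration of per-field metadata, here is an invented item where one field carries a ``serializer`` key; only a component that knows about that key (for example, a custom exporter) would ever act on it::

    from scrapy.item import Item, Field

    def serialize_price(value):
        return 'US$ %s' % str(value)

    class Product(Item):
        name = Field()
        # Any keyword passed to Field() becomes metadata on that field.
        price = Field(serializer=serialize_price)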

View File

@@ -97,7 +97,7 @@ subclasses):
 A real example
 --------------
-Let's see a concrete example of an hypothetical case of memory leaks.
+Let's see a concrete example of a hypothetical case of memory leaks.
 Suppose we have some spider with a line similar to this one::
 return Request("http://www.somenastyspider.com/product.php?pid=%d" % product_id,

View File

@@ -228,7 +228,7 @@ with varying degrees of sophistication. Getting around those measures can be
 difficult and tricky, and may sometimes require special infrastructure. Please
 consider contacting `commercial support`_ if in doubt.
-Here are some tips to keep in mind when dealing with these kind of sites:
+Here are some tips to keep in mind when dealing with these kinds of sites:
 * rotate your user agent from a pool of well-known ones from browsers (google
   around to get a list of them)
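The first tip in this hunk is usually implemented as a downloader middleware; a possible sketch, not part of Scrapy itself, with placeholder user agent strings::

    import random

    class RotateUserAgentMiddleware(object):

        # Replace these placeholders with real, well-known browser user agents.
        user_agents = [
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10) ...',
        ]

        def process_request(self, request, spider):
            # Pick a different user agent for each outgoing request.
            request.headers['User-Agent'] = random.choice(self.user_agents)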

View File

@@ -579,7 +579,7 @@ Built-in Selectors reference
 is used together with ``text``.
 If ``type`` is ``None`` and a ``response`` is passed, the selector type is
-inferred from the response type as follow:
+inferred from the response type as follows:
 * ``"html"`` for :class:`~scrapy.http.HtmlResponse` type
 * ``"xml"`` for :class:`~scrapy.http.XmlResponse` type
@@ -757,7 +757,7 @@ nodes can be accessed directly by their names::
 <Selector xpath='//link' data=u'<link xmlns="http://www.w3.org/2005/Atom'>,
 ...
-If you wonder why the namespace removal procedure isn't called always by default
+If you wonder why the namespace removal procedure isn't always called by default
 instead of having to call it manually, this is because of two reasons, which, in order
 of relevance, are:
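For context, the manual procedure this paragraph refers to is ``Selector.remove_namespaces()``; a brief sketch, where ``feed_body`` is an assumed variable holding the Atom feed discussed on that page::

    from scrapy.selector import Selector

    sel = Selector(text=feed_body, type="xml")
    sel.xpath("//link")      # matches nothing while the Atom namespace is in effect
    sel.remove_namespaces()
    sel.xpath("//link")      # namespaced <link> nodes are now reachable by plain name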