.. _topics-settings:

========
Settings
========

The Scrapy settings allow you to customize the behaviour of all Scrapy
components, including the core, extensions, pipelines and spiders themselves.

The infrastructure of the settings provides a global namespace of key-value
mappings that the code can use to pull configuration values from. The settings
can be populated through different mechanisms, which are described below.

The settings are also the mechanism for selecting the currently active Scrapy
project (in case you have many).

For a list of available built-in settings see: :ref:`topics-settings-ref`.

Designating the settings
========================

When you use Scrapy, you have to tell it which settings you're using. You can
do this by using an environment variable, ``SCRAPY_SETTINGS_MODULE``.

The value of ``SCRAPY_SETTINGS_MODULE`` should be in Python path syntax, e.g.
``myproject.settings``. Note that the settings module should be on the
Python `import search path`_.

.. _import search path: http://docs.python.org/2/tutorial/modules.html#the-module-search-path

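For example, a standalone Python script (run outside the ``scrapy`` command
line tool) could designate a hypothetical ``myproject.settings`` module like
this, before importing anything that reads the configuration::

    import os

    # Point Scrapy at the settings module of your project.
    os.environ['SCRAPY_SETTINGS_MODULE'] = 'myproject.settings'
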
Populating the settings
=======================

Settings can be populated using different mechanisms, each of which has a
different precedence. Here is the list of them, in decreasing order of
precedence:

1. Command line options (highest precedence)
2. Project settings module
3. Default settings per-command
4. Default global settings (lowest precedence)

These settings sources are populated internally, but manual handling is also
possible through API calls. See the :ref:`topics-api-settings` topic for
reference.

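For instance, a minimal sketch of handling settings by hand, assuming the
:class:`~scrapy.settings.Settings` class described in that topic::

    from scrapy.settings import Settings

    settings = Settings()
    settings.set('LOG_ENABLED', False)  # populate a single value manually
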
These mechanisms are described in more detail below.

1. Command line options
-----------------------

Arguments provided by the command line are the ones that take the most
precedence, overriding any other options. You can explicitly override one (or
more) settings using the ``-s`` (or ``--set``) command line option.

.. highlight:: sh

Example::

    scrapy crawl myspider -s LOG_FILE=scrapy.log

2. Project settings module
--------------------------

The project settings module is the standard configuration file for your Scrapy
project. It's where most of your custom settings will be populated. For
example: ``myproject.settings``.

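.. highlight:: python

For illustration, a few typical entries in a hypothetical
``myproject/settings.py``::

    # myproject/settings.py -- hypothetical project settings module
    BOT_NAME = 'myproject'
    SPIDER_MODULES = ['myproject.spiders']
    NEWSPIDER_MODULE = 'myproject.spiders'
    DOWNLOAD_DELAY = 0.5
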
3. Default settings per-command
-------------------------------

Each :doc:`Scrapy tool </topics/commands>` command can have its own default
settings, which override the global default settings. Those custom command
settings are specified in the ``default_settings`` attribute of the command
class.

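As a sketch, assuming the ``scrapy.command.ScrapyCommand`` base class used by
the built-in commands, a hypothetical custom command could ship quieter
defaults like this::

    from scrapy.command import ScrapyCommand

    class Command(ScrapyCommand):
        # Overrides the global defaults, but is itself overridden by the
        # project settings module and by -s command line options.
        default_settings = {'LOG_ENABLED': False}

        def run(self, args, opts):
            pass  # command logic would go here
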
4. Default global settings
--------------------------

The global defaults are located in the ``scrapy.settings.default_settings``
module and documented in the :ref:`topics-settings-ref` section.

How to access settings
======================

.. highlight:: python

Settings can be accessed through the :attr:`scrapy.crawler.Crawler.settings`
attribute of the Crawler that is passed to the ``from_crawler`` method in
extensions and middlewares::

    class MyExtension(object):

        @classmethod
        def from_crawler(cls, crawler):
            settings = crawler.settings
            if settings['LOG_ENABLED']:
                print("log is enabled!")

In other words, settings can be accessed like a dict, but it's usually
preferred to extract the setting in the format you need, to avoid type errors.
In order to do that you'll have to use one of the methods provided by the
:class:`~scrapy.settings.Settings` API.
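
For example, the typed getters convert values for you, so a setting overridden
as a string on the command line still comes back with the expected type::

    settings.getbool('LOG_ENABLED')         # a real boolean, even if set as '0'
    settings.getint('CONCURRENT_REQUESTS')  # an int, not a string
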

Rationale for setting names
===========================

Setting names are usually prefixed with the component that they configure. For
example, proper setting names for a fictional robots.txt extension would be
``ROBOTSTXT_ENABLED``, ``ROBOTSTXT_OBEY``, ``ROBOTSTXT_CACHEDIR``, etc.

.. _topics-settings-ref:

Built-in settings reference
===========================

Here's a list of all available Scrapy settings, in alphabetical order, along
with their default values and the scope where they apply.

The scope, where available, shows where the setting is being used, if it's tied
to any particular component. In that case the module of that component will be
shown, typically an extension, middleware or pipeline. It also means that the
component must be enabled in order for the setting to have any effect.

.. setting:: AWS_ACCESS_KEY_ID

AWS_ACCESS_KEY_ID
-----------------

Default: ``None``

The AWS access key used by code that requires access to `Amazon Web services`_,
such as the :ref:`S3 feed storage backend <topics-feed-storage-s3>`.

.. setting:: AWS_SECRET_ACCESS_KEY

AWS_SECRET_ACCESS_KEY
---------------------

Default: ``None``

The AWS secret key used by code that requires access to `Amazon Web services`_,
such as the :ref:`S3 feed storage backend <topics-feed-storage-s3>`.

.. setting:: BOT_NAME

BOT_NAME
--------

Default: ``'scrapybot'``

The name of the bot implemented by this Scrapy project (also known as the
project name). This will be used to construct the User-Agent by default, and
also for logging.

It's automatically populated with your project name when you create your
project with the :command:`startproject` command.

.. setting:: CONCURRENT_ITEMS

CONCURRENT_ITEMS
----------------

Default: ``100``

Maximum number of concurrent items (per response) to process in parallel in the
Item Processor (also known as the :ref:`Item Pipeline <topics-item-pipeline>`).

.. setting:: CONCURRENT_REQUESTS

CONCURRENT_REQUESTS
-------------------

Default: ``16``

The maximum number of concurrent (i.e. simultaneous) requests that will be
performed by the Scrapy downloader.

.. setting:: CONCURRENT_REQUESTS_PER_DOMAIN

CONCURRENT_REQUESTS_PER_DOMAIN
------------------------------

Default: ``8``

The maximum number of concurrent (i.e. simultaneous) requests that will be
performed to any single domain.

.. setting:: CONCURRENT_REQUESTS_PER_IP

CONCURRENT_REQUESTS_PER_IP
--------------------------

Default: ``0``

The maximum number of concurrent (i.e. simultaneous) requests that will be
performed to any single IP. If non-zero, the
:setting:`CONCURRENT_REQUESTS_PER_DOMAIN` setting is ignored, and this one is
used instead. In other words, concurrency limits will be applied per IP, not
per domain.

This setting also affects :setting:`DOWNLOAD_DELAY`:
if :setting:`CONCURRENT_REQUESTS_PER_IP` is non-zero, download delay is
enforced per IP, not per domain.

.. setting:: DEFAULT_ITEM_CLASS

DEFAULT_ITEM_CLASS
------------------

Default: ``'scrapy.item.Item'``

The default class that will be used for instantiating items in the
:ref:`Scrapy shell <topics-shell>`.

.. setting:: DEFAULT_REQUEST_HEADERS

DEFAULT_REQUEST_HEADERS
-----------------------

Default::

    {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en',
    }

The default headers used for Scrapy HTTP Requests. They're populated in the
:class:`~scrapy.contrib.downloadermiddleware.defaultheaders.DefaultHeadersMiddleware`.
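
For example, to override a header project-wide, redefine the dict in your
settings module (the language value below is purely illustrative)::

    DEFAULT_REQUEST_HEADERS = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-gb',  # illustrative override
    }
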

.. setting:: DEPTH_LIMIT

DEPTH_LIMIT
-----------

Default: ``0``

The maximum depth that will be allowed to crawl for any site. If zero, no limit
will be imposed.

.. setting:: DEPTH_PRIORITY

DEPTH_PRIORITY
--------------

Default: ``0``

An integer that is used to adjust the request priority based on its depth.

If zero, no priority adjustment is made from depth.

.. setting:: DEPTH_STATS

DEPTH_STATS
-----------

Default: ``True``

Whether to collect maximum depth stats.

.. setting:: DEPTH_STATS_VERBOSE

DEPTH_STATS_VERBOSE
-------------------

Default: ``False``

Whether to collect verbose depth stats. If this is enabled, the number of
requests for each depth is collected in the stats.

.. setting:: DNSCACHE_ENABLED

DNSCACHE_ENABLED
----------------

Default: ``True``

Whether to enable DNS in-memory cache.

.. setting:: DOWNLOADER

DOWNLOADER
----------

Default: ``'scrapy.core.downloader.Downloader'``

The downloader to use for crawling.

.. setting:: DOWNLOADER_MIDDLEWARES

DOWNLOADER_MIDDLEWARES
----------------------

Default: ``{}``

A dict containing the downloader middlewares enabled in your project, and their
orders. For more info see :ref:`topics-downloader-middleware-setting`.
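
For example, to enable a hypothetical middleware of your own and disable a
built-in one, set an order for the former and ``None`` for the latter::

    DOWNLOADER_MIDDLEWARES = {
        'myproject.middlewares.CustomUserAgentMiddleware': 543,
        'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    }
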

.. setting:: DOWNLOADER_MIDDLEWARES_BASE

DOWNLOADER_MIDDLEWARES_BASE
---------------------------

Default::

    {
        'scrapy.contrib.downloadermiddleware.robotstxt.RobotsTxtMiddleware': 100,
        'scrapy.contrib.downloadermiddleware.httpauth.HttpAuthMiddleware': 300,
        'scrapy.contrib.downloadermiddleware.downloadtimeout.DownloadTimeoutMiddleware': 350,
        'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': 400,
        'scrapy.contrib.downloadermiddleware.retry.RetryMiddleware': 500,
        'scrapy.contrib.downloadermiddleware.defaultheaders.DefaultHeadersMiddleware': 550,
        'scrapy.contrib.downloadermiddleware.redirect.MetaRefreshMiddleware': 580,
        'scrapy.contrib.downloadermiddleware.httpcompression.HttpCompressionMiddleware': 590,
        'scrapy.contrib.downloadermiddleware.redirect.RedirectMiddleware': 600,
        'scrapy.contrib.downloadermiddleware.cookies.CookiesMiddleware': 700,
        'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 750,
        'scrapy.contrib.downloadermiddleware.chunked.ChunkedTransferMiddleware': 830,
        'scrapy.contrib.downloadermiddleware.stats.DownloaderStats': 850,
        'scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware': 900,
    }

A dict containing the downloader middlewares enabled by default in Scrapy. You
should never modify this setting in your project, modify
:setting:`DOWNLOADER_MIDDLEWARES` instead. For more info see
:ref:`topics-downloader-middleware-setting`.

.. setting:: DOWNLOADER_STATS

DOWNLOADER_STATS
----------------

Default: ``True``

Whether to enable downloader stats collection.

.. setting:: DOWNLOAD_DELAY

DOWNLOAD_DELAY
--------------

Default: ``0``

The amount of time (in secs) that the downloader should wait before downloading
consecutive pages from the same website. This can be used to throttle the
crawling speed to avoid hitting servers too hard. Decimal numbers are
supported. Example::

    DOWNLOAD_DELAY = 0.25    # 250 ms of delay

This setting is also affected by the :setting:`RANDOMIZE_DOWNLOAD_DELAY`
setting (which is enabled by default). By default, Scrapy doesn't wait a fixed
amount of time between requests, but uses a random interval between 0.5 and 1.5
* :setting:`DOWNLOAD_DELAY`.

When :setting:`CONCURRENT_REQUESTS_PER_IP` is non-zero, delays are enforced
per IP address instead of per domain.

You can also change this setting per spider by setting the ``download_delay``
spider attribute.
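
For example, a sketch of a spider that slows itself down regardless of the
project-wide value (assuming the ``scrapy.spider.Spider`` base class)::

    from scrapy.spider import Spider

    class PoliteSpider(Spider):
        name = 'polite'
        download_delay = 2.0  # overrides DOWNLOAD_DELAY for this spider only
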

.. setting:: DOWNLOAD_HANDLERS

DOWNLOAD_HANDLERS
-----------------

Default: ``{}``

A dict containing the request downloader handlers enabled in your project.
See :setting:`DOWNLOAD_HANDLERS_BASE` for an example format.

.. setting:: DOWNLOAD_HANDLERS_BASE

DOWNLOAD_HANDLERS_BASE
----------------------

Default::

    {
        'file': 'scrapy.core.downloader.handlers.file.FileDownloadHandler',
        'http': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
        'https': 'scrapy.core.downloader.handlers.http.HttpDownloadHandler',
        's3': 'scrapy.core.downloader.handlers.s3.S3DownloadHandler',
    }

A dict containing the request download handlers enabled by default in Scrapy.
You should never modify this setting in your project, modify
:setting:`DOWNLOAD_HANDLERS` instead.

If you want to disable any of the above download handlers you must define them
in your project's :setting:`DOWNLOAD_HANDLERS` setting and assign ``None``
as their value. For example, if you want to disable the file download
handler::

    DOWNLOAD_HANDLERS = {
        'file': None,
    }

.. setting:: DOWNLOAD_TIMEOUT

DOWNLOAD_TIMEOUT
----------------

Default: ``180``

The amount of time (in secs) that the downloader will wait before timing out.

.. setting:: DUPEFILTER_CLASS

DUPEFILTER_CLASS
----------------

Default: ``'scrapy.dupefilter.RFPDupeFilter'``

The class used to detect and filter duplicate requests.

The default (``RFPDupeFilter``) filters based on request fingerprint using
the ``scrapy.utils.request.request_fingerprint`` function. In order to change
the way duplicates are checked you could subclass ``RFPDupeFilter`` and
override its ``request_fingerprint`` method. This method should accept a
scrapy :class:`~scrapy.http.Request` object and return its fingerprint
(a string).
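
A minimal sketch, assuming you want URLs that differ only in case to count as
duplicates (the filter class and its module path are hypothetical)::

    from scrapy.dupefilter import RFPDupeFilter
    from scrapy.utils.request import request_fingerprint

    class CaseInsensitiveDupeFilter(RFPDupeFilter):
        """Hypothetical filter: fingerprint requests on the lowercased URL."""

        def request_fingerprint(self, request):
            return request_fingerprint(request.replace(url=request.url.lower()))

It would then be enabled with
``DUPEFILTER_CLASS = 'myproject.dupefilters.CaseInsensitiveDupeFilter'``.
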

.. setting:: DUPEFILTER_DEBUG

DUPEFILTER_DEBUG
----------------

Default: ``False``

By default, ``RFPDupeFilter`` only logs the first duplicate request.
Setting :setting:`DUPEFILTER_DEBUG` to ``True`` will make it log all duplicate requests.

.. setting:: EDITOR

EDITOR
------

Default: `depends on the environment`

The editor to use for editing spiders with the :command:`edit` command. It
defaults to the ``EDITOR`` environment variable, if set. Otherwise, it defaults
to ``vi`` (on Unix systems) or the IDLE editor (on Windows).

.. setting:: EXTENSIONS

EXTENSIONS
----------

Default: ``{}``

A dict containing the extensions enabled in your project, and their orders.
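
For example, to disable an extension that ships enabled by default, assign
``None`` to it here::

    EXTENSIONS = {
        'scrapy.contrib.corestats.CoreStats': None,
    }
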

.. setting:: EXTENSIONS_BASE

EXTENSIONS_BASE
---------------

Default::

    {
        'scrapy.contrib.corestats.CoreStats': 0,
        'scrapy.telnet.TelnetConsole': 0,
        'scrapy.contrib.memusage.MemoryUsage': 0,
        'scrapy.contrib.memdebug.MemoryDebugger': 0,
        'scrapy.contrib.closespider.CloseSpider': 0,
        'scrapy.contrib.feedexport.FeedExporter': 0,
        'scrapy.contrib.logstats.LogStats': 0,
        'scrapy.contrib.spiderstate.SpiderState': 0,
        'scrapy.contrib.throttle.AutoThrottle': 0,
    }

The list of available extensions. Keep in mind that some of them need to
be enabled through a setting. By default, this setting contains all stable
built-in extensions.

For more information see the :ref:`extensions user guide <topics-extensions>`
and the :ref:`list of available extensions <topics-extensions-ref>`.

.. setting:: ITEM_PIPELINES

ITEM_PIPELINES
--------------

Default: ``{}``

A dict containing the item pipelines to use, and their orders. The dict is
empty by default. Order values are arbitrary, but it's customary to define
them in the 0-1000 range.

Lists are supported in :setting:`ITEM_PIPELINES` for backwards compatibility,
but they are deprecated.

Example::

    ITEM_PIPELINES = {
        'mybot.pipelines.validate.ValidateMyItem': 300,
        'mybot.pipelines.validate.StoreMyItem': 800,
    }

.. setting:: ITEM_PIPELINES_BASE

ITEM_PIPELINES_BASE
-------------------

Default: ``{}``

A dict containing the pipelines enabled by default in Scrapy. You should never
modify this setting in your project, modify :setting:`ITEM_PIPELINES` instead.

.. setting:: LOG_ENABLED

LOG_ENABLED
-----------

Default: ``True``

Whether to enable logging.

.. setting:: LOG_ENCODING

LOG_ENCODING
------------

Default: ``'utf-8'``

The encoding to use for logging.

.. setting:: LOG_FILE

LOG_FILE
--------

Default: ``None``

File name to use for logging output. If None, standard error will be used.

.. setting:: LOG_LEVEL
|
2009-08-18 14:05:15 -03:00
|
|
|
|
2009-08-21 08:54:12 -03:00
|
|
|
LOG_LEVEL
|
|
|
|
---------
|
2009-08-18 14:05:15 -03:00
|
|
|
|
|
|
|
Default: ``'DEBUG'``
|
|
|
|
|
2009-08-20 18:17:48 -03:00
|
|
|
Minimum level to log. Available levels are: CRITICAL, ERROR, WARNING,
|
|
|
|
INFO, DEBUG. For more info see :ref:`topics-logging`.
|
2009-08-18 14:05:15 -03:00
|
|
|
|
2009-08-21 08:54:12 -03:00
|
|
|
.. setting:: LOG_STDOUT

LOG_STDOUT
----------

Default: ``False``

If ``True``, all standard output (and error) of your process will be redirected
to the log. For example, if you ``print('hello')``, it will appear in the
Scrapy log.

.. setting:: MEMDEBUG_ENABLED

MEMDEBUG_ENABLED
----------------

Default: ``False``

Whether to enable memory debugging.

.. setting:: MEMDEBUG_NOTIFY

MEMDEBUG_NOTIFY
---------------

Default: ``[]``

When memory debugging is enabled, a memory report will be sent to the specified
addresses if this setting is not empty; otherwise the report will be written to
the log.

Example::

    MEMDEBUG_NOTIFY = ['user@example.com']

.. setting:: MEMUSAGE_ENABLED

MEMUSAGE_ENABLED
----------------

Default: ``False``

Scope: ``scrapy.contrib.memusage``

Whether to enable the memory usage extension, which will shut down the Scrapy
process when it exceeds a memory limit, and also notify by email when that
happens.

See :ref:`topics-extensions-ref-memusage`.

.. setting:: MEMUSAGE_LIMIT_MB

MEMUSAGE_LIMIT_MB
-----------------

Default: ``0``

Scope: ``scrapy.contrib.memusage``

The maximum amount of memory to allow (in megabytes) before shutting down
Scrapy (if :setting:`MEMUSAGE_ENABLED` is True). If zero, no check will be
performed.

See :ref:`topics-extensions-ref-memusage`.

.. setting:: MEMUSAGE_NOTIFY_MAIL

MEMUSAGE_NOTIFY_MAIL
--------------------

Default: ``False``

Scope: ``scrapy.contrib.memusage``

A list of emails to notify if the memory limit has been reached.

Example::

    MEMUSAGE_NOTIFY_MAIL = ['user@example.com']

See :ref:`topics-extensions-ref-memusage`.

.. setting:: MEMUSAGE_REPORT

MEMUSAGE_REPORT
---------------

Default: ``False``

Scope: ``scrapy.contrib.memusage``

Whether to send a memory usage report after each spider has been closed.

See :ref:`topics-extensions-ref-memusage`.

.. setting:: MEMUSAGE_WARNING_MB

MEMUSAGE_WARNING_MB
-------------------

Default: ``0``

Scope: ``scrapy.contrib.memusage``

The maximum amount of memory to allow (in megabytes) before sending a warning
email notifying about it. If zero, no warning will be produced.

.. setting:: NEWSPIDER_MODULE

NEWSPIDER_MODULE
----------------

Default: ``''``

The module where new spiders will be created by the :command:`genspider`
command.

Example::

    NEWSPIDER_MODULE = 'mybot.spiders_dev'

.. setting:: RANDOMIZE_DOWNLOAD_DELAY

RANDOMIZE_DOWNLOAD_DELAY
------------------------

Default: ``True``

If enabled, Scrapy will wait a random amount of time (between 0.5 and 1.5
* :setting:`DOWNLOAD_DELAY`) while fetching requests from the same
website.

This randomization decreases the chance of the crawler being detected (and
subsequently blocked) by sites which analyze requests looking for statistically
significant similarities in the time between their requests.

The randomization policy is the same one used by the `wget`_ ``--random-wait``
option.

If :setting:`DOWNLOAD_DELAY` is zero (the default) this option has no effect.

.. _wget: http://www.gnu.org/software/wget/manual/wget.html

.. setting:: REDIRECT_MAX_TIMES

REDIRECT_MAX_TIMES
------------------

Default: ``20``

Defines the maximum number of times a request can be redirected. After this
maximum, the request's response is returned as is. We used Firefox's default
value for the same task.

.. setting:: REDIRECT_MAX_METAREFRESH_DELAY

REDIRECT_MAX_METAREFRESH_DELAY
------------------------------

Default: ``100``

Some sites use meta-refresh for redirecting to a session expired page, so we
restrict automatic redirection to a maximum delay (in seconds).

.. setting:: REDIRECT_PRIORITY_ADJUST

REDIRECT_PRIORITY_ADJUST
------------------------

Default: ``+2``

Adjust redirect request priority relative to the original request: a positive
priority adjust (the default) means higher priority.

.. setting:: ROBOTSTXT_OBEY

ROBOTSTXT_OBEY
--------------

Default: ``False``

Scope: ``scrapy.contrib.downloadermiddleware.robotstxt``

If enabled, Scrapy will respect robots.txt policies. For more information see
:ref:`topics-dlmw-robots`.

.. setting:: SCHEDULER

SCHEDULER
---------

Default: ``'scrapy.core.scheduler.Scheduler'``

The scheduler to use for crawling.

.. setting:: SPIDER_CONTRACTS

SPIDER_CONTRACTS
----------------

Default: ``{}``

A dict containing the scrapy contracts enabled in your project, used for
testing spiders. For more info see :ref:`topics-contracts`.

.. setting:: SPIDER_CONTRACTS_BASE

SPIDER_CONTRACTS_BASE
---------------------

Default::

    {
        'scrapy.contracts.default.UrlContract': 1,
        'scrapy.contracts.default.ReturnsContract': 2,
        'scrapy.contracts.default.ScrapesContract': 3,
    }

A dict containing the scrapy contracts enabled by default in Scrapy. You should
never modify this setting in your project, modify :setting:`SPIDER_CONTRACTS`
instead. For more info see :ref:`topics-contracts`.

.. setting:: SPIDER_MANAGER_CLASS

SPIDER_MANAGER_CLASS
--------------------

Default: ``'scrapy.spidermanager.SpiderManager'``

The class that will be used for handling spiders, which must implement the
:ref:`topics-api-spidermanager`.

.. setting:: SPIDER_MIDDLEWARES

SPIDER_MIDDLEWARES
------------------

Default: ``{}``

A dict containing the spider middlewares enabled in your project, and their
orders. For more info see :ref:`topics-spider-middleware-setting`.

.. setting:: SPIDER_MIDDLEWARES_BASE

SPIDER_MIDDLEWARES_BASE
-----------------------

Default::

    {
        'scrapy.contrib.spidermiddleware.httperror.HttpErrorMiddleware': 50,
        'scrapy.contrib.spidermiddleware.offsite.OffsiteMiddleware': 500,
        'scrapy.contrib.spidermiddleware.referer.RefererMiddleware': 700,
        'scrapy.contrib.spidermiddleware.urllength.UrlLengthMiddleware': 800,
        'scrapy.contrib.spidermiddleware.depth.DepthMiddleware': 900,
    }

A dict containing the spider middlewares enabled by default in Scrapy. You
should never modify this setting in your project, modify
:setting:`SPIDER_MIDDLEWARES` instead. For more info see
:ref:`topics-spider-middleware-setting`.

.. setting:: SPIDER_MODULES

SPIDER_MODULES
--------------

Default: ``[]``

A list of modules where Scrapy will look for spiders.

Example::

    SPIDER_MODULES = ['mybot.spiders_prod', 'mybot.spiders_dev']

.. setting:: STATS_CLASS

STATS_CLASS
-----------

Default: ``'scrapy.statscol.MemoryStatsCollector'``

The class to use for collecting stats, which must implement the
:ref:`topics-api-stats`.

.. setting:: STATS_DUMP

STATS_DUMP
----------

Default: ``True``

Dump the :ref:`Scrapy stats <topics-stats>` (to the Scrapy log) once the spider
finishes.

For more info see: :ref:`topics-stats`.

.. setting:: STATSMAILER_RCPTS

STATSMAILER_RCPTS
-----------------

Default: ``[]`` (empty list)

Send Scrapy stats after spiders finish scraping. See
:class:`~scrapy.contrib.statsmailer.StatsMailer` for more info.

.. setting:: TELNETCONSOLE_ENABLED

TELNETCONSOLE_ENABLED
---------------------

Default: ``True``

A boolean which specifies if the :ref:`telnet console <topics-telnetconsole>`
will be enabled (provided its extension is also enabled).

.. setting:: TELNETCONSOLE_PORT

TELNETCONSOLE_PORT
------------------

Default: ``[6023, 6073]``

The port range to use for the telnet console. If set to ``None`` or ``0``, a
dynamically assigned port is used. For more info see
:ref:`topics-telnetconsole`.

.. setting:: TEMPLATES_DIR

TEMPLATES_DIR
-------------

Default: ``templates`` dir inside scrapy module

The directory where to look for templates when creating new projects with the
:command:`startproject` command.

.. setting:: URLLENGTH_LIMIT

URLLENGTH_LIMIT
---------------

Default: ``2083``

Scope: ``contrib.spidermiddleware.urllength``

The maximum URL length to allow for crawled URLs. For more information about
the default value for this setting see: http://www.boutell.com/newfaq/misc/urllength.html

.. setting:: USER_AGENT

USER_AGENT
----------

Default: ``"Scrapy/VERSION (+http://scrapy.org)"``

The default User-Agent to use when crawling, unless overridden.

.. _Amazon web services: http://aws.amazon.com/