.. _topics-settings:

========
Settings
========

.. module:: scrapy.conf
   :synopsis: Settings manager

Scrapy settings allow you to customize the behaviour of all Scrapy
components, including the core, extensions, pipelines and the spiders
themselves.

The settings infrastructure provides a global namespace of key-value mappings
from which the code can pull configuration values. The settings can be
populated through different mechanisms, which are described below.

How to populate settings
========================

Settings can be populated using different mechanisms, each of which has a
different precedence. Here is the list of them in decreasing order of
precedence:

1. Global overrides (highest precedence)
2. Environment variables
3. scrapy_settings
4. Default settings per-command
5. Default global settings (lowest precedence)

These mechanisms are described in more detail below.
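
The precedence order listed above can be pictured as a stack of dictionaries
that is searched from top to bottom. The following is only a minimal,
standalone sketch of that idea built on ``collections.ChainMap``; it is not
how Scrapy implements its settings, and every layer name and value in it
(including ``EXAMPLE_NAME``) is a hypothetical placeholder.

```python
from collections import ChainMap

# Hypothetical layers, listed from highest to lowest precedence.
# The values are illustrative only.
overrides = {"LOG_ENABLED": True}
env_vars = {"LOG_FILE": "/tmp/scrapy.log"}
project_settings = {"LOG_ENABLED": False, "EXAMPLE_NAME": "project"}
command_defaults = {"LOG_ENABLED": False}
global_defaults = {"LOG_ENABLED": True, "LOG_FILE": None, "EXAMPLE_NAME": "default"}

# ChainMap returns the value from the first mapping that contains the key.
layered = ChainMap(
    overrides, env_vars, project_settings, command_defaults, global_defaults
)

print(layered["LOG_ENABLED"])   # found in the overrides layer
print(layered["LOG_FILE"])      # found in the environment layer
print(layered["EXAMPLE_NAME"])  # found in the project layer
```

A lookup stops at the first layer that defines the key, so lower layers only
matter for names that no higher layer sets.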

1. Global overrides
-------------------

Global overrides are the ones that take the highest precedence, and are
usually populated by command line options.

Example::

    >>> from scrapy.conf import settings
    >>> settings.overrides['LOG_ENABLED'] = True

2. Environment variables
------------------------

.. highlight:: sh

You can populate settings using environment variables prefixed with
``SCRAPY_``. For example, to change the log file location::

    $ export SCRAPY_LOG_FILE=/tmp/scrapy.log
    $ scrapy-ctl.py crawl example.com
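
To make the prefix convention concrete, here is a rough sketch of how
``SCRAPY_``-prefixed variables could be turned into setting names. The
function ``settings_from_environ`` is a hypothetical helper written for this
illustration, not part of Scrapy, and it assumes the setting name is simply
the variable name with the prefix removed.

```python
import os

PREFIX = "SCRAPY_"  # the prefix named in the text above


def settings_from_environ(environ=None):
    """Collect prefixed variables into a dict keyed by the bare setting name.

    Hypothetical helper for illustration only.
    """
    if environ is None:
        environ = os.environ
    return {
        key[len(PREFIX):]: value
        for key, value in environ.items()
        # Skip unrelated variables and a bare "SCRAPY_" with no name after it.
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }


example_env = {"SCRAPY_LOG_FILE": "/tmp/scrapy.log", "HOME": "/home/user"}
print(settings_from_environ(example_env))
```

Values collected this way are always strings, which is why the typed accessor
methods described later in this page are useful.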

3. scrapy_settings
------------------

``scrapy_settings`` is the standard configuration file for your Scrapy
project. It's where most of your custom settings will be populated.

4. Default settings per-command
-------------------------------

Each scrapy-ctl command can have its own default settings, which override the
global default settings. Those custom command settings are located inside the
``scrapy.conf.commands`` module. You can also specify custom per-command
settings inside your project, by writing them in the module referenced by the
:setting:`COMMANDS_SETTINGS_MODULE` setting. Those settings will take more

5. Default global settings
--------------------------

The global defaults are located in ``scrapy.conf.default_settings`` and
documented in the :ref:`settings` page.

How to access settings
======================

.. highlight:: python

Here's an example of the simplest way to access settings from Python code::

    >>> from scrapy.conf import settings
    >>> print settings['LOG_ENABLED']
    True

In other words, settings can be accessed like a dict, but it's usually
preferable to extract the setting in the format you need, to avoid type
errors. To do that, use one of the following methods:

.. class:: scrapy.conf.Settings

   .. method:: get(name, default=None)

      Get a setting value without affecting its original type.

      ``name`` is a string with the setting name

      ``default`` is the value to return if no setting is found

   .. method:: getbool(name, default=False)

      Get a setting value as a boolean. For example, ``1``, ``'1'`` and
      ``True`` all return ``True``, while ``0``, ``'0'``, ``False`` and
      ``None`` return ``False``.

      For example, settings populated through environment variables set to
      ``'0'`` will return ``False`` when using this method.

      ``name`` is a string with the setting name

      ``default`` is the value to return if no setting is found

   .. method:: getint(name, default=0)

      Get a setting value as an int.

      ``name`` is a string with the setting name

      ``default`` is the value to return if no setting is found

   .. method:: getfloat(name, default=0.0)

      Get a setting value as a float.

      ``name`` is a string with the setting name

      ``default`` is the value to return if no setting is found

   .. method:: getlist(name, default=None)

      Get a setting value as a list. If the setting's original type is a
      list, it will be returned verbatim. If it's a string, it will be split
      on ``","``.

      For example, settings populated through environment variables set to
      ``'one,two'`` will return the list ``['one', 'two']`` when using this
      method.

      ``name`` is a string with the setting name

      ``default`` is the value to return if no setting is found
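
The conversions described for the accessor methods above can be illustrated
with a small standalone class. This is only a sketch under stated
assumptions: ``TypedSettings`` is a hypothetical stand-in written for this
page, not the ``scrapy.conf.Settings`` implementation, the ``EXAMPLE_*``
names are made up, and returning an empty list for a missing list setting is
an assumption of the sketch.

```python
class TypedSettings:
    """Hypothetical stand-in for a settings object (illustration only)."""

    def __init__(self, values):
        self._values = dict(values)

    def get(self, name, default=None):
        # Return the stored value untouched, or the default if it is missing.
        return self._values.get(name, default)

    def getbool(self, name, default=False):
        value = self.get(name, default)
        if value is None:
            return False
        # int() maps 1, '1' and True to 1, and 0, '0' and False to 0.
        # Other strings such as 'yes' raise ValueError in this sketch.
        return bool(int(value))

    def getint(self, name, default=0):
        return int(self.get(name, default))

    def getfloat(self, name, default=0.0):
        return float(self.get(name, default))

    def getlist(self, name, default=None):
        value = self.get(name, default)
        if value is None:
            return []  # assumption of this sketch
        if isinstance(value, str):
            return value.split(",")
        return value  # non-string values are returned verbatim


# String values, as they would arrive from environment variables.
example = TypedSettings({
    "LOG_ENABLED": "0",
    "EXAMPLE_DEPTH": "3",
    "EXAMPLE_DELAY": "0.5",
    "EXAMPLE_NAMES": "one,two",
})

print(example.getbool("LOG_ENABLED"))
print(example.getint("EXAMPLE_DEPTH"))
print(example.getfloat("EXAMPLE_DELAY"))
print(example.getlist("EXAMPLE_NAMES"))
```

Reading ``LOG_ENABLED`` with plain ``get`` would return the string ``'0'``,
which is truthy in Python; that is the kind of type error the typed
accessors are meant to avoid.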

Available built-in settings
===========================

See :ref:`settings`.

Rationale for setting names
===========================

Setting names are usually prefixed with the component that they configure. For
example, proper setting names for a fictional robots.txt extension would be
``ROBOTSTXT_ENABLED``, ``ROBOTSTXT_OBEY``, ``ROBOTSTXT_CACHEDIR``, etc.
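
One practical benefit of this convention is that related settings can be
found by their prefix alone. The sketch below groups setting names by
component, assuming the component is whatever precedes the first underscore;
``group_by_component`` is a hypothetical helper, not a Scrapy function.

```python
from collections import defaultdict


def group_by_component(names):
    """Group setting names by the text before their first underscore."""
    groups = defaultdict(list)
    for name in names:
        prefix, _, _ = name.partition("_")
        groups[prefix].append(name)
    return dict(groups)


names = [
    "ROBOTSTXT_ENABLED",
    "ROBOTSTXT_OBEY",
    "ROBOTSTXT_CACHEDIR",
    "LOG_ENABLED",
    "LOG_FILE",
]
for component, members in group_by_component(names).items():
    print(component, members)
```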