=======  ===================
SEP      21
Title    Add-ons
Author   Pablo Hoffman
Created  2014-02-14
Status   Draft
=======  ===================

================
SEP-021: Add-ons
================

This proposal introduces add-ons, a unified way to manage Scrapy extensions,
middlewares and pipelines.

Scrapy currently supports many hooks and mechanisms for extending its
functionality, but no single entry point for enabling and configuring them.
Instead, the hooks are spread over:

* Spider middlewares (SPIDER_MIDDLEWARES)
* Downloader middlewares (DOWNLOADER_MIDDLEWARES)
* Download handlers (DOWNLOAD_HANDLERS)
* Item pipelines (ITEM_PIPELINES)
* Feed exporters and storages (FEED_EXPORTERS, FEED_STORAGES)
* Overridable components (DUPEFILTER_CLASS, STATS_CLASS, SCHEDULER, SPIDER_MANAGER_CLASS, ITEM_PROCESSOR, etc.)
* Generic extensions (EXTENSIONS)
* CLI commands (COMMANDS_MODULE)

One problem with this approach is that enabling an extension often requires
modifying several settings in a coordinated way, which is complex and
error-prone. Add-ons are meant to fix this by providing a single, simple
mechanism for enabling extensions.

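For example, enabling a hypothetical MongoDB export pipeline today could mean
coordinating several settings in the project's ``settings.py`` (all names
below are made up for illustration)::

    # settings.py: hypothetical "old way", where several coordinated
    # changes are needed to enable a single extension
    ITEM_PIPELINES = {
        'myproject.pipelines.MongoDBPipeline': 300,
    }
    EXTENSIONS = {
        'myproject.extensions.MongoDBStats': 500,
    }
    MONGODB_URI = 'mongodb://localhost:27017'
    MONGODB_DATABASE = 'items'

With an add-on, the same result should boil down to a single line in
``scrapy.cfg``.
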
Design goals and non-goals
==========================

Goals:

* simple to manage: adding or removing extensions should be just a matter of
  adding or removing lines in a ``scrapy.cfg`` file
* backward compatibility with enabling extensions the "old way" (i.e. by
  modifying settings directly)

Non-goals:

* a way to publish, distribute or discover extensions (use PyPI for that)


Managing add-ons
================

Add-ons are defined in the ``scrapy.cfg`` file, inside the ``[addons]``
section.

To enable the "httpcache" add-on, whether it is shipped with Scrapy or found
on the Python search path, create an entry for it in your ``scrapy.cfg``, like
this::

    [addons]
    httpcache =

You may also specify the full path to an add-on (which may be either a ``.py``
file or a folder containing ``__init__.py``)::

    [addons]
    mongodb_pipeline = /path/to/mongodb_pipeline.py


Writing add-ons
===============

Add-ons are Python modules that implement the following callbacks.

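In other words, a minimal add-on is just an importable module that defines
these two callbacks. The callback names and arguments below follow this
proposal; the bodies are only placeholders::

    # myaddon.py: skeleton of an add-on module (placeholder bodies)

    def addon_configure(settings):
        # adjust the Settings object to enable the required components
        pass

    def crawler_ready(crawler):
        # run post-initialization checks against the initialized Crawler
        pass
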
addon_configure
---------------

Receives the Settings object and modifies it to enable the required
components. If it raises an exception, Scrapy will print it and exit.

Examples::

    def addon_configure(settings):
        settings.overrides['DOWNLOADER_MIDDLEWARES'].update({
            'scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware': 900,
        })

::

    def addon_configure(settings):
        try:
            import boto
        except ImportError:
            raise RuntimeError("boto library is required")


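Both patterns can be combined; for instance, the ``mongodb_pipeline`` add-on
used earlier might check its dependency and then register its pipeline. The
``MongoDBPipeline`` class and the ``pymongo`` requirement are assumptions made
for the sake of the example::

    def addon_configure(settings):
        # fail early if the required library is missing
        try:
            import pymongo
        except ImportError:
            raise RuntimeError("pymongo library is required")
        # register the pipeline shipped with this add-on
        settings.overrides['ITEM_PIPELINES'].update({
            'mongodb_pipeline.MongoDBPipeline': 300,
        })

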
crawler_ready
-------------

``crawler_ready`` receives a Crawler object after it has been initialized and
is meant to be used to perform post-initialization checks, like making sure
the extension and its dependencies were configured properly. If it raises an
exception, Scrapy will print it and exit.

Examples::

    def crawler_ready(crawler):
        if 'some.other.addon' not in crawler.extensions.enabled:
            raise RuntimeError("Some other addon is required to use this addon")
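
For reference, the loading flow implied by this proposal (read the
``[addons]`` section of ``scrapy.cfg``, import each add-on by name or by path,
call ``addon_configure`` with the Settings object, and later call
``crawler_ready`` with the Crawler) could be sketched roughly as follows. This
is only an illustration of the intended behaviour, not part of the proposal::

    import configparser
    import importlib
    import importlib.util

    def load_addons(cfg_path='scrapy.cfg'):
        # Sketch: import every module listed under [addons].
        parser = configparser.ConfigParser()
        parser.read(cfg_path)
        addons = []
        for name, path in parser.items('addons'):
            if path:
                # explicit path (the .py-file case; a package directory
                # would point at its __init__.py instead)
                spec = importlib.util.spec_from_file_location(name, path)
                module = importlib.util.module_from_spec(spec)
                spec.loader.exec_module(module)
            else:
                # bare name: resolve it on the Python search path
                module = importlib.import_module(name)
            addons.append(module)
        return addons

    def configure_addons(addons, settings):
        for addon in addons:
            addon.addon_configure(settings)

    def check_addons(addons, crawler):
        for addon in addons:
            addon.crawler_ready(crawler)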