scrapy/docs/ref/downloader-middleware.rst

.. _ref-downloader-middleware:

========================================
Built-in downloader middleware reference
========================================

This page describes all downloader middleware components that come with
Scrapy. For information on how to use them and how to write your own downloader
middleware, see the :ref:`downloader middleware usage guide
<topics-downloader-middleware>`.

For a list of the components enabled by default (and their orders) see the
:setting:`DOWNLOADER_MIDDLEWARES_BASE` setting.

Available downloader middlewares
================================

DefaultHeadersMiddleware
------------------------

.. module:: scrapy.contrib.downloadermiddleware.defaultheaders
   :synopsis: Default Headers Downloader Middleware

.. class:: DefaultHeadersMiddleware

    This middleware sets all default requests headers specified in the
    :setting:`DEFAULT_REQUEST_HEADERS` setting.

DebugMiddleware
---------------

.. module:: scrapy.contrib.downloadermiddleware.debug
   :synopsis: Downloader middlewares for debugging

.. class:: DebugMiddleware

    This is a convenient middleware to inspect what's passing through the
    downloader middleware. It logs all requests and responses catched by the
    middleware component methods. This middleware does not use any settings and
    does not come enabled by default. Instead, it's meant to be inserted at the
    point of the middleware that you want to inspect.

HttpCacheMiddleware
-------------------

.. module:: scrapy.contrib.downloadermiddleware.httpcache
   :synopsis: HTTP Cache downloader middleware

.. class:: HttpCacheMiddleware

    This middleware provides low-level cache to all HTTP requests and responses.
    Every request and its corresponding response are cached and then, when that
    same request is seen again, the response is returned without transferring
    anything from the Internet.

    The HTTP cache is useful for testing spiders faster (without having to wait for
    downloads every time) and for trying your spider off-line when you don't have
    an Internet connection.

    The :class:`HttpCacheMiddleware` can be configured through the following
    settings (see the settings documentation for more info):

        * :setting:`HTTPCACHE_DIR` - this one actually enables the cache besides
          settings the cache dir
        * :setting:`HTTPCACHE_IGNORE_MISSING` - ignoring missing requests instead
          of downloading them
        * :setting:`HTTPCACHE_SECTORIZE` - split HTTP cache in several directories
          (for performance reasons)
        * :setting:`HTTPCACHE_EXPIRATION_SECS` - how many secs until the cache is
          considered out of date