mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 12:03:58 +00:00
scrapy/docs/ref/downloader-middleware.rst
2009-05-28 13:45:26 -03:00


.. _ref-downloader-middleware:

========================================
Built-in downloader middleware reference
========================================

This page describes all downloader middleware components that come with
Scrapy. For information on how to use them and how to write your own downloader
middleware, see the :ref:`downloader middleware usage guide
<topics-downloader-middleware>`.

For a list of the components enabled by default (and their orders) see the
:setting:`DOWNLOADER_MIDDLEWARES_BASE` setting.

Available downloader middlewares
================================

DefaultHeadersMiddleware
------------------------
.. module:: scrapy.contrib.downloadermiddleware.defaultheaders
   :synopsis: Default Headers Downloader Middleware

.. class:: DefaultHeadersMiddleware

    This middleware sets all default request headers specified in the
    :setting:`DEFAULT_REQUEST_HEADERS` setting.
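
As a rough illustration, the setting can be thought of as a mapping of header
names to values that is applied to each outgoing request. The sketch below is
plain Python, not Scrapy's implementation: the header values are illustrative,
``apply_default_headers`` is a hypothetical helper, and it assumes that headers
already set on a request are left untouched.

```python
# Hypothetical project setting: header names and values are illustrative only.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}


def apply_default_headers(request_headers, defaults=DEFAULT_REQUEST_HEADERS):
    """Return a copy of request_headers with missing defaults filled in.

    Assumption: headers already present on the request are not overridden.
    """
    merged = dict(request_headers)
    for name, value in defaults.items():
        merged.setdefault(name, value)
    return merged


# A request that already sets Accept-Language keeps its own value.
headers = apply_default_headers({"Accept-Language": "es"})
```

Here ``headers`` keeps the request's own ``Accept-Language`` and gains the
default ``Accept`` value.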

DebugMiddleware
---------------

.. module:: scrapy.contrib.downloadermiddleware.debug
   :synopsis: Downloader middlewares for debugging

.. class:: DebugMiddleware

    This is a convenience middleware for inspecting what passes through the
    downloader middleware. It logs all requests and responses caught by the
    middleware component methods. This middleware does not use any settings
    and is not enabled by default. Instead, it is meant to be inserted at the
    point of the middleware chain that you want to inspect.
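
A minimal sketch of enabling it from a project settings module follows. It
assumes that a project-level ``DOWNLOADER_MIDDLEWARES`` setting maps class
paths to orders in the same way :setting:`DOWNLOADER_MIDDLEWARES_BASE` does.
The order value ``550`` is purely hypothetical; pick one that falls between
the orders of the two components you want to observe.

```python
# Path built from the module and class names documented above.
DEBUG_MIDDLEWARE_PATH = "scrapy.contrib.downloadermiddleware.debug.DebugMiddleware"

# Assumption: middlewares are keyed by class path, with an integer order that
# decides their position in the chain. 550 is a hypothetical position.
DOWNLOADER_MIDDLEWARES = {
    DEBUG_MIDDLEWARE_PATH: 550,
}
```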

HttpCacheMiddleware
-------------------

.. module:: scrapy.contrib.downloadermiddleware.httpcache
   :synopsis: HTTP Cache downloader middleware

.. class:: HttpCacheMiddleware

    This middleware provides a low-level cache for all HTTP requests and
    responses. Every request and its corresponding response are cached and
    then, when that same request is seen again, the response is returned
    without transferring anything from the Internet.

    The HTTP cache is useful for testing spiders faster (without having to
    wait for downloads every time) and for trying your spider offline when
    you don't have an Internet connection.
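
The caching behaviour described above can be pictured with a toy in-memory
cache. This is only a sketch of the idea, not Scrapy's implementation:
``ToyHttpCache``, ``fake_download`` and the ``(method, url)`` cache key are
all hypothetical names and choices made for the illustration.

```python
class ToyHttpCache:
    """Illustrative request/response cache (not Scrapy's implementation)."""

    def __init__(self):
        self._store = {}

    def fetch(self, method, url, download):
        # Assumption: two requests are "the same" if method and URL match.
        key = (method, url)
        if key in self._store:
            # Cache hit: return the stored response, nothing is downloaded.
            return self._store[key]
        # Cache miss: download once and remember the response.
        response = download(method, url)
        self._store[key] = response
        return response


downloaded = []


def fake_download(method, url):
    """Stand-in for a real download; records each call it receives."""
    downloaded.append(url)
    return "body of " + url


cache = ToyHttpCache()
first = cache.fetch("GET", "http://example.com/", fake_download)
second = cache.fetch("GET", "http://example.com/", fake_download)
```

After the two calls, ``fake_download`` has run only once and ``second`` is the
cached copy of ``first``.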

The :class:`HttpCacheMiddleware` can be configured through the following
settings (see the settings documentation for more info):

* :setting:`HTTPCACHE_DIR` - the directory where the cache is stored; setting
  it is also what enables the cache
* :setting:`HTTPCACHE_IGNORE_MISSING` - ignore requests not found in the cache
  instead of downloading them
* :setting:`HTTPCACHE_SECTORIZE` - split the HTTP cache into several
  directories (for performance reasons)
* :setting:`HTTPCACHE_EXPIRATION_SECS` - how many seconds until a cached
  response is considered out of date
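
Put together, a settings module that enables the cache might look like the
sketch below. The directory path and all values are illustrative, and the
value types (booleans for the two flags, an integer number of seconds for the
expiration) are assumptions; check the settings documentation for the types
and defaults Scrapy actually expects.

```python
# Setting a cache directory is what turns the cache on (path is illustrative).
HTTPCACHE_DIR = "/tmp/scrapy-httpcache"

# Assumed boolean: when true, requests missing from the cache are ignored
# instead of being downloaded.
HTTPCACHE_IGNORE_MISSING = False

# Assumed boolean: spread cached entries over several directories.
HTTPCACHE_SECTORIZE = True

# Assumed integer: cached responses older than one hour count as out of date.
HTTPCACHE_EXPIRATION_SECS = 60 * 60
```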