Merge pull request #1820 from redapple/http-tls-settings
[MRG+1] Document DOWNLOADER_* settings for HTTP/1.0 and TLS
commit 241ae9f22f
@@ -26,6 +26,9 @@ This 1.1 release brings a lot of interesting features and bug fixes:
- Selectors were extracted to the parsel_ library (:issue:`1409`). This means
  you can use Scrapy Selectors without Scrapy and also upgrade the
  selectors engine without needing to upgrade Scrapy.
- The HTTPS downloader now does TLS protocol negotiation by default,
  instead of forcing TLS 1.0. You can also set the SSL/TLS method
  using the new :setting:`DOWNLOADER_CLIENT_TLS_METHOD`.

- These bug fixes may require your attention:

@@ -85,6 +88,10 @@ Additional New Features and Enhancements
  interval (:issue:`1282`).
- Download handlers are now lazy-loaded on first request using their
  scheme (:issue:`1390`, :issue:`1421`).
- HTTPS download handlers no longer force TLS 1.0; instead, OpenSSL's
  ``SSLv23_method()``/``TLS_method()`` is used, which lets Scrapy try to
  negotiate the highest TLS protocol version it can with the remote
  host (:issue:`1794`, :issue:`1629`).
- ``RedirectMiddleware`` now skips the status codes listed in
  ``handle_httpstatus_list``, set either as a spider attribute
  or in ``Request``'s ``meta`` key (:issue:`1334`, :issue:`1364`,
@@ -366,6 +366,78 @@ Default: ``'scrapy.core.downloader.Downloader'``

The downloader to use for crawling.

.. setting:: DOWNLOADER_HTTPCLIENTFACTORY

DOWNLOADER_HTTPCLIENTFACTORY
----------------------------

Default: ``'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory'``

Defines a Twisted ``protocol.ClientFactory`` class to use for HTTP/1.0
connections (for ``HTTP10DownloadHandler``).

.. note::

    HTTP/1.0 is rarely used nowadays, so you can safely ignore this setting
    unless you use Twisted<11.1, or you really want to use HTTP/1.0 and
    override :setting:`DOWNLOAD_HANDLERS_BASE` for the ``http(s)`` scheme
    accordingly, i.e. to
    ``'scrapy.core.downloader.handlers.http.HTTP10DownloadHandler'``.

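As a minimal sketch of the override described in the note above, a project's
settings module could route both schemes to the HTTP/1.0 handler. This assumes
the override is made through the user-facing ``DOWNLOAD_HANDLERS`` setting,
whose entries take precedence over ``DOWNLOAD_HANDLERS_BASE``, rather than by
editing the base setting itself. The ``_http10_handler`` name is only a local
helper for this example.

```python
# settings.py (sketch): use the HTTP/1.0 download handler for http and https.
# Assumption: the override goes in DOWNLOAD_HANDLERS, whose entries take
# precedence over the defaults declared in DOWNLOAD_HANDLERS_BASE.
_http10_handler = 'scrapy.core.downloader.handlers.http.HTTP10DownloadHandler'

DOWNLOAD_HANDLERS = {
    'http': _http10_handler,
    'https': _http10_handler,
}

# Client factory used for HTTP/1.0 connections; this is the documented default,
# repeated here only to show where a custom class path would go.
DOWNLOADER_HTTPCLIENTFACTORY = (
    'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory'
)
```

With the default handlers left in place, ``DOWNLOADER_HTTPCLIENTFACTORY`` has
no effect, since it is only used for HTTP/1.0 connections.
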
.. setting:: DOWNLOADER_CLIENTCONTEXTFACTORY

DOWNLOADER_CLIENTCONTEXTFACTORY
-------------------------------

Default: ``'scrapy.core.downloader.contextfactory.ScrapyClientContextFactory'``

The class path of the ContextFactory to use.

Here, "ContextFactory" is a Twisted term for SSL/TLS contexts. It defines
the TLS/SSL protocol version to use, whether to do certificate verification,
and whether to enable client-side authentication (among various other things).

.. note::

    Scrapy's default context factory **does NOT perform remote server
    certificate verification**. This is usually fine for web scraping.

    If you do need remote server certificate verification enabled,
    Scrapy also has another context factory class that you can set,
    ``'scrapy.core.downloader.contextfactory.BrowserLikeContextFactory'``,
    which uses the platform's certificates to validate remote endpoints.
    **This is only available if you use Twisted>=14.0.**

If you do use a custom ContextFactory, make sure it accepts a ``method``
parameter at init (this is the ``OpenSSL.SSL`` method that the
:setting:`DOWNLOADER_CLIENT_TLS_METHOD` value maps to).

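For illustration, switching to the verifying context factory named in the note
above could look like the following settings fragment. It is a sketch that
only uses the class path given in this section.

```python
# settings.py (sketch): enable remote server certificate verification by
# selecting the browser-like context factory instead of the default one.
# Per the note above, this class is only available with Twisted>=14.0.
DOWNLOADER_CLIENTCONTEXTFACTORY = (
    'scrapy.core.downloader.contextfactory.BrowserLikeContextFactory'
)
```
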
.. setting:: DOWNLOADER_CLIENT_TLS_METHOD

DOWNLOADER_CLIENT_TLS_METHOD
----------------------------

Default: ``'TLS'``

Use this setting to customize the TLS/SSL method used by the default
HTTP/1.1 downloader.

This setting must be one of these string values:

- ``'TLS'``: maps to OpenSSL's ``TLS_method()`` (a.k.a. ``SSLv23_method()``),
  which allows protocol negotiation, starting from the highest version
  supported by the platform; **default, recommended**
- ``'TLSv1.0'``: forces HTTPS connections to use TLS version 1.0;
  set this if you want the behavior of Scrapy<1.1
- ``'TLSv1.1'``: forces TLS version 1.1
- ``'TLSv1.2'``: forces TLS version 1.2
- ``'SSLv3'``: forces SSL version 3 (**not recommended**)

.. note::

    We recommend that you use PyOpenSSL>=0.13 and Twisted>=0.13
    (Twisted>=14.0 if you can).

.. setting:: DOWNLOADER_MIDDLEWARES

DOWNLOADER_MIDDLEWARES