Improve internal refs to scrapy.Request and scrapy.Selector (#6526)
* Improve internal refs to scrapy.Selector.
* Improve internal refs to scrapy.Request.
* More scrapy.http fixes.
* Fix FormRequest refs.
* More fixes.
* Simplifications.
* Last fixes.
* Add the parsel intersphinx.
This commit is contained in:
parent 5d3aa80ad1
commit 59fcb9b93c
@@ -284,6 +284,7 @@ intersphinx_mapping = {
      "cryptography": ("https://cryptography.io/en/latest/", None),
      "cssselect": ("https://cssselect.readthedocs.io/en/latest", None),
      "itemloaders": ("https://itemloaders.readthedocs.io/en/latest/", None),
+     "parsel": ("https://parsel.readthedocs.io/en/latest/", None),
      "pytest": ("https://docs.pytest.org/en/latest", None),
      "python": ("https://docs.python.org/3", None),
      "sphinx": ("https://www.sphinx-doc.org/en/master", None),
docs/news.rst
@@ -635,7 +635,7 @@ Bug fixes
  exception if ``default`` is ``None``.
  (:issue:`6308`, :issue:`6310`)

- - :class:`~scrapy.selector.Selector` now uses
+ - :class:`~scrapy.Selector` now uses
  :func:`scrapy.utils.response.get_base_url` to determine the base URL of a
  given :class:`~scrapy.http.Response`. (:issue:`6265`)

@@ -653,7 +653,7 @@ Documentation
  - Add a FAQ entry about :ref:`creating blank requests <faq-blank-request>`.
  (:issue:`6203`, :issue:`6208`)

- - Document that :attr:`scrapy.selector.Selector.type` can be ``"json"``.
+ - Document that :attr:`scrapy.Selector.type` can be ``"json"``.
  (:issue:`6328`, :issue:`6334`)

  Quality assurance
@@ -734,7 +734,7 @@ Documentation
  - Improved documentation for :class:`~scrapy.crawler.Crawler` initialization
  changes made in the 2.11.0 release. (:issue:`6057`, :issue:`6147`)

- - Extended documentation for :attr:`Request.meta <scrapy.http.Request.meta>`.
+ - Extended documentation for :attr:`.Request.meta`.
  (:issue:`5565`)

  - Fixed the :reqmeta:`dont_merge_cookies` documentation. (:issue:`5936`,
@@ -1095,7 +1095,7 @@ New features
  :setting:`RANDOMIZE_DOWNLOAD_DELAY` can now be set on a per-domain basis
  via the new :setting:`DOWNLOAD_SLOTS` setting. (:issue:`5328`)

- - Added :meth:`TextResponse.jmespath`, a shortcut for JMESPath selectors
+ - Added :meth:`.TextResponse.jmespath`, a shortcut for JMESPath selectors
  available since parsel_ 1.8.1. (:issue:`5894`, :issue:`5915`)

  - Added :signal:`feed_slot_closed` and :signal:`feed_exporter_closed`
@@ -1275,7 +1275,7 @@ New features
  avoid confusion.
  (:issue:`5717`, :issue:`5722`, :issue:`5727`)

- - The ``callback`` parameter of :class:`~scrapy.http.Request` can now be set
+ - The ``callback`` parameter of :class:`~scrapy.Request` can now be set
  to :func:`scrapy.http.request.NO_CALLBACK`, to distinguish it from
  ``None``, as the latter indicates that the default spider callback
  (:meth:`~scrapy.Spider.parse`) is to be used.
@@ -1772,17 +1772,17 @@ Highlights:
  Security bug fixes
  ~~~~~~~~~~~~~~~~~~

- - When a :class:`~scrapy.http.Request` object with cookies defined gets a
- redirect response causing a new :class:`~scrapy.http.Request` object to be
+ - When a :class:`~scrapy.Request` object with cookies defined gets a
+ redirect response causing a new :class:`~scrapy.Request` object to be
  scheduled, the cookies defined in the original
- :class:`~scrapy.http.Request` object are no longer copied into the new
- :class:`~scrapy.http.Request` object.
+ :class:`~scrapy.Request` object are no longer copied into the new
+ :class:`~scrapy.Request` object.

  If you manually set the ``Cookie`` header on a
- :class:`~scrapy.http.Request` object and the domain name of the redirect
+ :class:`~scrapy.Request` object and the domain name of the redirect
  URL is not an exact match for the domain of the URL of the original
- :class:`~scrapy.http.Request` object, your ``Cookie`` header is now dropped
- from the new :class:`~scrapy.http.Request` object.
+ :class:`~scrapy.Request` object, your ``Cookie`` header is now dropped
+ from the new :class:`~scrapy.Request` object.

  The old behavior could be exploited by an attacker to gain access to your
  cookies. Please, see the `cjvr-mfj7-j4j8 security advisory`_ for more
@@ -1795,10 +1795,10 @@ Security bug fixes
  ``example.com`` and any subdomain) by defining the shared domain
  suffix (e.g. ``example.com``) as the cookie domain when defining
  your cookies. See the documentation of the
- :class:`~scrapy.http.Request` class for more information.
+ :class:`~scrapy.Request` class for more information.

  - When the domain of a cookie, either received in the ``Set-Cookie`` header
- of a response or defined in a :class:`~scrapy.http.Request` object, is set
+ of a response or defined in a :class:`~scrapy.Request` object, is set
  to a `public suffix <https://publicsuffix.org/>`_, the cookie is now
  ignored unless the cookie domain is the same as the request domain.

@@ -1849,7 +1849,7 @@ Backward-incompatible changes
  meet expectations, :exc:`TypeError` is now raised at startup time. Before,
  other exceptions would be raised at run time. (:issue:`3559`)

- - The ``_encoding`` field of serialized :class:`~scrapy.http.Request` objects
+ - The ``_encoding`` field of serialized :class:`~scrapy.Request` objects
  is now named ``encoding``, in line with all other fields (:issue:`5130`)


@@ -1879,7 +1879,7 @@ Deprecations
  - :mod:`scrapy.utils.reqser` is deprecated. (:issue:`5130`)

  - Instead of :func:`~scrapy.utils.reqser.request_to_dict`, use the new
- :meth:`Request.to_dict <scrapy.http.Request.to_dict>` method.
+ :meth:`.Request.to_dict` method.

  - Instead of :func:`~scrapy.utils.reqser.request_from_dict`, use the new
  :func:`scrapy.utils.request.request_from_dict` function.
@@ -1984,9 +1984,9 @@ New features
  using ``queuelib`` 1.6.1 or later), the ``peek`` method raises
  :exc:`NotImplementedError`.

- - :class:`~scrapy.http.Request` and :class:`~scrapy.http.Response` now have
+ - :class:`~scrapy.Request` and :class:`~scrapy.http.Response` now have
  an ``attributes`` attribute that makes subclassing easier. For
- :class:`~scrapy.http.Request`, it also allows subclasses to work with
+ :class:`~scrapy.Request`, it also allows subclasses to work with
  :func:`scrapy.utils.request.request_from_dict`. (:issue:`1877`,
  :issue:`5130`, :issue:`5218`)

@@ -2452,14 +2452,13 @@ Backward-incompatible changes
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  * :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` once again
- discards cookies defined in :attr:`Request.headers
- <scrapy.http.Request.headers>`.
+ discards cookies defined in :attr:`.Request.headers`.

  We decided to revert this bug fix, introduced in Scrapy 2.2.0, because it
  was reported that the current implementation could break existing code.

  If you need to set cookies for a request, use the :class:`Request.cookies
- <scrapy.http.Request>` parameter.
+ <scrapy.Request>` parameter.

  A future version of Scrapy will include a new, better implementation of the
  reverted bug fix.
@@ -2580,16 +2579,16 @@ New features
  :meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_response`
  or
  :meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_exception`
- with a custom :class:`~scrapy.http.Request` object assigned to
+ with a custom :class:`~scrapy.Request` object assigned to
  :class:`response.request <scrapy.http.Response.request>`:

  - The response is handled by the callback of that custom
- :class:`~scrapy.http.Request` object, instead of being handled by the
- callback of the original :class:`~scrapy.http.Request` object
+ :class:`~scrapy.Request` object, instead of being handled by the
+ callback of the original :class:`~scrapy.Request` object

- - That custom :class:`~scrapy.http.Request` object is now sent as the
+ - That custom :class:`~scrapy.Request` object is now sent as the
  ``request`` argument to the :signal:`response_received` signal, instead
- of the original :class:`~scrapy.http.Request` object
+ of the original :class:`~scrapy.Request` object

  (:issue:`4529`, :issue:`4632`)

@@ -2760,7 +2759,7 @@ New features
  * The :command:`parse` command now allows specifying an output file
  (:issue:`4317`, :issue:`4377`)

- * :meth:`Request.from_curl <scrapy.http.Request.from_curl>` and
+ * :meth:`.Request.from_curl` and
  :func:`~scrapy.utils.curl.curl_to_request_kwargs` now also support
  ``--data-raw`` (:issue:`4612`)

@@ -2776,7 +2775,7 @@ Bug fixes
  :ref:`dataclass items <dataclass-items>` and :ref:`attr.s items
  <attrs-items>` (:issue:`4667`, :issue:`4668`)

- * :meth:`Request.from_curl <scrapy.http.Request.from_curl>` and
+ * :meth:`.Request.from_curl` and
  :func:`~scrapy.utils.curl.curl_to_request_kwargs` now set the request
  method to ``POST`` when a request body is specified and no request method
  is specified (:issue:`4612`)
@@ -2861,8 +2860,7 @@ Backward-incompatible changes
  Deprecations
  ~~~~~~~~~~~~

- * :meth:`TextResponse.body_as_unicode
- <scrapy.http.TextResponse.body_as_unicode>` is now deprecated, use
+ * ``TextResponse.body_as_unicode()`` is now deprecated, use
  :attr:`TextResponse.text <scrapy.http.TextResponse.text>` instead
  (:issue:`4546`, :issue:`4555`, :issue:`4579`)

@@ -2901,9 +2899,8 @@ New features

  * :ref:`Link extractors <topics-link-extractors>` are now serializable,
  as long as you do not use :ref:`lambdas <lambda>` for parameters; for
- example, you can now pass link extractors in :attr:`Request.cb_kwargs
- <scrapy.http.Request.cb_kwargs>` or
- :attr:`Request.meta <scrapy.http.Request.meta>` when :ref:`persisting
+ example, you can now pass link extractors in :attr:`.Request.cb_kwargs`
+ or :attr:`.Request.meta` when :ref:`persisting
  scheduled requests <topics-jobs>` (:issue:`4554`)

  * Upgraded the :ref:`pickle protocol <pickle-protocols>` that Scrapy uses
@@ -2922,11 +2919,11 @@ Bug fixes

  * :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` no longer
  discards cookies defined in :attr:`Request.headers
- <scrapy.http.Request.headers>` (:issue:`1992`, :issue:`2400`)
+ <scrapy.Request.headers>` (:issue:`1992`, :issue:`2400`)

  * :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` no longer
  re-encodes cookies defined as :class:`bytes` in the ``cookies`` parameter
- of the ``__init__`` method of :class:`~scrapy.http.Request`
+ of the ``__init__`` method of :class:`~scrapy.Request`
  (:issue:`2400`, :issue:`3575`)

  * When :setting:`FEEDS` defines multiple URIs, :setting:`FEED_STORE_EMPTY` is
@@ -2935,7 +2932,7 @@ Bug fixes

  * :class:`~scrapy.spiders.Spider` callbacks defined using :doc:`coroutine
  syntax <topics/coroutines>` no longer need to return an iterable, and may
- instead return a :class:`~scrapy.http.Request` object, an
+ instead return a :class:`~scrapy.Request` object, an
  :ref:`item <topics-items>`, or ``None`` (:issue:`4609`)

  * The :command:`startproject` command now ensures that the generated project
@@ -2976,8 +2973,8 @@ Documentation
  :issue:`4587`)

  * The display-on-hover behavior of internal documentation references now also
- covers links to :ref:`commands <topics-commands>`, :attr:`Request.meta
- <scrapy.http.Request.meta>` keys, :ref:`settings <topics-settings>` and
+ covers links to :ref:`commands <topics-commands>`, :attr:`.Request.meta`
+ keys, :ref:`settings <topics-settings>` and
  :ref:`signals <topics-signals>` (:issue:`4495`, :issue:`4563`)

  * It is again possible to download the documentation for offline reading
@@ -3262,7 +3259,7 @@ Deprecation removals
  ~~~~~~~~~~~~~~~~~~~~

  * The :ref:`Scrapy shell <topics-shell>` no longer provides a `sel` proxy
- object, use :meth:`response.selector <scrapy.http.Response.selector>`
+ object, use :meth:`response.selector <scrapy.http.TextResponse.selector>`
  instead (:issue:`4347`)

  * LevelDB support has been removed (:issue:`4112`)
@@ -3332,10 +3329,10 @@ New features

  * The new :attr:`Response.cb_kwargs <scrapy.http.Response.cb_kwargs>`
  attribute serves as a shortcut for :attr:`Response.request.cb_kwargs
- <scrapy.http.Request.cb_kwargs>` (:issue:`4331`)
+ <scrapy.Request.cb_kwargs>` (:issue:`4331`)

  * :meth:`Response.follow <scrapy.http.Response.follow>` now supports a
- ``flags`` parameter, for consistency with :class:`~scrapy.http.Request`
+ ``flags`` parameter, for consistency with :class:`~scrapy.Request`
  (:issue:`4277`, :issue:`4279`)

  * :ref:`Item loader processors <topics-loaders-processors>` can now be
@@ -3344,7 +3341,7 @@ New features
  * :class:`~scrapy.spiders.Rule` now accepts an ``errback`` parameter
  (:issue:`4000`)

- * :class:`~scrapy.http.Request` no longer requires a ``callback`` parameter
+ * :class:`~scrapy.Request` no longer requires a ``callback`` parameter
  when an ``errback`` parameter is specified (:issue:`3586`, :issue:`4008`)

  * :class:`~scrapy.logformatter.LogFormatter` now supports some additional
@@ -3416,7 +3413,7 @@ Bug fixes
  * Redirects to URLs starting with 3 slashes (``///``) are now supported
  (:issue:`4032`, :issue:`4042`)

- * :class:`~scrapy.http.Request` no longer accepts strings as ``url`` simply
+ * :class:`~scrapy.Request` no longer accepts strings as ``url`` simply
  because they have a colon (:issue:`2552`, :issue:`4094`)

  * The correct encoding is now used for attach names in
@@ -3462,7 +3459,7 @@ Documentation
  using :class:`~scrapy.crawler.CrawlerProcess` (:issue:`2149`,
  :issue:`2352`, :issue:`3146`, :issue:`3960`)

- * Clarified the requirements for :class:`~scrapy.http.Request` objects
+ * Clarified the requirements for :class:`~scrapy.Request` objects
  :ref:`when using persistence <request-serialization>` (:issue:`4124`,
  :issue:`4139`)

@@ -3731,17 +3728,17 @@ Scrapy 1.8.2 (2022-03-01)

  **Security bug fixes:**

- - When a :class:`~scrapy.http.Request` object with cookies defined gets a
- redirect response causing a new :class:`~scrapy.http.Request` object to be
+ - When a :class:`~scrapy.Request` object with cookies defined gets a
+ redirect response causing a new :class:`~scrapy.Request` object to be
  scheduled, the cookies defined in the original
- :class:`~scrapy.http.Request` object are no longer copied into the new
- :class:`~scrapy.http.Request` object.
+ :class:`~scrapy.Request` object are no longer copied into the new
+ :class:`~scrapy.Request` object.

  If you manually set the ``Cookie`` header on a
- :class:`~scrapy.http.Request` object and the domain name of the redirect
+ :class:`~scrapy.Request` object and the domain name of the redirect
  URL is not an exact match for the domain of the URL of the original
- :class:`~scrapy.http.Request` object, your ``Cookie`` header is now dropped
- from the new :class:`~scrapy.http.Request` object.
+ :class:`~scrapy.Request` object, your ``Cookie`` header is now dropped
+ from the new :class:`~scrapy.Request` object.

  The old behavior could be exploited by an attacker to gain access to your
  cookies. Please, see the `cjvr-mfj7-j4j8 security advisory`_ for more
@@ -3754,10 +3751,10 @@ Scrapy 1.8.2 (2022-03-01)
  ``example.com`` and any subdomain) by defining the shared domain
  suffix (e.g. ``example.com``) as the cookie domain when defining
  your cookies. See the documentation of the
- :class:`~scrapy.http.Request` class for more information.
+ :class:`~scrapy.Request` class for more information.

  - When the domain of a cookie, either received in the ``Set-Cookie`` header
- of a response or defined in a :class:`~scrapy.http.Request` object, is set
+ of a response or defined in a :class:`~scrapy.Request` object, is set
  to a `public suffix <https://publicsuffix.org/>`_, the cookie is now
  ignored unless the cookie domain is the same as the request domain.

@@ -3815,7 +3812,7 @@ Highlights:

  * Dropped Python 3.4 support and updated minimum requirements; made Python 3.8
  support official
- * New :meth:`Request.from_curl <scrapy.http.Request.from_curl>` class method
+ * New :meth:`.Request.from_curl` class method
  * New :setting:`ROBOTSTXT_PARSER` and :setting:`ROBOTSTXT_USER_AGENT` settings
  * New :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` and
  :setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` settings
@@ -3869,7 +3866,7 @@ See also :ref:`1.8-deprecation-removals` below.
  New features
  ~~~~~~~~~~~~

- * A new :meth:`Request.from_curl <scrapy.http.Request.from_curl>` class
+ * A new :meth:`Request.from_curl <scrapy.Request.from_curl>` class
  method allows :ref:`creating a request from a cURL command
  <requests-from-curl>` (:issue:`2985`, :issue:`3862`)

@@ -3898,9 +3895,8 @@ New features
  ``True`` to enable debug-level messages about TLS connection parameters
  after establishing HTTPS connections (:issue:`2111`, :issue:`3450`)

- * Callbacks that receive keyword arguments
- (see :attr:`Request.cb_kwargs <scrapy.http.Request.cb_kwargs>`) can now be
- tested using the new :class:`@cb_kwargs
+ * Callbacks that receive keyword arguments (see :attr:`.Request.cb_kwargs`)
+ can now be tested using the new :class:`@cb_kwargs
  <scrapy.contracts.default.CallbackKeywordArgumentsContract>`
  :ref:`spider contract <topics-contracts>` (:issue:`3985`, :issue:`3988`)

@@ -4089,7 +4085,7 @@ Backward-incompatible changes

  * Non-default values for the :setting:`SCHEDULER_PRIORITY_QUEUE` setting
  may stop working. Scheduler priority queue classes now need to handle
- :class:`~scrapy.http.Request` objects instead of arbitrary Python data
+ :class:`~scrapy.Request` objects instead of arbitrary Python data
  structures.

  * An additional ``crawler`` parameter has been added to the ``__init__``
@@ -4111,7 +4107,7 @@ New features
  scheduling improvement on crawls targeting multiple web domains, at the
  cost of no :setting:`CONCURRENT_REQUESTS_PER_IP` support (:issue:`3520`)

- * A new :attr:`Request.cb_kwargs <scrapy.http.Request.cb_kwargs>` attribute
+ * A new :attr:`.Request.cb_kwargs` attribute
  provides a cleaner way to pass keyword arguments to callback methods
  (:issue:`1138`, :issue:`3563`)

@@ -4192,7 +4188,7 @@ Bug fixes
  * Requests with private callbacks are now correctly unserialized from disk
  (:issue:`3790`)

- * :meth:`FormRequest.from_response() <scrapy.http.FormRequest.from_response>`
+ * :meth:`.FormRequest.from_response`
  now handles invalid methods like major web browsers (:issue:`3777`,
  :issue:`3794`)

@@ -4272,13 +4268,13 @@ The following deprecated APIs have been removed (:issue:`3578`):

  * From both ``scrapy.selector`` and ``scrapy.selector.lxmlsel``:

- * ``HtmlXPathSelector`` (use :class:`~scrapy.selector.Selector`)
+ * ``HtmlXPathSelector`` (use :class:`~scrapy.Selector`)

- * ``XmlXPathSelector`` (use :class:`~scrapy.selector.Selector`)
+ * ``XmlXPathSelector`` (use :class:`~scrapy.Selector`)

- * ``XPathSelector`` (use :class:`~scrapy.selector.Selector`)
+ * ``XPathSelector`` (use :class:`~scrapy.Selector`)

- * ``XPathSelectorList`` (use :class:`~scrapy.selector.Selector`)
+ * ``XPathSelectorList`` (use :class:`~scrapy.Selector`)

  * From ``scrapy.selector.csstranslator``:

@@ -4288,7 +4284,7 @@ The following deprecated APIs have been removed (:issue:`3578`):

  * ``ScrapyXPathExpr`` (use parsel.csstranslator.XPathExpr_)

- * From :class:`~scrapy.selector.Selector`:
+ * From :class:`~scrapy.Selector`:

  * ``_root`` (both the ``__init__`` method argument and the object property, use
  ``root``)
@@ -4818,7 +4814,7 @@ New Features
  (:issue:`2535`)
  - New :ref:`response.follow <response-follow-example>` shortcut
  for creating requests (:issue:`1940`)
- - Added ``flags`` argument and attribute to :class:`Request <scrapy.http.Request>`
+ - Added ``flags`` argument and attribute to :class:`~scrapy.Request`
  objects (:issue:`2047`)
  - Support Anonymous FTP (:issue:`2342`)
  - Added ``retry/count``, ``retry/max_reached`` and ``retry/reason_count/<reason>``
@@ -4860,7 +4856,7 @@ Bug fixes
  - LinkExtractor now strips leading and trailing whitespaces from attributes
  (:issue:`2547`, fixes :issue:`1614`)
  - Properly handle whitespaces in action attribute in
- :class:`~scrapy.http.FormRequest` (:issue:`2548`)
+ :class:`~scrapy.FormRequest` (:issue:`2548`)
  - Buffer CONNECT response bytes from proxy until all HTTP headers are received
  (:issue:`2495`, fixes :issue:`2491`)
  - FTP downloader now works on Python 3, provided you use Twisted>=17.1
@@ -4902,8 +4898,7 @@ Documentation
  ~~~~~~~~~~~~~

  - Binary mode is required for exporters (:issue:`2564`, fixes :issue:`2553`)
- - Mention issue with :meth:`FormRequest.from_response
- <scrapy.http.FormRequest.from_response>` due to bug in lxml (:issue:`2572`)
+ - Mention issue with :meth:`.FormRequest.from_response` due to bug in lxml (:issue:`2572`)
  - Use single quotes uniformly in templates (:issue:`2596`)
  - Document :reqmeta:`ftp_user` and :reqmeta:`ftp_password` meta keys (:issue:`2587`)
  - Removed section on deprecated ``contrib/`` (:issue:`2636`)
@@ -5442,7 +5437,7 @@ Bugfixes
  - Support empty password for http_proxy config (:issue:`1274`).
  - Interpret ``application/x-json`` as ``TextResponse`` (:issue:`1333`).
  - Support link rel attribute with multiple values (:issue:`1201`).
- - Fixed ``scrapy.http.FormRequest.from_response`` when there is a ``<base>``
+ - Fixed ``scrapy.FormRequest.from_response`` when there is a ``<base>``
  tag (:issue:`1564`).
  - Fixed :setting:`TEMPLATES_DIR` handling (:issue:`1575`).
  - Various ``FormRequest`` fixes (:issue:`1595`, :issue:`1596`, :issue:`1597`).
@@ -6369,7 +6364,7 @@ Scrapy 0.18.0 (released 2013-08-09)
  - Moved persistent (on disk) queues to a separate project (queuelib_) which Scrapy now depends on
  - Add Scrapy commands using external libraries (:issue:`260`)
  - Added ``--pdb`` option to ``scrapy`` command line tool
- - Added :meth:`XPathSelector.remove_namespaces <scrapy.selector.Selector.remove_namespaces>` which allows to remove all namespaces from XML documents for convenience (to work with namespace-less XPaths). Documented in :ref:`topics-selectors`.
+ - Added :meth:`XPathSelector.remove_namespaces <scrapy.Selector.remove_namespaces>` which allows to remove all namespaces from XML documents for convenience (to work with namespace-less XPaths). Documented in :ref:`topics-selectors`.
  - Several improvements to spider contracts
  - New default middleware named MetaRefreshMiddleware that handles meta-refresh html tag redirections,
  - MetaRefreshMiddleware and RedirectMiddleware have different priorities to address #62
@@ -80,7 +80,7 @@ object gives you access, for example, to the :ref:`settings <topics-settings>`.
  middleware.

  :meth:`process_request` should either: return ``None``, return a
- :class:`~scrapy.Response` object, return a :class:`~scrapy.http.Request`
+ :class:`~scrapy.http.Response` object, return a :class:`~scrapy.Request`
  object, or raise :exc:`~scrapy.exceptions.IgnoreRequest`.

  If it returns ``None``, Scrapy will continue processing this request, executing all
@@ -117,7 +117,8 @@ data from it depends on the type of response:
  - If the response is HTML, XML or JSON, use :ref:`selectors
  <topics-selectors>` as usual.

- - If the response is JSON, use :func:`response.json()` to load the desired data:
+ - If the response is JSON, use :func:`response.json()
+ <scrapy.http.TextResponse.json>` to load the desired data:

  .. code-block:: python

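To make the hunk above concrete — an illustrative sketch by the editor, not part of this diff — ``response.json()`` deserializes a JSON response body. The endpoint and field names below are placeholders:

.. code-block:: python

    import scrapy


    class QuotesApiSpider(scrapy.Spider):
        name = "quotes_api"
        # Placeholder endpoint that returns JSON.
        start_urls = ["https://api.example.com/quotes?page=1"]

        def parse(self, response):
            data = response.json()  # parses the body as JSON
            for quote in data.get("quotes", []):
                yield {"text": quote.get("text"), "author": quote.get("author")}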
@@ -143,7 +144,7 @@ data from it depends on the type of response:

  - If the response is an image or another format based on images (e.g. PDF),
  read the response as bytes from
- :attr:`response.body <scrapy.http.TextResponse.body>` and use an OCR
+ :attr:`response.body <scrapy.http.Response.body>` and use an OCR
  solution to extract the desired data as text.

  For example, you can use pytesseract_. To read a table from a PDF,
@@ -105,7 +105,7 @@ response:
  In both cases, the response could have its body truncated: the body contains
  all bytes received up until the exception is raised, including the bytes
  received in the signal handler that raises the exception. Also, the response
- object is marked with ``"download_stopped"`` in its :attr:`Response.flags`
+ object is marked with ``"download_stopped"`` in its :attr:`~scrapy.http.Response.flags`
  attribute.

  .. note:: ``fail`` is a keyword-only parameter, i.e. raising
@@ -7,15 +7,15 @@ Requests and Responses
  .. module:: scrapy.http
  :synopsis: Request and Response classes

- Scrapy uses :class:`Request` and :class:`Response` objects for crawling web
+ Scrapy uses :class:`~scrapy.Request` and :class:`Response` objects for crawling web
  sites.

- Typically, :class:`Request` objects are generated in the spiders and pass
+ Typically, :class:`~scrapy.Request` objects are generated in the spiders and pass
  across the system until they reach the Downloader, which executes the request
  and returns a :class:`Response` object which travels back to the spider that
  issued the request.

- Both :class:`Request` and :class:`Response` classes have subclasses which add
+ Both :class:`~scrapy.Request` and :class:`Response` classes have subclasses which add
  functionality not required in the base classes. These are described
  below in :ref:`topics-request-response-ref-request-subclasses` and
  :ref:`topics-request-response-ref-response-subclasses`.
@@ -24,7 +24,7 @@ below in :ref:`topics-request-response-ref-request-subclasses` and
  Request objects
  ===============

- .. autoclass:: Request
+ .. autoclass:: scrapy.Request

  :param url: the URL of this request

@@ -52,7 +52,7 @@ Request objects
  :param method: the HTTP method of this request. Defaults to ``'GET'``.
  :type method: str

- :param meta: the initial values for the :attr:`Request.meta` attribute. If
+ :param meta: the initial values for the :attr:`.Request.meta` attribute. If
  given, the dict passed in this parameter will be shallow copied.
  :type meta: dict

@@ -67,10 +67,10 @@ Request objects
  (for single valued headers) or lists (for multi-valued headers). If
  ``None`` is passed as value, the HTTP header will not be sent at all.

- .. caution:: Cookies set via the ``Cookie`` header are not considered by the
- :ref:`cookies-mw`. If you need to set cookies for a request, use the
- :class:`Request.cookies <scrapy.Request>` parameter. This is a known
- current limitation that is being worked on.
+ .. caution:: Cookies set via the ``Cookie`` header are not considered by the
+ :ref:`cookies-mw`. If you need to set cookies for a request, use the
+ ``cookies`` argument. This is a known current limitation that is being
+ worked on.

  :type headers: dict

@@ -124,7 +124,7 @@ Request objects

  .. caution:: Cookies set via the ``Cookie`` header are not considered by the
  :ref:`cookies-mw`. If you need to set cookies for a request, use the
- :class:`Request.cookies <scrapy.Request>` parameter. This is a known
+ :class:`scrapy.Request.cookies <scrapy.Request>` parameter. This is a known
  current limitation that is being worked on.

  .. versionadded:: 2.6.0
@@ -172,7 +172,7 @@ Request objects

  A string containing the URL of this request. Keep in mind that this
  attribute contains the escaped URL, so it can differ from the URL passed in
- the ``__init__`` method.
+ the ``__init__()`` method.

  This attribute is read-only. To change the URL of a Request use
  :meth:`replace`.
@@ -184,7 +184,8 @@ Request objects

  .. attribute:: Request.headers

- A dictionary-like object which contains the request headers.
+ A dictionary-like (:class:`scrapy.http.headers.Headers`) object which contains
+ the request headers.

  .. attribute:: Request.body

@@ -240,8 +241,8 @@ Request objects

  A dictionary that contains arbitrary metadata for this request. Its contents
  will be passed to the Request's callback as keyword arguments. It is empty
- for new Requests, which means by default callbacks only get a :class:`Response`
- object as argument.
+ for new Requests, which means by default callbacks only get a
+ :class:`~scrapy.http.Response` object as argument.

  This dict is :doc:`shallow copied <library/copy>` when the request is
  cloned using the ``copy()`` or ``replace()`` methods, and can also be
@@ -262,7 +263,7 @@ Request objects

  Return a Request object with the same members, except for those members
  given new values by whichever keyword arguments are specified. The
- :attr:`Request.cb_kwargs` and :attr:`Request.meta` attributes are shallow
+ :attr:`~scrapy.Request.cb_kwargs` and :attr:`~scrapy.Request.meta` attributes are shallow
  copied by default (unless new values are given as arguments). See also
  :ref:`topics-request-response-ref-request-callback-arguments`.

@@ -305,7 +306,7 @@ Example:
  In some cases you may be interested in passing arguments to those callback
  functions so you can receive the arguments later, in the second callback.
  The following example shows how to achieve this by using the
- :attr:`Request.cb_kwargs` attribute:
+ :attr:`.Request.cb_kwargs` attribute:

  .. code-block:: python

@@ -326,10 +327,10 @@ The following example shows how to achieve this by using the
  foo=foo,
  )

- .. caution:: :attr:`Request.cb_kwargs` was introduced in version ``1.7``.
- Prior to that, using :attr:`Request.meta` was recommended for passing
- information around callbacks. After ``1.7``, :attr:`Request.cb_kwargs`
- became the preferred way for handling user information, leaving :attr:`Request.meta`
+ .. caution:: :attr:`.Request.cb_kwargs` was introduced in version ``1.7``.
+ Prior to that, using :attr:`.Request.meta` was recommended for passing
+ information around callbacks. After ``1.7``, :attr:`.Request.cb_kwargs`
+ became the preferred way for handling user information, leaving :attr:`.Request.meta`
  for communication with components like middlewares and extensions.

  .. _topics-request-response-ref-errbacks:
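For context on the ``cb_kwargs`` passage above, a minimal usage sketch (editor's illustration, not part of this diff; URLs are placeholders):

.. code-block:: python

    import scrapy


    class CbKwargsSpider(scrapy.Spider):
        name = "cb_kwargs_demo"
        start_urls = ["https://example.com"]

        def parse(self, response):
            # Entries in cb_kwargs become keyword arguments of the callback.
            yield scrapy.Request(
                response.urljoin("details.html"),
                callback=self.parse_details,
                cb_kwargs={"section": "home"},
            )

        def parse_details(self, response, section):
            yield {"section": section, "url": response.url}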
@@ -441,7 +442,7 @@ Request fingerprints
  There are some aspects of scraping, such as filtering out duplicate requests
  (see :setting:`DUPEFILTER_CLASS`) or caching responses (see
  :setting:`HTTPCACHE_POLICY`), where you need the ability to generate a short,
- unique identifier from a :class:`~scrapy.http.Request` object: a request
+ unique identifier from a :class:`~scrapy.Request` object: a request
  fingerprint.

  You often do not need to worry about request fingerprints, the default request
@@ -486,7 +487,7 @@ A request fingerprinter is a class that must implement the following method:
  See also :ref:`request-fingerprint-restrictions`.

  :param request: request to fingerprint
- :type request: scrapy.http.Request
+ :type request: scrapy.Request

  Additionally, it may also implement the following method:

@@ -566,7 +567,7 @@ URL canonicalization or taking the request method or body into account:

  If you need to be able to override the request fingerprinting for arbitrary
  requests from your spider callbacks, you may implement a request fingerprinter
- that reads fingerprints from :attr:`request.meta <scrapy.http.Request.meta>`
+ that reads fingerprints from :attr:`request.meta <scrapy.Request.meta>`
  when available, and then falls back to
  :func:`scrapy.utils.request.fingerprint`. For example:

@@ -581,10 +582,8 @@ when available, and then falls back to
  return request.meta["fingerprint"]
  return fingerprint(request)

- If you need to reproduce the same fingerprinting algorithm as Scrapy 2.6
- without using the deprecated ``'2.6'`` value of the
- :setting:`REQUEST_FINGERPRINTER_IMPLEMENTATION` setting, use the following
- request fingerprinter:
+ If you need to reproduce the same fingerprinting algorithm as Scrapy 2.6, use
+ the following request fingerprinter:

  .. code-block:: python

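The meta-based fingerprinter example referenced in the hunks above is truncated in this view; a sketch of the pattern the surrounding prose describes (reconstructed by the editor, not verbatim from the docs; the class name is hypothetical), enabled through the :setting:`REQUEST_FINGERPRINTER_CLASS` setting:

.. code-block:: python

    from scrapy.utils.request import fingerprint


    class MetaAwareFingerprinter:
        """Honor a precomputed fingerprint from request.meta when present,
        otherwise fall back to Scrapy's default fingerprint function."""

        def fingerprint(self, request):
            if "fingerprint" in request.meta:
                return request.meta["fingerprint"]  # expected to be bytes
            return fingerprint(request)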
||||
@ -628,7 +627,7 @@ The following built-in Scrapy components have such restrictions:
|
||||
:setting:`HTTPCACHE_DIR` also apply. Inside :setting:`HTTPCACHE_DIR`,
|
||||
the following directory structure is created:
|
||||
|
||||
- :attr:`Spider.name <scrapy.spiders.Spider.name>`
|
||||
- :attr:`.Spider.name`
|
||||
|
||||
- first byte of a request fingerprint as hexadecimal
|
||||
|
||||
@ -656,7 +655,7 @@ The following built-in Scrapy components have such restrictions:
|
||||
Request.meta special keys
|
||||
=========================
|
||||
|
||||
The :attr:`Request.meta` attribute can contain any arbitrary data, but there
|
||||
The :attr:`.Request.meta` attribute can contain any arbitrary data, but there
|
||||
are some special keys recognized by Scrapy and its built-in extensions.
|
||||
|
||||
Those are:
|
||||
@@ -780,24 +779,25 @@ call their callback instead, like in this example, pass ``fail=False`` to the
  Request subclasses
  ==================

- Here is the list of built-in :class:`Request` subclasses. You can also subclass
+ Here is the list of built-in :class:`~scrapy.Request` subclasses. You can also subclass
  it to implement your own custom functionality.

  FormRequest objects
  -------------------

- The FormRequest class extends the base :class:`Request` with functionality for
+ The FormRequest class extends the base :class:`~scrapy.Request` with functionality for
  dealing with HTML forms. It uses `lxml.html forms`_ to pre-populate form
  fields with form data from :class:`Response` objects.

  .. _lxml.html forms: https://lxml.de/lxmlhtml.html#forms

- .. class:: scrapy.http.request.form.FormRequest
- .. class:: scrapy.http.FormRequest
- .. class:: scrapy.FormRequest(url, [formdata, ...])
+ .. currentmodule:: None

- The :class:`FormRequest` class adds a new keyword parameter to the ``__init__`` method. The
- remaining arguments are the same as for the :class:`Request` class and are
+ .. class:: scrapy.FormRequest(url, [formdata, ...])
+    :canonical: scrapy.http.request.form.FormRequest
+
+ The :class:`~scrapy.FormRequest` class adds a new keyword parameter to the ``__init__()`` method. The
+ remaining arguments are the same as for the :class:`~scrapy.Request` class and are
  not documented here.

  :param formdata: is a dictionary (or iterable of (key, value) tuples)
@@ -805,12 +805,12 @@ fields with form data from :class:`Response` objects.
  body of the request.
  :type formdata: dict or collections.abc.Iterable

- The :class:`FormRequest` objects support the following class method in
- addition to the standard :class:`Request` methods:
+ The :class:`~scrapy.FormRequest` objects support the following class method in
+ addition to the standard :class:`~scrapy.Request` methods:

- .. classmethod:: FormRequest.from_response(response, [formname=None, formid=None, formnumber=0, formdata=None, formxpath=None, formcss=None, clickdata=None, dont_click=False, ...])
+ .. classmethod:: from_response(response, [formname=None, formid=None, formnumber=0, formdata=None, formxpath=None, formcss=None, clickdata=None, dont_click=False, ...])

- Returns a new :class:`FormRequest` object with its form field values
+ Returns a new :class:`~scrapy.FormRequest` object with its form field values
  pre-populated with those found in the HTML ``<form>`` element contained
  in the given response. For an example see
  :ref:`topics-request-response-ref-request-userlogin`.
@@ -832,7 +832,7 @@ fields with form data from :class:`Response` objects.

  :param response: the response containing a HTML form which will be used
  to pre-populate the form fields
- :type response: :class:`Response` object
+ :type response: :class:`~scrapy.http.Response` object

  :param formname: if given, the form with name attribute set to this value will be used.
  :type formname: str
@@ -869,7 +869,9 @@ fields with form data from :class:`Response` objects.
  :type dont_click: bool

  The other parameters of this class method are passed directly to the
- :class:`FormRequest` ``__init__`` method.
+ :class:`~scrapy.FormRequest` ``__init__()`` method.
+
+ .. currentmodule:: scrapy.http

  Request usage examples
  ----------------------
@@ -878,7 +880,7 @@ Using FormRequest to send data via HTTP POST
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  If you want to simulate a HTML Form POST in your spider and send a couple of
- key-value fields, you can return a :class:`FormRequest` object (from your
+ key-value fields, you can return a :class:`~scrapy.FormRequest` object (from your
  spider) like this:

  .. skip: next
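The form-POST example itself is elided from this hunk; a minimal sketch of the pattern (editor's illustration, not from this diff; URL and field names are placeholders):

.. code-block:: python

    import scrapy


    class FormPostSpider(scrapy.Spider):
        name = "form_post_demo"
        start_urls = ["https://example.com"]

        def parse(self, response):
            # formdata is URL-encoded and sent as the body of a POST request.
            yield scrapy.FormRequest(
                url="https://example.com/post/action",
                formdata={"name": "John Doe", "age": "27"},
                callback=self.after_post,
            )

        def after_post(self, response):
            self.logger.info("POST returned status %s", response.status)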
@@ -901,7 +903,7 @@ It is usual for web sites to provide pre-populated form fields through ``<input
  type="hidden">`` elements, such as session related data or authentication
  tokens (for login pages). When scraping, you'll want these fields to be
  automatically pre-populated and only override a couple of them, such as the
- user name and password. You can use the :meth:`FormRequest.from_response`
+ user name and password. You can use the :meth:`.FormRequest.from_response()`
  method for this job. Here's an example spider which uses it:

  .. code-block:: python
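The example spider is likewise elided here; a sketch of the ``from_response()`` login pattern the text describes (editor's illustration, not from this diff; URL, form field names and failure marker are placeholders):

.. code-block:: python

    import scrapy


    class LoginSpider(scrapy.Spider):
        name = "login_demo"
        start_urls = ["https://example.com/users/login"]

        def parse(self, response):
            # from_response() pre-populates the form fields (including hidden
            # inputs) found in the response; formdata overrides only these two.
            return scrapy.FormRequest.from_response(
                response,
                formdata={"username": "john", "password": "secret"},
                callback=self.after_login,
            )

        def after_login(self, response):
            if b"authentication failed" in response.body:
                self.logger.error("Login failed")
                return
            # ... continue scraping with the authenticated session ...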
@@ -936,21 +938,22 @@ method for this job. Here's an example spider which uses it:
  JsonRequest
  -----------

- The JsonRequest class extends the base :class:`Request` class with functionality for
+ The JsonRequest class extends the base :class:`~scrapy.Request` class with functionality for
  dealing with JSON requests.

  .. class:: JsonRequest(url, [... data, dumps_kwargs])

- The :class:`JsonRequest` class adds two new keyword parameters to the ``__init__`` method. The
- remaining arguments are the same as for the :class:`Request` class and are
+ The :class:`JsonRequest` class adds two new keyword parameters to the ``__init__()`` method. The
+ remaining arguments are the same as for the :class:`~scrapy.Request` class and are
  not documented here.

  Using the :class:`JsonRequest` will set the ``Content-Type`` header to ``application/json``
  and ``Accept`` header to ``application/json, text/javascript, */*; q=0.01``

  :param data: is any JSON serializable object that needs to be JSON encoded and assigned to body.
- if :attr:`Request.body` argument is provided this parameter will be ignored.
- if :attr:`Request.body` argument is not provided and data argument is provided :attr:`Request.method` will be
+ If the :attr:`~scrapy.Request.body` argument is provided this parameter will be ignored.
+ If the :attr:`~scrapy.Request.body` argument is not provided and the
+ ``data`` argument is provided the :attr:`~scrapy.Request.method` will be
  set to ``'POST'`` automatically.
  :type data: object

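As a usage sketch for the ``JsonRequest`` parameters documented above (editor's illustration, not part of this diff; the API endpoint and response shape are assumptions):

.. code-block:: python

    import scrapy
    from scrapy.http import JsonRequest


    class JsonApiSpider(scrapy.Spider):
        name = "json_api_demo"

        def start_requests(self):
            # ``data`` is JSON-encoded into the request body; since no explicit
            # body or method is given, the method is set to POST automatically.
            yield JsonRequest(
                url="https://api.example.com/search",
                data={"query": "laptops", "page": 1},
            )

        def parse(self, response):
            for item in response.json().get("results", []):
                yield item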
@@ -1002,7 +1005,7 @@ Response objects
  :type flags: list

  :param request: the initial value of the :attr:`Response.request` attribute.
- This represents the :class:`Request` that generated this response.
+ This represents the :class:`~scrapy.Request` that generated this response.
  :type request: scrapy.Request

  :param certificate: an object representing the server's SSL certificate.
@@ -1038,11 +1041,12 @@ Response objects

  .. attribute:: Response.headers

- A dictionary-like object which contains the response headers. Values can
- be accessed using :meth:`get` to return the first header value with the
- specified name or :meth:`getlist` to return all header values with the
- specified name. For example, this call will give you all cookies in the
- headers::
+ A dictionary-like (:class:`scrapy.http.headers.Headers`) object which contains
+ the response headers. Values can be accessed using
+ :meth:`~scrapy.http.headers.Headers.get` to return the first header value with
+ the specified name or :meth:`~scrapy.http.headers.Headers.getlist` to return
+ all header values with the specified name. For example, this call will give you
+ all cookies in the headers::

  response.headers.getlist('Set-Cookie')

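A short sketch of the ``get``/``getlist`` access pattern described above (editor's illustration, not part of this diff); header values are stored as bytes:

.. code-block:: python

    # Inside a spider callback, given a received ``response``:
    content_type = response.headers.get("Content-Type")  # first value, or None
    cookies = response.headers.getlist("Set-Cookie")     # list of all values
    cookie_strings = [c.decode() for c in cookies]       # values arrive as bytes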
@@ -1058,7 +1062,7 @@ Response objects

  .. attribute:: Response.request

- The :class:`Request` object that generated this response. This attribute is
+ The :class:`~scrapy.Request` object that generated this response. This attribute is
  assigned in the Scrapy engine, after the response and the request have passed
  through all :ref:`Downloader Middlewares <topics-downloader-middleware>`.
  In particular, this means that:
@@ -1077,34 +1081,33 @@ Response objects

  .. attribute:: Response.meta

- A shortcut to the :attr:`Request.meta` attribute of the
+ A shortcut to the :attr:`~scrapy.Request.meta` attribute of the
  :attr:`Response.request` object (i.e. ``self.request.meta``).

  Unlike the :attr:`Response.request` attribute, the :attr:`Response.meta`
  attribute is propagated along redirects and retries, so you will get
- the original :attr:`Request.meta` sent from your spider.
+ the original :attr:`.Request.meta` sent from your spider.

- .. seealso:: :attr:`Request.meta` attribute
+ .. seealso:: :attr:`.Request.meta` attribute

  .. attribute:: Response.cb_kwargs

  .. versionadded:: 2.0

- A shortcut to the :attr:`Request.cb_kwargs` attribute of the
+ A shortcut to the :attr:`~scrapy.Request.cb_kwargs` attribute of the
  :attr:`Response.request` object (i.e. ``self.request.cb_kwargs``).

  Unlike the :attr:`Response.request` attribute, the
  :attr:`Response.cb_kwargs` attribute is propagated along redirects and
- retries, so you will get the original :attr:`Request.cb_kwargs` sent
- from your spider.
+ retries, so you will get the original :attr:`.Request.cb_kwargs` sent from your spider.

- .. seealso:: :attr:`Request.cb_kwargs` attribute
+ .. seealso:: :attr:`.Request.cb_kwargs` attribute

  .. attribute:: Response.flags

  A list that contains flags for this response. Flags are labels used for
  tagging Responses. For example: ``'cached'``, ``'redirected``', etc. And
- they're shown on the string representation of the Response (`__str__`
+ they're shown on the string representation of the Response (``__str__()``
  method) which is used by the engine for logging.

  .. attribute:: Response.certificate
@@ -1181,7 +1184,7 @@ TextResponse objects
  :class:`Response` class, which is meant to be used only for binary data,
  such as images, sounds or any media file.

- :class:`TextResponse` objects support a new ``__init__`` method argument, in
+ :class:`TextResponse` objects support a new ``__init__()`` method argument, in
  addition to the base :class:`Response` objects. The remaining functionality
  is the same as for the :class:`Response` class and is not documented here.

@@ -1219,7 +1222,7 @@ TextResponse objects
  A string with the encoding of this response. The encoding is resolved by
  trying the following mechanisms, in order:

- 1. the encoding passed in the ``__init__`` method ``encoding`` argument
+ 1. the encoding passed in the ``__init__()`` method ``encoding`` argument

  2. the encoding declared in the Content-Type HTTP header. If this
  encoding is not valid (i.e. unknown), it is ignored and the next
@@ -1273,7 +1276,7 @@ TextResponse objects

  Constructs an absolute url by combining the Response's base url with
  a possible relative url. The base url shall be extracted from the
- ``<base>`` tag, or just the Response's :attr:`url` if there is no such
+ ``<base>`` tag, or just :attr:`Response.url` if there is no such
  tag.

@@ -777,7 +777,7 @@ Removing namespaces
  When dealing with scraping projects, it is often quite convenient to get rid of
  namespaces altogether and just work with element names, to write more
  simple/convenient XPaths. You can use the
- :meth:`Selector.remove_namespaces` method for that.
+ :meth:`.Selector.remove_namespaces` method for that.

  Let's show an example that illustrates this with the Python Insider blog atom feed.

@@ -814,7 +814,7 @@ doesn't work (because the Atom XML namespace is obfuscating those nodes):
  >>> response.xpath("//link")
  []

- But once we call the :meth:`Selector.remove_namespaces` method, all
+ But once we call the :meth:`.Selector.remove_namespaces` method, all
  nodes can be accessed directly by their names:

  .. code-block:: pycon
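The surrounding shell session is truncated in this hunk; the pattern it demonstrates, sketched as a spider callback (editor's illustration, not from this diff):

.. code-block:: python

    def parse(self, response):
        # Strip namespaces once, then query with namespace-less XPaths.
        response.selector.remove_namespaces()
        for href in response.xpath("//link/@href").getall():
            yield {"url": href}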
@@ -1046,7 +1046,7 @@ Built-in Selectors reference
  Selector objects
  ----------------

- .. autoclass:: Selector
+ .. autoclass:: scrapy.Selector

  .. automethod:: xpath

@@ -1126,8 +1126,8 @@ Examples
  Selector examples on HTML response
  ----------------------------------

- Here are some :class:`Selector` examples to illustrate several concepts.
- In all cases, we assume there is already a :class:`Selector` instantiated with
+ Here are some :class:`~scrapy.Selector` examples to illustrate several concepts.
+ In all cases, we assume there is already a :class:`~scrapy.Selector` instantiated with
  a :class:`~scrapy.http.HtmlResponse` object like this:

  .. code-block:: python
@@ -1135,7 +1135,7 @@ a :class:`~scrapy.http.HtmlResponse` object like this:
  sel = Selector(html_response)

  1. Select all ``<h1>`` elements from an HTML response body, returning a list of
- :class:`Selector` objects (i.e. a :class:`SelectorList` object):
+ :class:`~scrapy.Selector` objects (i.e. a :class:`SelectorList` object):

  .. code-block:: python

@@ -1165,7 +1165,7 @@ Selector examples on XML response

  .. skip: start

- Here are some examples to illustrate concepts for :class:`Selector` objects
+ Here are some examples to illustrate concepts for :class:`~scrapy.Selector` objects
  instantiated with an :class:`~scrapy.http.XmlResponse` object:

  .. code-block:: python
@@ -1173,7 +1173,7 @@ instantiated with an :class:`~scrapy.http.XmlResponse` object:
  sel = Selector(xml_response)

  1. Select all ``<product>`` elements from an XML response body, returning a list
- of :class:`Selector` objects (i.e. a :class:`SelectorList` object):
+ of :class:`~scrapy.Selector` objects (i.e. a :class:`SelectorList` object):

  .. code-block:: python

@@ -115,7 +115,7 @@ class BaseScheduler(metaclass=BaseSchedulerMeta):
      @abstractmethod
      def next_request(self) -> Request | None:
          """
-         Return the next :class:`~scrapy.http.Request` to be processed, or ``None``
+         Return the next :class:`~scrapy.Request` to be processed, or ``None``
          to indicate that there are no requests to be considered ready at the moment.

          Returning ``None`` implies that no request from the scheduler will be sent
@@ -263,7 +263,7 @@ class Scheduler(BaseScheduler):

      def next_request(self) -> Request | None:
          """
-         Return a :class:`~scrapy.http.Request` object from the memory queue,
+         Return a :class:`~scrapy.Request` object from the memory queue,
          falling back to the disk queue if the memory queue is empty.
          Return ``None`` if there are no more enqueued requests.

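For context on the ``BaseScheduler`` contract these docstrings describe, a minimal in-memory FIFO scheduler sketch (editor's illustration, not part of this commit; a real scheduler also handles dupefilters, disk queues and stats). It would be enabled through the :setting:`SCHEDULER` setting:

.. code-block:: python

    from collections import deque

    from scrapy.core.scheduler import BaseScheduler


    class FifoScheduler(BaseScheduler):
        """Minimal FIFO scheduler: keeps requests in memory only."""

        def __init__(self):
            self._queue = deque()

        def has_pending_requests(self) -> bool:
            return bool(self._queue)

        def enqueue_request(self, request) -> bool:
            self._queue.append(request)
            return True  # the request was stored

        def next_request(self):
            # Return the next Request to process, or None if nothing is pending.
            return self._queue.popleft() if self._queue else None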
@@ -59,7 +59,7 @@ RequestTypeVar = TypeVar("RequestTypeVar", bound="Request")

  def NO_CALLBACK(*args: Any, **kwargs: Any) -> NoReturn:
      """When assigned to the ``callback`` parameter of
-     :class:`~scrapy.http.Request`, it indicates that the request is not meant
+     :class:`~scrapy.Request`, it indicates that the request is not meant
      to have a spider callback at all.

      For example:
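The example the docstring refers to is elided here; a usage sketch (editor's illustration, not part of this diff; the URL is a placeholder):

.. code-block:: python

    from scrapy import Request
    from scrapy.http.request import NO_CALLBACK

    # NO_CALLBACK marks a request that is not meant to reach any spider
    # callback (e.g. one handled entirely by a component), whereas
    # callback=None means "use the spider's default parse() callback".
    request = Request("https://example.com/robots.txt", callback=NO_CALLBACK)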
@@ -83,7 +83,7 @@ def NO_CALLBACK(*args: Any, **kwargs: Any) -> NoReturn:

  class Request(object_ref):
      """Represents an HTTP request, which is usually generated in a Spider and
-     executed by the Downloader, thus generating a :class:`Response`.
+     executed by the Downloader, thus generating a :class:`~scrapy.http.Response`.
      """

      attributes: tuple[str, ...] = (
@@ -103,9 +103,9 @@ class Request(object_ref):
      )
      """A tuple of :class:`str` objects containing the name of all public
      attributes of the class that are also keyword parameters of the
-     ``__init__`` method.
+     ``__init__()`` method.

-     Currently used by :meth:`Request.replace`, :meth:`Request.to_dict` and
+     Currently used by :meth:`.Request.replace`, :meth:`.Request.to_dict` and
      :func:`~scrapy.utils.request.request_from_dict`.
      """

@@ -233,7 +233,7 @@ class Request(object_ref):
          finding unknown options call this method by passing
          ``ignore_unknown_options=False``.

-         .. caution:: Using :meth:`from_curl` from :class:`~scrapy.http.Request`
+         .. caution:: Using :meth:`from_curl` from :class:`~scrapy.Request`
          subclasses, such as :class:`~scrapy.http.JsonRequest`, or
          :class:`~scrapy.http.XmlRpcRequest`, as well as having
          :ref:`downloader middlewares <topics-downloader-middleware>`
@@ -244,7 +244,7 @@ class Request(object_ref):
          :class:`~scrapy.downloadermiddlewares.useragent.UserAgentMiddleware`,
          or
          :class:`~scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware`,
-         may modify the :class:`~scrapy.http.Request` object.
+         may modify the :class:`~scrapy.Request` object.

          To translate a cURL command into a Scrapy request,
          you may use `curl2scrapy <https://michael-shub.github.io/curl2scrapy/>`_.
@@ -51,7 +51,7 @@ class Response(object_ref):
      )
      """A tuple of :class:`str` objects containing the name of all public
      attributes of the class that are also keyword parameters of the
-     ``__init__`` method.
+     ``__init__()`` method.

      Currently used by :meth:`Response.replace`.
      """
@@ -199,8 +199,8 @@ class Response(object_ref):
      ) -> Request:
          """
          Return a :class:`~.Request` instance to follow a link ``url``.
-         It accepts the same arguments as ``Request.__init__`` method,
-         but ``url`` can be a relative URL or a ``scrapy.link.Link`` object,
+         It accepts the same arguments as ``Request.__init__()`` method,
+         but ``url`` can be a relative URL or a :class:`~scrapy.link.Link` object,
          not only an absolute URL.

          :class:`~.TextResponse` provides a :meth:`~.TextResponse.follow`
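As a quick illustration of the ``follow()`` shortcut documented above (editor's sketch, not from this commit; the CSS selector is a placeholder), written as a spider callback:

.. code-block:: python

    def parse(self, response):
        # follow() resolves ``next_page`` relative to response.url.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)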
@@ -254,7 +254,7 @@ class Response(object_ref):
          .. versionadded:: 2.0

          Return an iterable of :class:`~.Request` instances to follow all links
-         in ``urls``. It accepts the same arguments as ``Request.__init__`` method,
+         in ``urls``. It accepts the same arguments as ``Request.__init__()`` method,
          but elements of ``urls`` can be relative URLs or :class:`~scrapy.link.Link` objects,
          not only absolute URLs.

@@ -185,15 +185,15 @@ class TextResponse(Response):
      ) -> Request:
          """
          Return a :class:`~.Request` instance to follow a link ``url``.
-         It accepts the same arguments as ``Request.__init__`` method,
+         It accepts the same arguments as ``Request.__init__()`` method,
          but ``url`` can be not only an absolute URL, but also

          * a relative URL
          * a :class:`~scrapy.link.Link` object, e.g. the result of
          :ref:`topics-link-extractors`
-         * a :class:`~scrapy.selector.Selector` object for a ``<link>`` or ``<a>`` element, e.g.
+         * a :class:`~scrapy.Selector` object for a ``<link>`` or ``<a>`` element, e.g.
          ``response.css('a.my_link')[0]``
-         * an attribute :class:`~scrapy.selector.Selector` (not SelectorList), e.g.
+         * an attribute :class:`~scrapy.Selector` (not SelectorList), e.g.
          ``response.css('a::attr(href)')[0]`` or
          ``response.xpath('//img/@src')[0]``

@@ -241,20 +241,20 @@ class TextResponse(Response):
          """
          A generator that produces :class:`~.Request` instances to follow all
          links in ``urls``. It accepts the same arguments as the :class:`~.Request`'s
-         ``__init__`` method, except that each ``urls`` element does not need to be
+         ``__init__()`` method, except that each ``urls`` element does not need to be
          an absolute URL, it can be any of the following:

          * a relative URL
          * a :class:`~scrapy.link.Link` object, e.g. the result of
          :ref:`topics-link-extractors`
-         * a :class:`~scrapy.selector.Selector` object for a ``<link>`` or ``<a>`` element, e.g.
+         * a :class:`~scrapy.Selector` object for a ``<link>`` or ``<a>`` element, e.g.
          ``response.css('a.my_link')[0]``
-         * an attribute :class:`~scrapy.selector.Selector` (not SelectorList), e.g.
+         * an attribute :class:`~scrapy.Selector` (not SelectorList), e.g.
          ``response.css('a::attr(href)')[0]`` or
          ``response.xpath('//img/@src')[0]``

          In addition, ``css`` and ``xpath`` arguments are accepted to perform the link extraction
-         within the ``follow_all`` method (only one of ``urls``, ``css`` and ``xpath`` is accepted).
+         within the ``follow_all()`` method (only one of ``urls``, ``css`` and ``xpath`` is accepted).

          Note that when passing a ``SelectorList`` as argument for the ``urls`` parameter or
          using the ``css`` or ``xpath`` parameters, this method will not produce requests for
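And the matching ``follow_all()`` shortcut, sketched as a callback (editor's illustration, not from this commit; the CSS selector is a placeholder):

.. code-block:: python

    def parse(self, response):
        # Exactly one of ``urls``, ``css`` or ``xpath`` may be given; here the
        # link extraction happens inside follow_all() via the CSS selector.
        yield from response.follow_all(css="ul.pager a", callback=self.parse)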
@@ -32,7 +32,7 @@ class ItemLoader(itemloaders.ItemLoader):
      :param selector: The selector to extract data from, when using the
      :meth:`add_xpath`, :meth:`add_css`, :meth:`replace_xpath`, or
      :meth:`replace_css` method.
-     :type selector: :class:`~scrapy.selector.Selector` object
+     :type selector: :class:`~scrapy.Selector` object

      :param response: The response used to construct the selector using the
      :attr:`default_selector_class`, unless the selector argument is given,
@@ -79,7 +79,7 @@ class ItemLoader(itemloaders.ItemLoader):

      .. attribute:: selector

-     The :class:`~scrapy.selector.Selector` object to extract data from.
+     The :class:`~scrapy.Selector` object to extract data from.
      It's either the selector given in the ``__init__`` method or one created from
      the response given in the ``__init__`` method using the
      :attr:`default_selector_class`. This attribute is meant to be
@@ -1,6 +1,6 @@
  """
  This module provides some useful functions for working with
- scrapy.http.Request objects
+ scrapy.Request objects
  """

  from __future__ import annotations
@@ -109,12 +109,10 @@ class RequestFingerprinter:

      It takes into account a canonical version
      (:func:`w3lib.url.canonicalize_url`) of :attr:`request.url
-     <scrapy.http.Request.url>` and the values of :attr:`request.method
-     <scrapy.http.Request.method>` and :attr:`request.body
-     <scrapy.http.Request.body>`. It then generates an `SHA1
+     <scrapy.Request.url>` and the values of :attr:`request.method
+     <scrapy.Request.method>` and :attr:`request.body
+     <scrapy.Request.body>`. It then generates an `SHA1
      <https://en.wikipedia.org/wiki/SHA-1>`_ hash.
-
-     .. seealso:: :setting:`REQUEST_FINGERPRINTER_IMPLEMENTATION`.
      """

      @classmethod