
Improve internal refs to scrapy.Request and scrapy.Selector (#6526)

* Improve internal refs to scrapy.Selector.

* Improve internal refs to scrapy.Request.

* More scrapy.http fixes.

* Fix FormRequest refs.

* More fixes.

* Simplifications.

* Last fixes.

* Add the parsel intersphinx.
Andrey Rakhmatullin 2025-01-07 15:18:18 +04:00 committed by GitHub
parent 5d3aa80ad1
commit 59fcb9b93c
13 changed files with 176 additions and 178 deletions

View File

@ -284,6 +284,7 @@ intersphinx_mapping = {
"cryptography": ("https://cryptography.io/en/latest/", None),
"cssselect": ("https://cssselect.readthedocs.io/en/latest", None),
"itemloaders": ("https://itemloaders.readthedocs.io/en/latest/", None),
"parsel": ("https://parsel.readthedocs.io/en/latest/", None),
"pytest": ("https://docs.pytest.org/en/latest", None),
"python": ("https://docs.python.org/3", None),
"sphinx": ("https://www.sphinx-doc.org/en/master", None),

View File

@ -635,7 +635,7 @@ Bug fixes
exception if ``default`` is ``None``.
(:issue:`6308`, :issue:`6310`)
- :class:`~scrapy.selector.Selector` now uses
- :class:`~scrapy.Selector` now uses
:func:`scrapy.utils.response.get_base_url` to determine the base URL of a
given :class:`~scrapy.http.Response`. (:issue:`6265`)
@ -653,7 +653,7 @@ Documentation
- Add a FAQ entry about :ref:`creating blank requests <faq-blank-request>`.
(:issue:`6203`, :issue:`6208`)
- Document that :attr:`scrapy.selector.Selector.type` can be ``"json"``.
- Document that :attr:`scrapy.Selector.type` can be ``"json"``.
(:issue:`6328`, :issue:`6334`)
Quality assurance
@ -734,7 +734,7 @@ Documentation
- Improved documentation for :class:`~scrapy.crawler.Crawler` initialization
changes made in the 2.11.0 release. (:issue:`6057`, :issue:`6147`)
- Extended documentation for :attr:`Request.meta <scrapy.http.Request.meta>`.
- Extended documentation for :attr:`.Request.meta`.
(:issue:`5565`)
- Fixed the :reqmeta:`dont_merge_cookies` documentation. (:issue:`5936`,
@ -1095,7 +1095,7 @@ New features
:setting:`RANDOMIZE_DOWNLOAD_DELAY` can now be set on a per-domain basis
via the new :setting:`DOWNLOAD_SLOTS` setting. (:issue:`5328`)
- Added :meth:`TextResponse.jmespath`, a shortcut for JMESPath selectors
- Added :meth:`.TextResponse.jmespath`, a shortcut for JMESPath selectors
available since parsel_ 1.8.1. (:issue:`5894`, :issue:`5915`)
- Added :signal:`feed_slot_closed` and :signal:`feed_exporter_closed`
@ -1275,7 +1275,7 @@ New features
avoid confusion.
(:issue:`5717`, :issue:`5722`, :issue:`5727`)
- The ``callback`` parameter of :class:`~scrapy.http.Request` can now be set
- The ``callback`` parameter of :class:`~scrapy.Request` can now be set
to :func:`scrapy.http.request.NO_CALLBACK`, to distinguish it from
``None``, as the latter indicates that the default spider callback
(:meth:`~scrapy.Spider.parse`) is to be used.
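
  For illustration, a minimal sketch of the new value (the request URL is an
  assumption; ``NO_CALLBACK`` lives in ``scrapy.http.request``):

  .. code-block:: python

      from scrapy import Request
      from scrapy.http.request import NO_CALLBACK

      # A request consumed directly by a component (e.g. a media pipeline),
      # so it should never reach the default spider callback.
      request = Request("https://example.com/file.pdf", callback=NO_CALLBACK)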
@ -1772,17 +1772,17 @@ Highlights:
Security bug fixes
~~~~~~~~~~~~~~~~~~
- When a :class:`~scrapy.http.Request` object with cookies defined gets a
redirect response causing a new :class:`~scrapy.http.Request` object to be
- When a :class:`~scrapy.Request` object with cookies defined gets a
redirect response causing a new :class:`~scrapy.Request` object to be
scheduled, the cookies defined in the original
:class:`~scrapy.http.Request` object are no longer copied into the new
:class:`~scrapy.http.Request` object.
:class:`~scrapy.Request` object are no longer copied into the new
:class:`~scrapy.Request` object.
If you manually set the ``Cookie`` header on a
:class:`~scrapy.http.Request` object and the domain name of the redirect
:class:`~scrapy.Request` object and the domain name of the redirect
URL is not an exact match for the domain of the URL of the original
:class:`~scrapy.http.Request` object, your ``Cookie`` header is now dropped
from the new :class:`~scrapy.http.Request` object.
:class:`~scrapy.Request` object, your ``Cookie`` header is now dropped
from the new :class:`~scrapy.Request` object.
The old behavior could be exploited by an attacker to gain access to your
cookies. Please, see the `cjvr-mfj7-j4j8 security advisory`_ for more
@ -1795,10 +1795,10 @@ Security bug fixes
``example.com`` and any subdomain) by defining the shared domain
suffix (e.g. ``example.com``) as the cookie domain when defining
your cookies. See the documentation of the
:class:`~scrapy.http.Request` class for more information.
:class:`~scrapy.Request` class for more information.
- When the domain of a cookie, either received in the ``Set-Cookie`` header
of a response or defined in a :class:`~scrapy.http.Request` object, is set
of a response or defined in a :class:`~scrapy.Request` object, is set
to a `public suffix <https://publicsuffix.org/>`_, the cookie is now
ignored unless the cookie domain is the same as the request domain.
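
  As a hedged sketch of the shared-suffix workaround mentioned above (names
  and values are illustrative):

  .. code-block:: python

      from scrapy import Request

      # Setting the cookie domain to the shared suffix keeps the cookie
      # valid across example.com and its subdomains after redirects.
      request = Request(
          "https://a.example.com/",
          cookies=[{"name": "session", "value": "s3cr3t", "domain": "example.com"}],
      )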
@ -1849,7 +1849,7 @@ Backward-incompatible changes
meet expectations, :exc:`TypeError` is now raised at startup time. Before,
other exceptions would be raised at run time. (:issue:`3559`)
- The ``_encoding`` field of serialized :class:`~scrapy.http.Request` objects
- The ``_encoding`` field of serialized :class:`~scrapy.Request` objects
is now named ``encoding``, in line with all other fields (:issue:`5130`)
@ -1879,7 +1879,7 @@ Deprecations
- :mod:`scrapy.utils.reqser` is deprecated. (:issue:`5130`)
- Instead of :func:`~scrapy.utils.reqser.request_to_dict`, use the new
:meth:`Request.to_dict <scrapy.http.Request.to_dict>` method.
:meth:`.Request.to_dict` method.
- Instead of :func:`~scrapy.utils.reqser.request_from_dict`, use the new
:func:`scrapy.utils.request.request_from_dict` function.
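
  A small sketch of the replacement round trip (a ``spider`` argument is
  needed when callbacks are spider methods):

  .. code-block:: python

      from scrapy import Request
      from scrapy.utils.request import request_from_dict

      request = Request("https://example.com")
      data = request.to_dict()  # a plain, serializable dict
      restored = request_from_dict(data)  # back to a Request object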
@ -1984,9 +1984,9 @@ New features
using ``queuelib`` 1.6.1 or later), the ``peek`` method raises
:exc:`NotImplementedError`.
- :class:`~scrapy.http.Request` and :class:`~scrapy.http.Response` now have
- :class:`~scrapy.Request` and :class:`~scrapy.http.Response` now have
an ``attributes`` attribute that makes subclassing easier. For
:class:`~scrapy.http.Request`, it also allows subclasses to work with
:class:`~scrapy.Request`, it also allows subclasses to work with
:func:`scrapy.utils.request.request_from_dict`. (:issue:`1877`,
:issue:`5130`, :issue:`5218`)
@ -2452,14 +2452,13 @@ Backward-incompatible changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` once again
discards cookies defined in :attr:`Request.headers
<scrapy.http.Request.headers>`.
discards cookies defined in :attr:`.Request.headers`.
We decided to revert this bug fix, introduced in Scrapy 2.2.0, because it
was reported that the current implementation could break existing code.
If you need to set cookies for a request, use the :class:`Request.cookies
<scrapy.http.Request>` parameter.
<scrapy.Request>` parameter.
A future version of Scrapy will include a new, better implementation of the
reverted bug fix.
@ -2580,16 +2579,16 @@ New features
:meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_response`
or
:meth:`~scrapy.downloadermiddlewares.DownloaderMiddleware.process_exception`
with a custom :class:`~scrapy.http.Request` object assigned to
with a custom :class:`~scrapy.Request` object assigned to
:class:`response.request <scrapy.http.Response.request>`:
- The response is handled by the callback of that custom
:class:`~scrapy.http.Request` object, instead of being handled by the
callback of the original :class:`~scrapy.http.Request` object
:class:`~scrapy.Request` object, instead of being handled by the
callback of the original :class:`~scrapy.Request` object
- That custom :class:`~scrapy.http.Request` object is now sent as the
- That custom :class:`~scrapy.Request` object is now sent as the
``request`` argument to the :signal:`response_received` signal, instead
of the original :class:`~scrapy.http.Request` object
of the original :class:`~scrapy.Request` object
(:issue:`4529`, :issue:`4632`)
@ -2760,7 +2759,7 @@ New features
* The :command:`parse` command now allows specifying an output file
(:issue:`4317`, :issue:`4377`)
* :meth:`Request.from_curl <scrapy.http.Request.from_curl>` and
* :meth:`.Request.from_curl` and
:func:`~scrapy.utils.curl.curl_to_request_kwargs` now also support
``--data-raw`` (:issue:`4612`)
@ -2776,7 +2775,7 @@ Bug fixes
:ref:`dataclass items <dataclass-items>` and :ref:`attr.s items
<attrs-items>` (:issue:`4667`, :issue:`4668`)
* :meth:`Request.from_curl <scrapy.http.Request.from_curl>` and
* :meth:`.Request.from_curl` and
:func:`~scrapy.utils.curl.curl_to_request_kwargs` now set the request
method to ``POST`` when a request body is specified and no request method
is specified (:issue:`4612`)
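
  For example (a sketch; the cURL command is illustrative), a body given via
  ``--data-raw`` without an explicit method now yields a ``POST`` request:

  .. code-block:: python

      from scrapy import Request

      request = Request.from_curl(
          "curl 'https://example.com/api' --data-raw '{\"q\": 1}'"
      )
      assert request.method == "POST"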
@ -2861,8 +2860,7 @@ Backward-incompatible changes
Deprecations
~~~~~~~~~~~~
* :meth:`TextResponse.body_as_unicode
<scrapy.http.TextResponse.body_as_unicode>` is now deprecated, use
* ``TextResponse.body_as_unicode()`` is now deprecated, use
:attr:`TextResponse.text <scrapy.http.TextResponse.text>` instead
(:issue:`4546`, :issue:`4555`, :issue:`4579`)
@ -2901,9 +2899,8 @@ New features
* :ref:`Link extractors <topics-link-extractors>` are now serializable,
as long as you do not use :ref:`lambdas <lambda>` for parameters; for
example, you can now pass link extractors in :attr:`Request.cb_kwargs
<scrapy.http.Request.cb_kwargs>` or
:attr:`Request.meta <scrapy.http.Request.meta>` when :ref:`persisting
example, you can now pass link extractors in :attr:`.Request.cb_kwargs`
or :attr:`.Request.meta` when :ref:`persisting
scheduled requests <topics-jobs>` (:issue:`4554`)
* Upgraded the :ref:`pickle protocol <pickle-protocols>` that Scrapy uses
@ -2922,11 +2919,11 @@ Bug fixes
* :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` no longer
discards cookies defined in :attr:`Request.headers
<scrapy.http.Request.headers>` (:issue:`1992`, :issue:`2400`)
<scrapy.Request.headers>` (:issue:`1992`, :issue:`2400`)
* :class:`~scrapy.downloadermiddlewares.cookies.CookiesMiddleware` no longer
re-encodes cookies defined as :class:`bytes` in the ``cookies`` parameter
of the ``__init__`` method of :class:`~scrapy.http.Request`
of the ``__init__`` method of :class:`~scrapy.Request`
(:issue:`2400`, :issue:`3575`)
* When :setting:`FEEDS` defines multiple URIs, :setting:`FEED_STORE_EMPTY` is
@ -2935,7 +2932,7 @@ Bug fixes
* :class:`~scrapy.spiders.Spider` callbacks defined using :doc:`coroutine
syntax <topics/coroutines>` no longer need to return an iterable, and may
instead return a :class:`~scrapy.http.Request` object, an
instead return a :class:`~scrapy.Request` object, an
:ref:`item <topics-items>`, or ``None`` (:issue:`4609`)
* The :command:`startproject` command now ensures that the generated project
@ -2976,8 +2973,8 @@ Documentation
:issue:`4587`)
* The display-on-hover behavior of internal documentation references now also
covers links to :ref:`commands <topics-commands>`, :attr:`Request.meta
<scrapy.http.Request.meta>` keys, :ref:`settings <topics-settings>` and
covers links to :ref:`commands <topics-commands>`, :attr:`.Request.meta`
keys, :ref:`settings <topics-settings>` and
:ref:`signals <topics-signals>` (:issue:`4495`, :issue:`4563`)
* It is again possible to download the documentation for offline reading
@ -3262,7 +3259,7 @@ Deprecation removals
~~~~~~~~~~~~~~~~~~~~
* The :ref:`Scrapy shell <topics-shell>` no longer provides a `sel` proxy
object, use :meth:`response.selector <scrapy.http.Response.selector>`
object, use :meth:`response.selector <scrapy.http.TextResponse.selector>`
instead (:issue:`4347`)
* LevelDB support has been removed (:issue:`4112`)
@ -3332,10 +3329,10 @@ New features
* The new :attr:`Response.cb_kwargs <scrapy.http.Response.cb_kwargs>`
attribute serves as a shortcut for :attr:`Response.request.cb_kwargs
<scrapy.http.Request.cb_kwargs>` (:issue:`4331`)
<scrapy.Request.cb_kwargs>` (:issue:`4331`)
* :meth:`Response.follow <scrapy.http.Response.follow>` now supports a
``flags`` parameter, for consistency with :class:`~scrapy.http.Request`
``flags`` parameter, for consistency with :class:`~scrapy.Request`
(:issue:`4277`, :issue:`4279`)
* :ref:`Item loader processors <topics-loaders-processors>` can now be
@ -3344,7 +3341,7 @@ New features
* :class:`~scrapy.spiders.Rule` now accepts an ``errback`` parameter
(:issue:`4000`)
* :class:`~scrapy.http.Request` no longer requires a ``callback`` parameter
* :class:`~scrapy.Request` no longer requires a ``callback`` parameter
when an ``errback`` parameter is specified (:issue:`3586`, :issue:`4008`)
* :class:`~scrapy.logformatter.LogFormatter` now supports some additional
@ -3416,7 +3413,7 @@ Bug fixes
* Redirects to URLs starting with 3 slashes (``///``) are now supported
(:issue:`4032`, :issue:`4042`)
* :class:`~scrapy.http.Request` no longer accepts strings as ``url`` simply
* :class:`~scrapy.Request` no longer accepts strings as ``url`` simply
because they have a colon (:issue:`2552`, :issue:`4094`)
* The correct encoding is now used for attach names in
@ -3462,7 +3459,7 @@ Documentation
using :class:`~scrapy.crawler.CrawlerProcess` (:issue:`2149`,
:issue:`2352`, :issue:`3146`, :issue:`3960`)
* Clarified the requirements for :class:`~scrapy.http.Request` objects
* Clarified the requirements for :class:`~scrapy.Request` objects
:ref:`when using persistence <request-serialization>` (:issue:`4124`,
:issue:`4139`)
@ -3731,17 +3728,17 @@ Scrapy 1.8.2 (2022-03-01)
**Security bug fixes:**
- When a :class:`~scrapy.http.Request` object with cookies defined gets a
redirect response causing a new :class:`~scrapy.http.Request` object to be
- When a :class:`~scrapy.Request` object with cookies defined gets a
redirect response causing a new :class:`~scrapy.Request` object to be
scheduled, the cookies defined in the original
:class:`~scrapy.http.Request` object are no longer copied into the new
:class:`~scrapy.http.Request` object.
:class:`~scrapy.Request` object are no longer copied into the new
:class:`~scrapy.Request` object.
If you manually set the ``Cookie`` header on a
:class:`~scrapy.http.Request` object and the domain name of the redirect
:class:`~scrapy.Request` object and the domain name of the redirect
URL is not an exact match for the domain of the URL of the original
:class:`~scrapy.http.Request` object, your ``Cookie`` header is now dropped
from the new :class:`~scrapy.http.Request` object.
:class:`~scrapy.Request` object, your ``Cookie`` header is now dropped
from the new :class:`~scrapy.Request` object.
The old behavior could be exploited by an attacker to gain access to your
cookies. Please, see the `cjvr-mfj7-j4j8 security advisory`_ for more
@ -3754,10 +3751,10 @@ Scrapy 1.8.2 (2022-03-01)
``example.com`` and any subdomain) by defining the shared domain
suffix (e.g. ``example.com``) as the cookie domain when defining
your cookies. See the documentation of the
:class:`~scrapy.http.Request` class for more information.
:class:`~scrapy.Request` class for more information.
- When the domain of a cookie, either received in the ``Set-Cookie`` header
of a response or defined in a :class:`~scrapy.http.Request` object, is set
of a response or defined in a :class:`~scrapy.Request` object, is set
to a `public suffix <https://publicsuffix.org/>`_, the cookie is now
ignored unless the cookie domain is the same as the request domain.
@ -3815,7 +3812,7 @@ Highlights:
* Dropped Python 3.4 support and updated minimum requirements; made Python 3.8
support official
* New :meth:`Request.from_curl <scrapy.http.Request.from_curl>` class method
* New :meth:`.Request.from_curl` class method
* New :setting:`ROBOTSTXT_PARSER` and :setting:`ROBOTSTXT_USER_AGENT` settings
* New :setting:`DOWNLOADER_CLIENT_TLS_CIPHERS` and
:setting:`DOWNLOADER_CLIENT_TLS_VERBOSE_LOGGING` settings
@ -3869,7 +3866,7 @@ See also :ref:`1.8-deprecation-removals` below.
New features
~~~~~~~~~~~~
* A new :meth:`Request.from_curl <scrapy.http.Request.from_curl>` class
* A new :meth:`Request.from_curl <scrapy.Request.from_curl>` class
method allows :ref:`creating a request from a cURL command
<requests-from-curl>` (:issue:`2985`, :issue:`3862`)
@ -3898,9 +3895,8 @@ New features
``True`` to enable debug-level messages about TLS connection parameters
after establishing HTTPS connections (:issue:`2111`, :issue:`3450`)
* Callbacks that receive keyword arguments
(see :attr:`Request.cb_kwargs <scrapy.http.Request.cb_kwargs>`) can now be
tested using the new :class:`@cb_kwargs
* Callbacks that receive keyword arguments (see :attr:`.Request.cb_kwargs`)
can now be tested using the new :class:`@cb_kwargs
<scrapy.contracts.default.CallbackKeywordArgumentsContract>`
:ref:`spider contract <topics-contracts>` (:issue:`3985`, :issue:`3988`)
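
  A sketch of the contract in a callback docstring (the URL and values are
  illustrative):

  .. code-block:: python

      def parse(self, response, foo):
          """Parse a page.

          @url https://example.com
          @cb_kwargs {"foo": "bar"}
          """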
@ -4089,7 +4085,7 @@ Backward-incompatible changes
* Non-default values for the :setting:`SCHEDULER_PRIORITY_QUEUE` setting
may stop working. Scheduler priority queue classes now need to handle
:class:`~scrapy.http.Request` objects instead of arbitrary Python data
:class:`~scrapy.Request` objects instead of arbitrary Python data
structures.
* An additional ``crawler`` parameter has been added to the ``__init__``
@ -4111,7 +4107,7 @@ New features
scheduling improvement on crawls targeting multiple web domains, at the
cost of no :setting:`CONCURRENT_REQUESTS_PER_IP` support (:issue:`3520`)
* A new :attr:`Request.cb_kwargs <scrapy.http.Request.cb_kwargs>` attribute
* A new :attr:`.Request.cb_kwargs` attribute
provides a cleaner way to pass keyword arguments to callback methods
(:issue:`1138`, :issue:`3563`)
@ -4192,7 +4188,7 @@ Bug fixes
* Requests with private callbacks are now correctly unserialized from disk
(:issue:`3790`)
* :meth:`FormRequest.from_response() <scrapy.http.FormRequest.from_response>`
* :meth:`.FormRequest.from_response`
now handles invalid methods like major web browsers (:issue:`3777`,
:issue:`3794`)
@ -4272,13 +4268,13 @@ The following deprecated APIs have been removed (:issue:`3578`):
* From both ``scrapy.selector`` and ``scrapy.selector.lxmlsel``:
* ``HtmlXPathSelector`` (use :class:`~scrapy.selector.Selector`)
* ``HtmlXPathSelector`` (use :class:`~scrapy.Selector`)
* ``XmlXPathSelector`` (use :class:`~scrapy.selector.Selector`)
* ``XmlXPathSelector`` (use :class:`~scrapy.Selector`)
* ``XPathSelector`` (use :class:`~scrapy.selector.Selector`)
* ``XPathSelector`` (use :class:`~scrapy.Selector`)
* ``XPathSelectorList`` (use :class:`~scrapy.selector.Selector`)
* ``XPathSelectorList`` (use :class:`~scrapy.Selector`)
* From ``scrapy.selector.csstranslator``:
@ -4288,7 +4284,7 @@ The following deprecated APIs have been removed (:issue:`3578`):
* ``ScrapyXPathExpr`` (use parsel.csstranslator.XPathExpr_)
* From :class:`~scrapy.selector.Selector`:
* From :class:`~scrapy.Selector`:
* ``_root`` (both the ``__init__`` method argument and the object property, use
``root``)
@ -4818,7 +4814,7 @@ New Features
(:issue:`2535`)
- New :ref:`response.follow <response-follow-example>` shortcut
for creating requests (:issue:`1940`)
- Added ``flags`` argument and attribute to :class:`Request <scrapy.http.Request>`
- Added ``flags`` argument and attribute to :class:`~scrapy.Request`
objects (:issue:`2047`)
- Support Anonymous FTP (:issue:`2342`)
- Added ``retry/count``, ``retry/max_reached`` and ``retry/reason_count/<reason>``
@ -4860,7 +4856,7 @@ Bug fixes
- LinkExtractor now strips leading and trailing whitespaces from attributes
(:issue:`2547`, fixes :issue:`1614`)
- Properly handle whitespaces in action attribute in
:class:`~scrapy.http.FormRequest` (:issue:`2548`)
:class:`~scrapy.FormRequest` (:issue:`2548`)
- Buffer CONNECT response bytes from proxy until all HTTP headers are received
(:issue:`2495`, fixes :issue:`2491`)
- FTP downloader now works on Python 3, provided you use Twisted>=17.1
@ -4902,8 +4898,7 @@ Documentation
~~~~~~~~~~~~~
- Binary mode is required for exporters (:issue:`2564`, fixes :issue:`2553`)
- Mention issue with :meth:`FormRequest.from_response
<scrapy.http.FormRequest.from_response>` due to bug in lxml (:issue:`2572`)
- Mention issue with :meth:`.FormRequest.from_response` due to bug in lxml (:issue:`2572`)
- Use single quotes uniformly in templates (:issue:`2596`)
- Document :reqmeta:`ftp_user` and :reqmeta:`ftp_password` meta keys (:issue:`2587`)
- Removed section on deprecated ``contrib/`` (:issue:`2636`)
@ -5442,7 +5437,7 @@ Bugfixes
- Support empty password for http_proxy config (:issue:`1274`).
- Interpret ``application/x-json`` as ``TextResponse`` (:issue:`1333`).
- Support link rel attribute with multiple values (:issue:`1201`).
- Fixed ``scrapy.http.FormRequest.from_response`` when there is a ``<base>``
- Fixed ``scrapy.FormRequest.from_response`` when there is a ``<base>``
tag (:issue:`1564`).
- Fixed :setting:`TEMPLATES_DIR` handling (:issue:`1575`).
- Various ``FormRequest`` fixes (:issue:`1595`, :issue:`1596`, :issue:`1597`).
@ -6369,7 +6364,7 @@ Scrapy 0.18.0 (released 2013-08-09)
- Moved persistent (on disk) queues to a separate project (queuelib_) which Scrapy now depends on
- Add Scrapy commands using external libraries (:issue:`260`)
- Added ``--pdb`` option to ``scrapy`` command line tool
- Added :meth:`XPathSelector.remove_namespaces <scrapy.selector.Selector.remove_namespaces>` which allows removing all namespaces from XML documents for convenience (to work with namespace-less XPaths). Documented in :ref:`topics-selectors`.
- Added :meth:`XPathSelector.remove_namespaces <scrapy.Selector.remove_namespaces>` which allows removing all namespaces from XML documents for convenience (to work with namespace-less XPaths). Documented in :ref:`topics-selectors`.
- Several improvements to spider contracts
- New default middleware named MetaRefreshMiddleware that handles meta-refresh html tag redirections,
- MetaRefreshMiddleware and RedirectMiddleware have different priorities to address #62

View File

@ -80,7 +80,7 @@ object gives you access, for example, to the :ref:`settings <topics-settings>`.
middleware.
:meth:`process_request` should either: return ``None``, return a
:class:`~scrapy.Response` object, return a :class:`~scrapy.http.Request`
:class:`~scrapy.http.Response` object, return a :class:`~scrapy.Request`
object, or raise :exc:`~scrapy.exceptions.IgnoreRequest`.
If it returns ``None``, Scrapy will continue processing this request, executing all
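
A minimal sketch of a middleware honoring this contract (the class name and
blocklist are assumptions, not part of the patch):

.. code-block:: python

    from scrapy.exceptions import IgnoreRequest


    class BlocklistDownloaderMiddleware:
        blocked_hosts = ("ads.example.com",)

        def process_request(self, request, spider):
            if any(host in request.url for host in self.blocked_hosts):
                raise IgnoreRequest(f"blocked URL: {request.url}")
            return None  # continue normal processing of this request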

View File

@ -117,7 +117,8 @@ data from it depends on the type of response:
- If the response is HTML, XML or JSON, use :ref:`selectors
<topics-selectors>` as usual.
- If the response is JSON, use :func:`response.json()` to load the desired data:
- If the response is JSON, use :func:`response.json()
<scrapy.http.TextResponse.json>` to load the desired data:
.. code-block:: python
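
    # Illustrative body for the elided example (field names are
    # assumptions): TextResponse.json() deserializes the JSON body.
    def parse(self, response):
        data = response.json()
        yield {"title": data.get("title")}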
@ -143,7 +144,7 @@ data from it depends on the type of response:
- If the response is an image or another format based on images (e.g. PDF),
read the response as bytes from
:attr:`response.body <scrapy.http.TextResponse.body>` and use an OCR
:attr:`response.body <scrapy.http.Response.body>` and use an OCR
solution to extract the desired data as text.
For example, you can use pytesseract_. To read a table from a PDF,

View File

@ -105,7 +105,7 @@ response:
In both cases, the response could have its body truncated: the body contains
all bytes received up until the exception is raised, including the bytes
received in the signal handler that raises the exception. Also, the response
object is marked with ``"download_stopped"`` in its :attr:`Response.flags`
object is marked with ``"download_stopped"`` in its :attr:`~scrapy.http.Response.flags`
attribute.
.. note:: ``fail`` is a keyword-only parameter, i.e. raising

View File

@ -7,15 +7,15 @@ Requests and Responses
.. module:: scrapy.http
:synopsis: Request and Response classes
Scrapy uses :class:`Request` and :class:`Response` objects for crawling web
Scrapy uses :class:`~scrapy.Request` and :class:`Response` objects for crawling web
sites.
Typically, :class:`Request` objects are generated in the spiders and pass
Typically, :class:`~scrapy.Request` objects are generated in the spiders and pass
across the system until they reach the Downloader, which executes the request
and returns a :class:`Response` object which travels back to the spider that
issued the request.
Both :class:`Request` and :class:`Response` classes have subclasses which add
Both :class:`~scrapy.Request` and :class:`Response` classes have subclasses which add
functionality not required in the base classes. These are described
below in :ref:`topics-request-response-ref-request-subclasses` and
:ref:`topics-request-response-ref-response-subclasses`.
@ -24,7 +24,7 @@ below in :ref:`topics-request-response-ref-request-subclasses` and
Request objects
===============
.. autoclass:: Request
.. autoclass:: scrapy.Request
:param url: the URL of this request
@ -52,7 +52,7 @@ Request objects
:param method: the HTTP method of this request. Defaults to ``'GET'``.
:type method: str
:param meta: the initial values for the :attr:`Request.meta` attribute. If
:param meta: the initial values for the :attr:`.Request.meta` attribute. If
given, the dict passed in this parameter will be shallow copied.
:type meta: dict
@ -67,10 +67,10 @@ Request objects
(for single valued headers) or lists (for multi-valued headers). If
``None`` is passed as value, the HTTP header will not be sent at all.
.. caution:: Cookies set via the ``Cookie`` header are not considered by the
:ref:`cookies-mw`. If you need to set cookies for a request, use the
:class:`Request.cookies <scrapy.Request>` parameter. This is a known
current limitation that is being worked on.
.. caution:: Cookies set via the ``Cookie`` header are not considered by the
:ref:`cookies-mw`. If you need to set cookies for a request, use the
``cookies`` argument. This is a known current limitation that is being
worked on.
:type headers: dict
@ -124,7 +124,7 @@ Request objects
.. caution:: Cookies set via the ``Cookie`` header are not considered by the
:ref:`cookies-mw`. If you need to set cookies for a request, use the
:class:`Request.cookies <scrapy.Request>` parameter. This is a known
:class:`scrapy.Request.cookies <scrapy.Request>` parameter. This is a known
current limitation that is being worked on.
.. versionadded:: 2.6.0
@ -172,7 +172,7 @@ Request objects
A string containing the URL of this request. Keep in mind that this
attribute contains the escaped URL, so it can differ from the URL passed in
the ``__init__`` method.
the ``__init__()`` method.
This attribute is read-only. To change the URL of a Request use
:meth:`replace`.
@ -184,7 +184,8 @@ Request objects
.. attribute:: Request.headers
A dictionary-like object which contains the request headers.
A dictionary-like (:class:`scrapy.http.headers.Headers`) object which contains
the request headers.
.. attribute:: Request.body
@ -240,8 +241,8 @@ Request objects
A dictionary that contains arbitrary metadata for this request. Its contents
will be passed to the Request's callback as keyword arguments. It is empty
for new Requests, which means by default callbacks only get a :class:`Response`
object as argument.
for new Requests, which means by default callbacks only get a
:class:`~scrapy.http.Response` object as argument.
This dict is :doc:`shallow copied <library/copy>` when the request is
cloned using the ``copy()`` or ``replace()`` methods, and can also be
@ -262,7 +263,7 @@ Request objects
Return a Request object with the same members, except for those members
given new values by whichever keyword arguments are specified. The
:attr:`Request.cb_kwargs` and :attr:`Request.meta` attributes are shallow
:attr:`~scrapy.Request.cb_kwargs` and :attr:`~scrapy.Request.meta` attributes are shallow
copied by default (unless new values are given as arguments). See also
:ref:`topics-request-response-ref-request-callback-arguments`.
@ -305,7 +306,7 @@ Example:
In some cases you may be interested in passing arguments to those callback
functions so you can receive the arguments later, in the second callback.
The following example shows how to achieve this by using the
:attr:`Request.cb_kwargs` attribute:
:attr:`.Request.cb_kwargs` attribute:
.. code-block:: python
@ -326,10 +327,10 @@ The following example shows how to achieve this by using the
foo=foo,
)
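
A fuller sketch of the pattern the fragment above belongs to, assuming
``import scrapy`` and a spider class context (URLs and names are illustrative):

.. code-block:: python

    def parse(self, response):
        foo = "bar"
        yield scrapy.Request(
            "https://example.com/page2",
            callback=self.parse_page2,
            cb_kwargs={"foo": foo},
        )

    def parse_page2(self, response, foo):
        # entries from cb_kwargs arrive as keyword arguments
        self.logger.info("foo=%s for %s", foo, response.url)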
.. caution:: :attr:`Request.cb_kwargs` was introduced in version ``1.7``.
Prior to that, using :attr:`Request.meta` was recommended for passing
information around callbacks. After ``1.7``, :attr:`Request.cb_kwargs`
became the preferred way for handling user information, leaving :attr:`Request.meta`
.. caution:: :attr:`.Request.cb_kwargs` was introduced in version ``1.7``.
Prior to that, using :attr:`.Request.meta` was recommended for passing
information around callbacks. After ``1.7``, :attr:`.Request.cb_kwargs`
became the preferred way for handling user information, leaving :attr:`.Request.meta`
for communication with components like middlewares and extensions.
.. _topics-request-response-ref-errbacks:
@ -441,7 +442,7 @@ Request fingerprints
There are some aspects of scraping, such as filtering out duplicate requests
(see :setting:`DUPEFILTER_CLASS`) or caching responses (see
:setting:`HTTPCACHE_POLICY`), where you need the ability to generate a short,
unique identifier from a :class:`~scrapy.http.Request` object: a request
unique identifier from a :class:`~scrapy.Request` object: a request
fingerprint.
You often do not need to worry about request fingerprints, the default request
@ -486,7 +487,7 @@ A request fingerprinter is a class that must implement the following method:
See also :ref:`request-fingerprint-restrictions`.
:param request: request to fingerprint
:type request: scrapy.http.Request
:type request: scrapy.Request
Additionally, it may also implement the following method:
@ -566,7 +567,7 @@ URL canonicalization or taking the request method or body into account:
If you need to be able to override the request fingerprinting for arbitrary
requests from your spider callbacks, you may implement a request fingerprinter
that reads fingerprints from :attr:`request.meta <scrapy.http.Request.meta>`
that reads fingerprints from :attr:`request.meta <scrapy.Request.meta>`
when available, and then falls back to
:func:`scrapy.utils.request.fingerprint`. For example:
@ -581,10 +582,8 @@ when available, and then falls back to
    def fingerprint(self, request):
        if "fingerprint" in request.meta:
            return request.meta["fingerprint"]
        return fingerprint(request)
If you need to reproduce the same fingerprinting algorithm as Scrapy 2.6
without using the deprecated ``'2.6'`` value of the
:setting:`REQUEST_FINGERPRINTER_IMPLEMENTATION` setting, use the following
request fingerprinter:
If you need to reproduce the same fingerprinting algorithm as Scrapy 2.6, use
the following request fingerprinter:
.. code-block:: python
@ -628,7 +627,7 @@ The following built-in Scrapy components have such restrictions:
:setting:`HTTPCACHE_DIR` also apply. Inside :setting:`HTTPCACHE_DIR`,
the following directory structure is created:
- :attr:`Spider.name <scrapy.spiders.Spider.name>`
- :attr:`.Spider.name`
- first byte of a request fingerprint as hexadecimal
@ -656,7 +655,7 @@ The following built-in Scrapy components have such restrictions:
Request.meta special keys
=========================
The :attr:`Request.meta` attribute can contain any arbitrary data, but there
The :attr:`.Request.meta` attribute can contain any arbitrary data, but there
are some special keys recognized by Scrapy and its built-in extensions.
Those are:
@ -780,24 +779,25 @@ call their callback instead, like in this example, pass ``fail=False`` to the
Request subclasses
==================
Here is the list of built-in :class:`Request` subclasses. You can also subclass
Here is the list of built-in :class:`~scrapy.Request` subclasses. You can also subclass
it to implement your own custom functionality.
FormRequest objects
-------------------
The FormRequest class extends the base :class:`Request` with functionality for
The FormRequest class extends the base :class:`~scrapy.Request` with functionality for
dealing with HTML forms. It uses `lxml.html forms`_ to pre-populate form
fields with form data from :class:`Response` objects.
.. _lxml.html forms: https://lxml.de/lxmlhtml.html#forms
.. class:: scrapy.http.request.form.FormRequest
.. class:: scrapy.http.FormRequest
.. class:: scrapy.FormRequest(url, [formdata, ...])
.. currentmodule:: None
The :class:`FormRequest` class adds a new keyword parameter to the ``__init__`` method. The
remaining arguments are the same as for the :class:`Request` class and are
.. class:: scrapy.FormRequest(url, [formdata, ...])
:canonical: scrapy.http.request.form.FormRequest
The :class:`~scrapy.FormRequest` class adds a new keyword parameter to the ``__init__()`` method. The
remaining arguments are the same as for the :class:`~scrapy.Request` class and are
not documented here.
:param formdata: is a dictionary (or iterable of (key, value) tuples)
@ -805,12 +805,12 @@ fields with form data from :class:`Response` objects.
body of the request.
:type formdata: dict or collections.abc.Iterable
The :class:`FormRequest` objects support the following class method in
addition to the standard :class:`Request` methods:
The :class:`~scrapy.FormRequest` objects support the following class method in
addition to the standard :class:`~scrapy.Request` methods:
.. classmethod:: FormRequest.from_response(response, [formname=None, formid=None, formnumber=0, formdata=None, formxpath=None, formcss=None, clickdata=None, dont_click=False, ...])
.. classmethod:: from_response(response, [formname=None, formid=None, formnumber=0, formdata=None, formxpath=None, formcss=None, clickdata=None, dont_click=False, ...])
Returns a new :class:`FormRequest` object with its form field values
Returns a new :class:`~scrapy.FormRequest` object with its form field values
pre-populated with those found in the HTML ``<form>`` element contained
in the given response. For an example see
:ref:`topics-request-response-ref-request-userlogin`.
@ -832,7 +832,7 @@ fields with form data from :class:`Response` objects.
:param response: the response containing an HTML form which will be used
to pre-populate the form fields
:type response: :class:`Response` object
:type response: :class:`~scrapy.http.Response` object
:param formname: if given, the form with name attribute set to this value will be used.
:type formname: str
@ -869,7 +869,9 @@ fields with form data from :class:`Response` objects.
:type dont_click: bool
The other parameters of this class method are passed directly to the
:class:`FormRequest` ``__init__`` method.
:class:`~scrapy.FormRequest` ``__init__()`` method.
.. currentmodule:: scrapy.http
Request usage examples
----------------------
@ -878,7 +880,7 @@ Using FormRequest to send data via HTTP POST
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you want to simulate an HTML form POST in your spider and send a couple of
key-value fields, you can return a :class:`FormRequest` object (from your
key-value fields, you can return a :class:`~scrapy.FormRequest` object (from your
spider) like this:
.. skip: next
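
.. code-block:: python

    # Illustrative sketch of the elided example, assuming ``import scrapy``
    # and a spider context; the URL, form fields and callback are assumptions.
    def parse(self, response):
        return scrapy.FormRequest(
            "https://example.com/post/action",
            formdata={"name": "John Doe", "age": "27"},
            callback=self.after_post,
        )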
@ -901,7 +903,7 @@ It is usual for web sites to provide pre-populated form fields through ``<input
type="hidden">`` elements, such as session related data or authentication
tokens (for login pages). When scraping, you'll want these fields to be
automatically pre-populated and only override a couple of them, such as the
user name and password. You can use the :meth:`FormRequest.from_response`
user name and password. You can use the :meth:`.FormRequest.from_response()`
method for this job. Here's an example spider which uses it:
.. code-block:: python
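
    # Illustrative body for the elided spider example; the URL,
    # credentials and failure marker are assumptions.
    import scrapy


    class LoginSpider(scrapy.Spider):
        name = "login"
        start_urls = ["https://example.com/users/login.php"]

        def parse(self, response):
            return scrapy.FormRequest.from_response(
                response,
                formdata={"username": "john", "password": "secret"},
                callback=self.after_login,
            )

        def after_login(self, response):
            if b"authentication failed" in response.body:
                self.logger.error("Login failed")
                return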
@ -936,21 +938,22 @@ method for this job. Here's an example spider which uses it:
JsonRequest
-----------
The JsonRequest class extends the base :class:`Request` class with functionality for
The JsonRequest class extends the base :class:`~scrapy.Request` class with functionality for
dealing with JSON requests.
.. class:: JsonRequest(url, [... data, dumps_kwargs])
The :class:`JsonRequest` class adds two new keyword parameters to the ``__init__`` method. The
remaining arguments are the same as for the :class:`Request` class and are
The :class:`JsonRequest` class adds two new keyword parameters to the ``__init__()`` method. The
remaining arguments are the same as for the :class:`~scrapy.Request` class and are
not documented here.
Using the :class:`JsonRequest` will set the ``Content-Type`` header to ``application/json``
and the ``Accept`` header to ``application/json, text/javascript, */*; q=0.01``.
:param data: is any JSON serializable object that needs to be JSON encoded and assigned to body.
if :attr:`Request.body` argument is provided this parameter will be ignored.
if :attr:`Request.body` argument is not provided and data argument is provided :attr:`Request.method` will be
If the :attr:`~scrapy.Request.body` argument is provided this parameter will be ignored.
If the :attr:`~scrapy.Request.body` argument is not provided and the
``data`` argument is provided the :attr:`~scrapy.Request.method` will be
set to ``'POST'`` automatically.
:type data: object
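
A short sketch of sending a JSON payload (the URL and payload are illustrative):

.. code-block:: python

    from scrapy.http import JsonRequest

    # ``data`` is JSON-encoded into the body; with no explicit method,
    # the request method becomes POST.
    request = JsonRequest("https://example.com/api", data={"q": "scrapy"})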
@ -1002,7 +1005,7 @@ Response objects
:type flags: list
:param request: the initial value of the :attr:`Response.request` attribute.
This represents the :class:`Request` that generated this response.
This represents the :class:`~scrapy.Request` that generated this response.
:type request: scrapy.Request
:param certificate: an object representing the server's SSL certificate.
@ -1038,11 +1041,12 @@ Response objects
.. attribute:: Response.headers
A dictionary-like object which contains the response headers. Values can
be accessed using :meth:`get` to return the first header value with the
specified name or :meth:`getlist` to return all header values with the
specified name. For example, this call will give you all cookies in the
headers::
A dictionary-like (:class:`scrapy.http.headers.Headers`) object which contains
the response headers. Values can be accessed using
:meth:`~scrapy.http.headers.Headers.get` to return the first header value with
the specified name or :meth:`~scrapy.http.headers.Headers.getlist` to return
all header values with the specified name. For example, this call will give you
all cookies in the headers::
response.headers.getlist('Set-Cookie')
@ -1058,7 +1062,7 @@ Response objects
.. attribute:: Response.request
The :class:`Request` object that generated this response. This attribute is
The :class:`~scrapy.Request` object that generated this response. This attribute is
assigned in the Scrapy engine, after the response and the request have passed
through all :ref:`Downloader Middlewares <topics-downloader-middleware>`.
In particular, this means that:
@ -1077,34 +1081,33 @@ Response objects
.. attribute:: Response.meta
A shortcut to the :attr:`Request.meta` attribute of the
A shortcut to the :attr:`~scrapy.Request.meta` attribute of the
:attr:`Response.request` object (i.e. ``self.request.meta``).
Unlike the :attr:`Response.request` attribute, the :attr:`Response.meta`
attribute is propagated along redirects and retries, so you will get
the original :attr:`Request.meta` sent from your spider.
the original :attr:`.Request.meta` sent from your spider.
.. seealso:: :attr:`Request.meta` attribute
.. seealso:: :attr:`.Request.meta` attribute
.. attribute:: Response.cb_kwargs
.. versionadded:: 2.0
A shortcut to the :attr:`Request.cb_kwargs` attribute of the
A shortcut to the :attr:`~scrapy.Request.cb_kwargs` attribute of the
:attr:`Response.request` object (i.e. ``self.request.cb_kwargs``).
Unlike the :attr:`Response.request` attribute, the
:attr:`Response.cb_kwargs` attribute is propagated along redirects and
retries, so you will get the original :attr:`Request.cb_kwargs` sent
from your spider.
retries, so you will get the original :attr:`.Request.cb_kwargs` sent from your spider.
.. seealso:: :attr:`Request.cb_kwargs` attribute
.. seealso:: :attr:`.Request.cb_kwargs` attribute
.. attribute:: Response.flags
A list that contains flags for this response. Flags are labels used for
tagging Responses. For example: ``'cached'``, ``'redirected'``, etc. And
they're shown on the string representation of the Response (`__str__`
they're shown on the string representation of the Response (``__str__()``
method) which is used by the engine for logging.
.. attribute:: Response.certificate
@ -1181,7 +1184,7 @@ TextResponse objects
:class:`Response` class, which is meant to be used only for binary data,
such as images, sounds or any media file.
:class:`TextResponse` objects support a new ``__init__`` method argument, in
:class:`TextResponse` objects support a new ``__init__()`` method argument, in
addition to the base :class:`Response` objects. The remaining functionality
is the same as for the :class:`Response` class and is not documented here.
@ -1219,7 +1222,7 @@ TextResponse objects
A string with the encoding of this response. The encoding is resolved by
trying the following mechanisms, in order:
1. the encoding passed in the ``__init__`` method ``encoding`` argument
1. the encoding passed in the ``__init__()`` method ``encoding`` argument
2. the encoding declared in the Content-Type HTTP header. If this
encoding is not valid (i.e. unknown), it is ignored and the next
@ -1273,7 +1276,7 @@ TextResponse objects
Constructs an absolute url by combining the Response's base url with
a possible relative url. The base url shall be extracted from the
``<base>`` tag, or just the Response's :attr:`url` if there is no such
``<base>`` tag, or just :attr:`Response.url` if there is no such
tag.
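
For instance (values are illustrative):

.. code-block:: python

    # assuming response.url == "https://example.com/a/b" and no <base> tag
    response.urljoin("c")  # -> "https://example.com/a/c"
    response.urljoin("/index")  # -> "https://example.com/index"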

View File

@ -777,7 +777,7 @@ Removing namespaces
When dealing with scraping projects, it is often quite convenient to get rid of
namespaces altogether and just work with element names, to write more
simple/convenient XPaths. You can use the
:meth:`Selector.remove_namespaces` method for that.
:meth:`.Selector.remove_namespaces` method for that.
Let's show an example that illustrates this with the Python Insider blog atom feed.
@ -814,7 +814,7 @@ doesn't work (because the Atom XML namespace is obfuscating those nodes):
>>> response.xpath("//link")
[]
But once we call the :meth:`Selector.remove_namespaces` method, all
But once we call the :meth:`.Selector.remove_namespaces` method, all
nodes can be accessed directly by their names:
.. code-block:: pycon
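
    >>> response.selector.remove_namespaces()
    >>> response.xpath("//link")  # illustrative output, truncated
    [<Selector query='//link' data='<link rel="alternate" type="text/h...'>, ...]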
@ -1046,7 +1046,7 @@ Built-in Selectors reference
Selector objects
----------------
.. autoclass:: Selector
.. autoclass:: scrapy.Selector
.. automethod:: xpath
@ -1126,8 +1126,8 @@ Examples
Selector examples on HTML response
----------------------------------
Here are some :class:`Selector` examples to illustrate several concepts.
In all cases, we assume there is already a :class:`Selector` instantiated with
Here are some :class:`~scrapy.Selector` examples to illustrate several concepts.
In all cases, we assume there is already a :class:`~scrapy.Selector` instantiated with
a :class:`~scrapy.http.HtmlResponse` object like this:
.. code-block:: python
@ -1135,7 +1135,7 @@ a :class:`~scrapy.http.HtmlResponse` object like this:
sel = Selector(html_response)
1. Select all ``<h1>`` elements from an HTML response body, returning a list of
:class:`Selector` objects (i.e. a :class:`SelectorList` object):
:class:`~scrapy.Selector` objects (i.e. a :class:`SelectorList` object):
.. code-block:: python
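
    # all <h1> elements in the document (a SelectorList of Selector objects)
    sel.xpath("//h1")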
@ -1165,7 +1165,7 @@ Selector examples on XML response
.. skip: start
Here are some examples to illustrate concepts for :class:`Selector` objects
Here are some examples to illustrate concepts for :class:`~scrapy.Selector` objects
instantiated with an :class:`~scrapy.http.XmlResponse` object:
.. code-block:: python
@ -1173,7 +1173,7 @@ instantiated with an :class:`~scrapy.http.XmlResponse` object:
sel = Selector(xml_response)
1. Select all ``<product>`` elements from an XML response body, returning a list
of :class:`Selector` objects (i.e. a :class:`SelectorList` object):
of :class:`~scrapy.Selector` objects (i.e. a :class:`SelectorList` object):
.. code-block:: python

View File

@ -115,7 +115,7 @@ class BaseScheduler(metaclass=BaseSchedulerMeta):
@abstractmethod
def next_request(self) -> Request | None:
"""
Return the next :class:`~scrapy.http.Request` to be processed, or ``None``
Return the next :class:`~scrapy.Request` to be processed, or ``None``
to indicate that there are no requests to be considered ready at the moment.
Returning ``None`` implies that no request from the scheduler will be sent
@ -263,7 +263,7 @@ class Scheduler(BaseScheduler):
def next_request(self) -> Request | None:
"""
Return a :class:`~scrapy.http.Request` object from the memory queue,
Return a :class:`~scrapy.Request` object from the memory queue,
falling back to the disk queue if the memory queue is empty.
Return ``None`` if there are no more enqueued requests.
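
A hedged sketch of this fallback (the ``mqs``/``dqs`` attribute names follow
the built-in scheduler but are assumptions here, not shown by this diff):

.. code-block:: python

    def next_request(self):
        request = self.mqs.pop()  # memory queue first
        if request is None and self.dqs is not None:
            request = self.dqs.pop()  # fall back to the disk queue
        return request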

View File

@ -59,7 +59,7 @@ RequestTypeVar = TypeVar("RequestTypeVar", bound="Request")
def NO_CALLBACK(*args: Any, **kwargs: Any) -> NoReturn:
"""When assigned to the ``callback`` parameter of
:class:`~scrapy.http.Request`, it indicates that the request is not meant
:class:`~scrapy.Request`, it indicates that the request is not meant
to have a spider callback at all.
For example:
@ -83,7 +83,7 @@ def NO_CALLBACK(*args: Any, **kwargs: Any) -> NoReturn:
class Request(object_ref):
"""Represents an HTTP request, which is usually generated in a Spider and
executed by the Downloader, thus generating a :class:`Response`.
executed by the Downloader, thus generating a :class:`~scrapy.http.Response`.
"""
attributes: tuple[str, ...] = (
@ -103,9 +103,9 @@ class Request(object_ref):
)
"""A tuple of :class:`str` objects containing the name of all public
attributes of the class that are also keyword parameters of the
``__init__`` method.
``__init__()`` method.
Currently used by :meth:`Request.replace`, :meth:`Request.to_dict` and
Currently used by :meth:`.Request.replace`, :meth:`.Request.to_dict` and
:func:`~scrapy.utils.request.request_from_dict`.
"""
@ -233,7 +233,7 @@ class Request(object_ref):
finding unknown options call this method by passing
``ignore_unknown_options=False``.
.. caution:: Using :meth:`from_curl` from :class:`~scrapy.http.Request`
.. caution:: Using :meth:`from_curl` from :class:`~scrapy.Request`
subclasses, such as :class:`~scrapy.http.JsonRequest`, or
:class:`~scrapy.http.XmlRpcRequest`, as well as having
:ref:`downloader middlewares <topics-downloader-middleware>`
@ -244,7 +244,7 @@ class Request(object_ref):
:class:`~scrapy.downloadermiddlewares.useragent.UserAgentMiddleware`,
or
:class:`~scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware`,
may modify the :class:`~scrapy.http.Request` object.
may modify the :class:`~scrapy.Request` object.
To translate a cURL command into a Scrapy request,
you may use `curl2scrapy <https://michael-shub.github.io/curl2scrapy/>`_.

View File

@ -51,7 +51,7 @@ class Response(object_ref):
)
"""A tuple of :class:`str` objects containing the name of all public
attributes of the class that are also keyword parameters of the
``__init__`` method.
``__init__()`` method.
Currently used by :meth:`Response.replace`.
"""
@ -199,8 +199,8 @@ class Response(object_ref):
) -> Request:
"""
Return a :class:`~.Request` instance to follow a link ``url``.
It accepts the same arguments as ``Request.__init__`` method,
but ``url`` can be a relative URL or a ``scrapy.link.Link`` object,
It accepts the same arguments as ``Request.__init__()`` method,
but ``url`` can be a relative URL or a :class:`~scrapy.link.Link` object,
not only an absolute URL.
:class:`~.TextResponse` provides a :meth:`~.TextResponse.follow`
@ -254,7 +254,7 @@ class Response(object_ref):
.. versionadded:: 2.0
Return an iterable of :class:`~.Request` instances to follow all links
in ``urls``. It accepts the same arguments as ``Request.__init__`` method,
in ``urls``. It accepts the same arguments as ``Request.__init__()`` method,
but elements of ``urls`` can be relative URLs or :class:`~scrapy.link.Link` objects,
not only absolute URLs.

View File

@ -185,15 +185,15 @@ class TextResponse(Response):
) -> Request:
"""
Return a :class:`~.Request` instance to follow a link ``url``.
It accepts the same arguments as ``Request.__init__`` method,
It accepts the same arguments as ``Request.__init__()`` method,
but ``url`` can be not only an absolute URL, but also
* a relative URL
* a :class:`~scrapy.link.Link` object, e.g. the result of
:ref:`topics-link-extractors`
* a :class:`~scrapy.selector.Selector` object for a ``<link>`` or ``<a>`` element, e.g.
* a :class:`~scrapy.Selector` object for a ``<link>`` or ``<a>`` element, e.g.
``response.css('a.my_link')[0]``
* an attribute :class:`~scrapy.selector.Selector` (not SelectorList), e.g.
* an attribute :class:`~scrapy.Selector` (not SelectorList), e.g.
``response.css('a::attr(href)')[0]`` or
``response.xpath('//img/@src')[0]``
@ -241,20 +241,20 @@ class TextResponse(Response):
"""
A generator that produces :class:`~.Request` instances to follow all
links in ``urls``. It accepts the same arguments as the :class:`~.Request`'s
``__init__`` method, except that each ``urls`` element does not need to be
``__init__()`` method, except that each ``urls`` element does not need to be
an absolute URL, it can be any of the following:
* a relative URL
* a :class:`~scrapy.link.Link` object, e.g. the result of
:ref:`topics-link-extractors`
* a :class:`~scrapy.selector.Selector` object for a ``<link>`` or ``<a>`` element, e.g.
* a :class:`~scrapy.Selector` object for a ``<link>`` or ``<a>`` element, e.g.
``response.css('a.my_link')[0]``
* an attribute :class:`~scrapy.selector.Selector` (not SelectorList), e.g.
* an attribute :class:`~scrapy.Selector` (not SelectorList), e.g.
``response.css('a::attr(href)')[0]`` or
``response.xpath('//img/@src')[0]``
In addition, ``css`` and ``xpath`` arguments are accepted to perform the link extraction
within the ``follow_all`` method (only one of ``urls``, ``css`` and ``xpath`` is accepted).
within the ``follow_all()`` method (only one of ``urls``, ``css`` and ``xpath`` is accepted).
Note that when passing a ``SelectorList`` as argument for the ``urls`` parameter or
using the ``css`` or ``xpath`` parameters, this method will not produce requests for

View File

@ -32,7 +32,7 @@ class ItemLoader(itemloaders.ItemLoader):
:param selector: The selector to extract data from, when using the
:meth:`add_xpath`, :meth:`add_css`, :meth:`replace_xpath`, or
:meth:`replace_css` method.
:type selector: :class:`~scrapy.selector.Selector` object
:type selector: :class:`~scrapy.Selector` object
:param response: The response used to construct the selector using the
:attr:`default_selector_class`, unless the selector argument is given,
@ -79,7 +79,7 @@ class ItemLoader(itemloaders.ItemLoader):
.. attribute:: selector
The :class:`~scrapy.selector.Selector` object to extract data from.
The :class:`~scrapy.Selector` object to extract data from.
It's either the selector given in the ``__init__`` method or one created from
the response given in the ``__init__`` method using the
:attr:`default_selector_class`. This attribute is meant to be
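
A brief usage sketch (the field names and selectors are illustrative):

.. code-block:: python

    from scrapy.loader import ItemLoader

    loader = ItemLoader(response=response)  # selector built from the response
    loader.add_xpath("name", "//div[@id='name']/text()")
    loader.add_css("price", "p#price::text")
    item = loader.load_item()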

View File

@ -1,6 +1,6 @@
"""
This module provides some useful functions for working with
scrapy.http.Request objects
scrapy.Request objects
"""
from __future__ import annotations
@ -109,12 +109,10 @@ class RequestFingerprinter:
It takes into account a canonical version
(:func:`w3lib.url.canonicalize_url`) of :attr:`request.url
<scrapy.http.Request.url>` and the values of :attr:`request.method
<scrapy.http.Request.method>` and :attr:`request.body
<scrapy.http.Request.body>`. It then generates an `SHA1
<scrapy.Request.url>` and the values of :attr:`request.method
<scrapy.Request.method>` and :attr:`request.body
<scrapy.Request.body>`. It then generates an `SHA1
<https://en.wikipedia.org/wiki/SHA-1>`_ hash.
.. seealso:: :setting:`REQUEST_FINGERPRINTER_IMPLEMENTATION`.
"""
@classmethod