Request.cb_kwargs: Update docs
parent e8af6331b5
commit 8fb077694f
@@ -28,16 +28,15 @@ Consider the following scrapy spider below::

         item = MyItem()
         # populate `item` fields
         # and extract item_details_url
-        yield scrapy.Request(item_details_url, self.parse_details, meta={'item': item})
+        yield scrapy.Request(item_details_url, self.parse_details, cb_kwargs={'item': item})

-    def parse_details(self, response):
-        item = response.meta['item']
+    def parse_details(self, response, item):
         # populate more `item` fields
         return item

 Basically this is a simple spider which parses two pages of items (the
 start_urls). Items also have a details page with additional information, so we
-use the ``meta`` functionality of :class:`~scrapy.http.Request` to pass a
+use the ``cb_kwargs`` functionality of :class:`~scrapy.http.Request` to pass a
 partially populated item.

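As a point of reference for the change above, here is a minimal, self-contained sketch of the ``cb_kwargs`` pattern the updated example documents; the spider name, URLs and CSS selectors are placeholders, not part of the patch::

    import scrapy

    class DetailsSpider(scrapy.Spider):
        name = 'details'
        start_urls = ['http://example.com/items']       # placeholder listing page

        def parse(self, response):
            for href in response.css('a.item::attr(href)').getall():
                item = {'url': response.urljoin(href)}  # partially populated item
                # hand the item to the next callback as a keyword argument
                yield scrapy.Request(item['url'], self.parse_details,
                                     cb_kwargs={'item': item})

        def parse_details(self, response, item):
            # the item arrives as a plain argument; no response.meta lookup needed
            item['title'] = response.css('h1::text').get()
            return item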
@@ -100,8 +99,7 @@ Fortunately, the :command:`shell` is your bread and butter in this case (see

     from scrapy.shell import inspect_response

-    def parse_details(self, response):
-        item = response.meta.get('item', None)
+    def parse_details(self, response, item=None):
         if item:
             # populate more `item` fields
             return item
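A side effect of defaulting ``item`` to ``None``, as the new signature does, is that the callback can be invoked directly on a hand-built response (from the shell or a quick test) without going through a crawl. A small sketch, with a made-up spider, URL and body::

    from scrapy import Spider
    from scrapy.http import HtmlResponse

    class DebugSpider(Spider):
        name = 'debug-example'                 # hypothetical spider

        def parse_details(self, response, item=None):
            if item:
                item['title'] = response.css('h1::text').get()
                return item

    # hand-built response; URL and body are invented for the example
    response = HtmlResponse(url='http://example.com/details/1',
                            body=b'<html><body><h1>Some product</h1></body></html>',
                            encoding='utf-8')

    spider = DebugSpider()
    spider.parse_details(response)                               # works without an item
    spider.parse_details(response, item={'url': response.url})   # and with one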
@@ -134,8 +132,7 @@ Logging is another useful option for getting information about your spider run.
 Although not as convenient, it comes with the advantage that the logs will be
 available in all future runs should they be necessary again::

-    def parse_details(self, response):
-        item = response.meta.get('item', None)
+    def parse_details(self, response, item=None):
         if item:
             # populate more `item` fields
             return item

@@ -81,7 +81,8 @@ So, for example, this won't work::

     def some_callback(self, response):
         somearg = 'test'
-        return scrapy.Request('http://www.example.com', callback=lambda r: self.other_callback(r, somearg))
+        return scrapy.Request('http://www.example.com',
+                              callback=lambda r: self.other_callback(r, somearg))

     def other_callback(self, response, somearg):
         print("the argument passed is: %s" % somearg)
@@ -90,10 +91,10 @@ But this will::

     def some_callback(self, response):
         somearg = 'test'
-        return scrapy.Request('http://www.example.com', callback=self.other_callback, meta={'somearg': somearg})
+        return scrapy.Request('http://www.example.com',
+                              callback=self.other_callback, cb_kwargs={'somearg': somearg})

-    def other_callback(self, response):
-        somearg = response.meta['somearg']
+    def other_callback(self, response, somearg):
         print("the argument passed is: %s" % somearg)

 If you wish to log the requests that couldn't be serialized, you can set the
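The two hunks above are from the section on serializing requests for persistent (pause/resume) crawls: a request whose callback is a spider method and whose extra data sits in a plain ``cb_kwargs`` dict can be written to disk and resumed, while a lambda closure cannot. The snippet below is not Scrapy's serialization code, just a plain-``pickle`` illustration of why the anonymous callback is the part that breaks::

    import pickle

    cb_kwargs = {'somearg': 'test'}
    pickle.dumps(cb_kwargs)              # plain data round-trips without trouble

    callback = lambda r: None            # stand-in for the lambda in the hunk above
    try:
        pickle.dumps(callback)           # anonymous functions have no importable name
    except Exception as exc:
        print("cannot serialize the lambda:", exc)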
@@ -27,10 +27,11 @@ Common causes of memory leaks

 It happens quite often (sometimes by accident, sometimes on purpose) that the
 Scrapy developer passes objects referenced in Requests (for example, using the
-:attr:`~scrapy.http.Request.meta` attribute or the request callback function)
-and that effectively bounds the lifetime of those referenced objects to the
-lifetime of the Request. This is, by far, the most common cause of memory leaks
-in Scrapy projects, and a quite difficult one to debug for newcomers.
+:attr:`~scrapy.http.Request.cb_kwargs` or :attr:`~scrapy.http.Request.meta`
+attributes or the request callback function) and that effectively bounds the
+lifetime of those referenced objects to the lifetime of the Request. This is,
+by far, the most common cause of memory leaks in Scrapy projects, and a quite
+difficult one to debug for newcomers.

 In big projects, the spiders are typically written by different people and some
 of those spiders could be "leaking" and thus affecting the rest of the other
@@ -48,7 +49,8 @@ Too Many Requests?

 By default Scrapy keeps the request queue in memory; it includes
 :class:`~scrapy.http.Request` objects and all objects
-referenced in Request attributes (e.g. in :attr:`~scrapy.http.Request.meta`).
+referenced in Request attributes (e.g. in :attr:`~scrapy.http.Request.cb_kwargs`
+and :attr:`~scrapy.http.Request.meta`).
 While not necessarily a leak, this can take a lot of memory. Enabling
 :ref:`persistent job queue <topics-jobs>` could help keeping memory usage
 in control.
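For context, the persistent queue mentioned in this hunk is enabled simply by giving the crawl a job directory; the spider name and path below are arbitrary::

    import scrapy

    class BigCrawlSpider(scrapy.Spider):
        name = 'bigcrawl'                                  # hypothetical spider
        # equivalent to running: scrapy crawl bigcrawl -s JOBDIR=crawls/bigcrawl-1
        custom_settings = {'JOBDIR': 'crawls/bigcrawl-1'}

        def parse(self, response):
            pass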
@@ -101,7 +103,7 @@ Let's see a concrete example of a hypothetical case of memory leaks.
 Suppose we have some spider with a line similar to this one::

     return Request("http://www.somenastyspider.com/product.php?pid=%d" % product_id,
-                   callback=self.parse, meta={referer: response})
+                   callback=self.parse, cb_kwargs={'referer': response})

 That line is passing a response reference inside a request which effectively
 ties the response lifetime to the requests' one, and that would definitely
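One common way to avoid the leak this example describes is to pass only the piece of data the callback actually needs (here, the referring URL) rather than the whole response, so nothing keeps the old response alive. The spider below is a made-up illustration built around the same hypothetical URL::

    from scrapy import Request, Spider

    class ProductSpider(Spider):               # hypothetical spider
        name = 'products'

        def parse_listing(self, response):
            product_id = 123                   # stand-in for a value scraped from the page
            # pass just the referring URL, not the Response object, so the old
            # response can be garbage-collected once this callback returns
            yield Request("http://www.somenastyspider.com/product.php?pid=%d" % product_id,
                          callback=self.parse,
                          cb_kwargs={'referer_url': response.url})

        def parse(self, response, referer_url):
            self.logger.info("came from %s", referer_url)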
@@ -186,12 +186,12 @@ Request objects
       Return a new Request which is a copy of this Request. See also:
       :ref:`topics-request-response-ref-request-callback-arguments`.

-   .. method:: Request.replace([url, method, headers, body, cookies, meta, encoding, dont_filter, callback, errback])
+   .. method:: Request.replace([url, method, headers, body, cookies, meta, flags, encoding, priority, dont_filter, callback, errback, cb_kwargs])

       Return a Request object with the same members, except for those members
       given new values by whichever keyword arguments are specified. The
-      attribute :attr:`Request.meta` is copied by default (unless a new value
-      is given in the ``meta`` argument). See also
+      :attr:`Request.cb_kwargs` and :attr:`Request.meta` attributes are copied by default
+      (unless new values are given as arguments). See also
       :ref:`topics-request-response-ref-request-callback-arguments`.

 .. _topics-request-response-ref-request-callback-arguments:
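A short sketch of what the widened ``Request.replace`` signature and the copy-by-default behaviour described above look like in practice; the URLs, the callback and the ``meta`` key are invented for the example::

    from scrapy import Request

    def parse_item(response, item=None):       # placeholder callback
        return item

    original = Request('http://www.example.com/item/1',
                       callback=parse_item,
                       cb_kwargs={'item': {'id': 1}},
                       meta={'note': 'hypothetical'})

    # only the URL changes; cb_kwargs and meta are carried over to the copy
    retry = original.replace(url='http://www.example.com/item/1?retry=1')
    assert retry.cb_kwargs == original.cb_kwargs
    assert retry.meta == original.meta

    # passing cb_kwargs explicitly overrides the copied value
    other = original.replace(cb_kwargs={'item': {'id': 2}})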
@@ -237,11 +237,10 @@ The following example shows how to achieve this by using the

 .. caution:: :attr:`Request.cb_kwargs` was introduced in version ``1.7``.
    Prior to that, :attr:`Request.meta` was the recommended option for passing
-   information around callbacks. However, after ``1.7`` :attr:`Request.cb_kwargs`
+   information around callbacks. However, after ``1.7``, using :attr:`Request.cb_kwargs`
    became the preferred way of passing user information, leaving :attr:`Request.meta`
-   to be used by internal components like spider or downloader middlewares.
-   The following example, which uses :attr:`Request.meta`, is only kept for historical
-   reasons.
+   to be populated by internal components like spider or downloader middlewares.
+   The following :attr:`Request.meta` example is only kept for historical reasons.

 ::
