mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 05:45:31 +00:00

Request.cb_kwargs: Update docs

Eugenio Lacuesta 2019-03-28 15:18:00 -03:00
parent e8af6331b5
commit 8fb077694f
4 changed files with 24 additions and 25 deletions
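
For context, this is the pattern the updated docs describe: data meant for a
callback is passed through the ``cb_kwargs`` argument of
:class:`~scrapy.http.Request` and received as plain keyword parameters of the
callback, instead of being tunnelled through ``Request.meta``. A minimal,
self-contained sketch (the spider name, selectors and URLs are hypothetical
and not part of the diff)::

    import scrapy

    class ItemDetailsSpider(scrapy.Spider):
        """Hypothetical spider showing the cb_kwargs pattern (Scrapy >= 1.7)."""
        name = 'item_details'
        start_urls = ['http://example.com/page1', 'http://example.com/page2']

        def parse(self, response):
            for href in response.css('a.item::attr(href)').getall():
                item = {'url': response.urljoin(href)}  # partially populated item
                # New style: hand the item to the callback as a keyword argument
                yield scrapy.Request(item['url'], callback=self.parse_details,
                                     cb_kwargs={'item': item})

        def parse_details(self, response, item):
            # ...and receive it as a regular parameter, instead of reading it
            # back from response.meta['item'] as in the old meta-based style.
            item['title'] = response.css('title::text').get()
            return item

The diffs below apply exactly this substitution throughout the documentation:
``meta={'item': item}`` becomes ``cb_kwargs={'item': item}`` and the callbacks
gain explicit parameters.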

View File

@@ -28,16 +28,15 @@ Consider the following scrapy spider below::
             item = MyItem()
             # populate `item` fields
             # and extract item_details_url
-            yield scrapy.Request(item_details_url, self.parse_details, meta={'item': item})
+            yield scrapy.Request(item_details_url, self.parse_details, cb_kwargs={'item': item})
 
-        def parse_details(self, response):
-            item = response.meta['item']
+        def parse_details(self, response, item):
             # populate more `item` fields
             return item
 
 Basically this is a simple spider which parses two pages of items (the
 start_urls). Items also have a details page with additional information, so we
-use the ``meta`` functionality of :class:`~scrapy.http.Request` to pass a
+use the ``cb_kwargs`` functionality of :class:`~scrapy.http.Request` to pass a
 partially populated item.
@@ -100,8 +99,7 @@ Fortunately, the :command:`shell` is your bread and butter in this case (see
 
     from scrapy.shell import inspect_response
 
-    def parse_details(self, response):
-        item = response.meta.get('item', None)
+    def parse_details(self, response, item=None):
         if item:
             # populate more `item` fields
             return item
@@ -134,8 +132,7 @@ Logging is another useful option for getting information about your spider run.
 Although not as convenient, it comes with the advantage that the logs will be
 available in all future runs should they be necessary again::
 
-    def parse_details(self, response):
-        item = response.meta.get('item', None)
+    def parse_details(self, response, item=None):
         if item:
             # populate more `item` fields
             return item

View File

@@ -81,7 +81,8 @@ So, for example, this won't work::
 
     def some_callback(self, response):
         somearg = 'test'
-        return scrapy.Request('http://www.example.com', callback=lambda r: self.other_callback(r, somearg))
+        return scrapy.Request('http://www.example.com',
+                              callback=lambda r: self.other_callback(r, somearg))
 
     def other_callback(self, response, somearg):
         print("the argument passed is: %s" % somearg)
@@ -90,10 +91,10 @@ But this will::
 
     def some_callback(self, response):
         somearg = 'test'
-        return scrapy.Request('http://www.example.com', callback=self.other_callback, meta={'somearg': somearg})
+        return scrapy.Request('http://www.example.com',
+                              callback=self.other_callback, cb_kwargs={'somearg': somearg})
 
-    def other_callback(self, response):
-        somearg = response.meta['somearg']
+    def other_callback(self, response, somearg):
         print("the argument passed is: %s" % somearg)
 
 If you wish to log the requests that couldn't be serialized, you can set the

View File

@@ -27,10 +27,11 @@ Common causes of memory leaks
 
 It happens quite often (sometimes by accident, sometimes on purpose) that the
 Scrapy developer passes objects referenced in Requests (for example, using the
-:attr:`~scrapy.http.Request.meta` attribute or the request callback function)
-and that effectively bounds the lifetime of those referenced objects to the
-lifetime of the Request. This is, by far, the most common cause of memory leaks
-in Scrapy projects, and a quite difficult one to debug for newcomers.
+:attr:`~scrapy.http.Request.cb_kwargs` or :attr:`~scrapy.http.Request.meta`
+attributes or the request callback function) and that effectively bounds the
+lifetime of those referenced objects to the lifetime of the Request. This is,
+by far, the most common cause of memory leaks in Scrapy projects, and a quite
+difficult one to debug for newcomers.
 
 In big projects, the spiders are typically written by different people and some
 of those spiders could be "leaking" and thus affecting the rest of the other
@@ -48,7 +49,8 @@ Too Many Requests?
 
 By default Scrapy keeps the request queue in memory; it includes
 :class:`~scrapy.http.Request` objects and all objects
-referenced in Request attributes (e.g. in :attr:`~scrapy.http.Request.meta`).
+referenced in Request attributes (e.g. in :attr:`~scrapy.http.Request.cb_kwargs`
+and :attr:`~scrapy.http.Request.meta`).
 While not necessarily a leak, this can take a lot of memory. Enabling
 :ref:`persistent job queue <topics-jobs>` could help keeping memory usage
 in control.
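
The hunk above ends by pointing at the :ref:`persistent job queue
<topics-jobs>` as a way to keep memory usage under control. A minimal sketch
of enabling it from a spider via the ``JOBDIR`` setting (the spider name and
directory are hypothetical; in practice the setting is usually given on the
command line, e.g. ``scrapy crawl myspider -s JOBDIR=crawls/myspider-1``)::

    import scrapy

    class MySpider(scrapy.Spider):
        name = 'myspider'
        # Keep the scheduled request queue on disk instead of in memory;
        # queued Requests (including anything in their cb_kwargs/meta) must
        # then be serializable. The directory name is illustrative only.
        custom_settings = {'JOBDIR': 'crawls/myspider-1'}

        def parse(self, response):
            pass  # crawling logic omitted
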
@@ -101,7 +103,7 @@ Let's see a concrete example of a hypothetical case of memory leaks.
 Suppose we have some spider with a line similar to this one::
 
     return Request("http://www.somenastyspider.com/product.php?pid=%d" % product_id,
-                   callback=self.parse, meta={referer: response})
+                   callback=self.parse, cb_kwargs={'referer': response})
 
 That line is passing a response reference inside a request which effectively
 ties the response lifetime to the requests' one, and that would definitely

View File

@@ -186,12 +186,12 @@ Request objects
    Return a new Request which is a copy of this Request. See also:
    :ref:`topics-request-response-ref-request-callback-arguments`.
 
-.. method:: Request.replace([url, method, headers, body, cookies, meta, encoding, dont_filter, callback, errback])
+.. method:: Request.replace([url, method, headers, body, cookies, meta, flags, encoding, priority, dont_filter, callback, errback, cb_kwargs])
 
    Return a Request object with the same members, except for those members
    given new values by whichever keyword arguments are specified. The
-   attribute :attr:`Request.meta` is copied by default (unless a new value
-   is given in the ``meta`` argument). See also
+   :attr:`Request.cb_kwargs` and :attr:`Request.meta` attributes are copied by default
+   (unless new values are given as arguments). See also
    :ref:`topics-request-response-ref-request-callback-arguments`.
 
 .. _topics-request-response-ref-request-callback-arguments:
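
The wording change above states that both :attr:`Request.cb_kwargs` and
:attr:`Request.meta` are now copied by ``Request.replace()`` unless new values
are passed explicitly. A small sketch of that behaviour (URLs and the ``item``
dict are placeholders)::

    from scrapy import Request

    def parse_details(response, item):
        return item

    item = {'name': 'example'}  # placeholder data
    request = Request('http://www.example.com/item/1',
                      callback=parse_details, cb_kwargs={'item': item})

    # replace() keeps cb_kwargs (and meta) from the original request
    # unless they are overridden with keyword arguments.
    new_request = request.replace(url='http://www.example.com/item/2')
    assert new_request.cb_kwargs == {'item': item}
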
@@ -237,11 +237,10 @@ The following example shows how to achieve this by using the
 
 .. caution:: :attr:`Request.cb_kwargs` was introduced in version ``1.7``.
    Prior to that, :attr:`Request.meta` was the recommended option for passing
-   information around callbacks. However, after ``1.7`` :attr:`Request.cb_kwargs`
+   information around callbacks. However, after ``1.7``, using :attr:`Request.cb_kwargs`
    became the preferred way of passing user information, leaving :attr:`Request.meta`
-   to be used by internal components like spider or downloader middlewares.
-   The following example, which uses :attr:`Request.meta`, is only kept for historical
-   reasons.
+   to be populated by internal components like spider or downloader middlewares.
+   The following :attr:`Request.meta` example is only kept for historical reasons.
 
 ::