
async/deferred signal handlers (#4390)

* [docs] async/deferred signal handlers

* [docs] update deferred signals example

* [docs] add subsections for built-in signals

* docs(signals): update signal handler example

* docs(signals): update signal handler example
Aditya Kumar 2020-04-20 21:17:57 +05:30 committed by GitHub
parent bfeb2c8c13
commit e4750f2fbd


@@ -16,8 +16,7 @@ deliver the arguments that the handler receives.
You can connect to signals (or send your own) through the
:ref:`topics-api-signals`.

-Here is a simple example showing how you can catch signals and perform some action:
-::
+Here is a simple example showing how you can catch signals and perform some action::

    from scrapy import signals
    from scrapy import Spider
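
The hunk above cuts the example off at its opening imports. For context, a handler connected this way looks roughly like the following sketch (the spider name, URL and handler are illustrative, not read from the diff)::

    from scrapy import signals
    from scrapy import Spider


    class MySpider(Spider):
        name = 'example'
        start_urls = ['http://www.example.com/']

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super().from_crawler(crawler, *args, **kwargs)
            # Run spider_closed whenever the spider_closed signal fires
            crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
            return spider

        def spider_closed(self, spider):
            spider.logger.info('Spider closed: %s', spider.name)

        def parse(self, response):
            pass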
@@ -52,9 +51,45 @@ Deferred signal handlers
========================

Some signals support returning :class:`~twisted.internet.defer.Deferred`
-objects from their handlers, see the :ref:`topics-signals-ref` below to know
-which ones.
+objects from their handlers, allowing you to run asynchronous code that
+does not block Scrapy. If a signal handler returns a
+:class:`~twisted.internet.defer.Deferred`, Scrapy waits for that
+:class:`~twisted.internet.defer.Deferred` to fire.
+
+Let's take an example::
+
+    class SignalSpider(scrapy.Spider):
+        name = 'signals'
+        start_urls = ['http://quotes.toscrape.com/page/1/']
+
+        @classmethod
+        def from_crawler(cls, crawler, *args, **kwargs):
+            spider = super(SignalSpider, cls).from_crawler(crawler, *args, **kwargs)
+            crawler.signals.connect(spider.item_scraped, signal=signals.item_scraped)
+            return spider
+
+        def item_scraped(self, item):
+            # Send the scraped item to the server
+            d = treq.post(
+                'http://example.com/post',
+                json.dumps(item).encode('ascii'),
+                headers={b'Content-Type': [b'application/json']}
+            )
+
+            # The next item will be scraped only after
+            # deferred (d) is fired
+            return d
+
+        def parse(self, response):
+            for quote in response.css('div.quote'):
+                yield {
+                    'text': quote.css('span.text::text').get(),
+                    'author': quote.css('small.author::text').get(),
+                    'tags': quote.css('div.tags a.tag::text').getall(),
+                }
+
+See the :ref:`topics-signals-ref` below to know which signals support
+:class:`~twisted.internet.defer.Deferred`.
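
Note that the snippet above assumes ``scrapy``, ``signals``, ``treq`` and ``json`` are already in scope. To run it as a standalone module you would need roughly these imports (``treq`` is a third-party, Twisted-based HTTP client, installed separately from Scrapy)::

    import json  # serialize the item for the POST body

    import scrapy
    import treq  # Twisted-based HTTP client: pip install treq
    from scrapy import signals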
.. _topics-signals-ref:
@@ -66,9 +101,12 @@ Built-in signals reference

Here's the list of Scrapy built-in signals and their meaning.

-engine_started
+Engine signals
--------------
+
+engine_started
+~~~~~~~~~~~~~~
.. signal:: engine_started
.. function:: engine_started()
@@ -81,7 +119,7 @@ engine_started
getting fired before :signal:`spider_opened`.
engine_stopped
---------------
+~~~~~~~~~~~~~~
.. signal:: engine_stopped
.. function:: engine_stopped()
@@ -91,9 +129,20 @@ engine_stopped
This signal supports returning deferreds from their handlers.
-item_scraped
+Item signals
------------
+
+.. note::
+    As at most :setting:`CONCURRENT_ITEMS` items are processed in
+    parallel, many deferreds are fired together using
+    :class:`~twisted.internet.defer.DeferredList`. Hence the next
+    batch waits for the :class:`~twisted.internet.defer.DeferredList`
+    to fire and then runs the respective item signal handler for
+    the next batch of scraped items.
+
+item_scraped
+~~~~~~~~~~~~
.. signal:: item_scraped
.. function:: item_scraped(item, response, spider)
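
The batching behaviour that note describes can be illustrated outside Scrapy. Here is a minimal, runnable Twisted sketch (all names are illustrative; this is not Scrapy's actual implementation) of gating each batch of handler deferreds on a :class:`~twisted.internet.defer.DeferredList`::

    from twisted.internet import defer, reactor

    def handle_item(item):
        # Stand-in for an item_scraped handler that returns a Deferred,
        # e.g. an asynchronous POST of the item to a server.
        d = defer.Deferred()
        reactor.callLater(0.1, d.callback, item)
        return d

    @defer.inlineCallbacks
    def process_batches(batches):
        for batch in batches:
            # Fire the handler for every item in the batch, then wait for
            # all of the resulting deferreds before starting the next batch.
            yield defer.DeferredList([handle_item(item) for item in batch])
        reactor.stop()

    process_batches([[{'n': 1}, {'n': 2}], [{'n': 3}]])
    reactor.run()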
@@ -112,7 +161,7 @@ item_scraped
:type response: :class:`~scrapy.http.Response` object
item_dropped
-------------
+~~~~~~~~~~~~
.. signal:: item_dropped
.. function:: item_dropped(item, response, exception, spider)
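
:signal:`item_dropped` is sent when a pipeline stage raises :exc:`~scrapy.exceptions.DropItem`. As a quick illustration (the pipeline and field names are made up for this sketch)::

    from scrapy.exceptions import DropItem

    class RequirePricePipeline:
        def process_item(self, item, spider):
            if not item.get('price'):
                # Dropping the item here fires the item_dropped signal
                raise DropItem('missing price')
            return item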
@@ -137,7 +186,7 @@ item_dropped
:type exception: :exc:`~scrapy.exceptions.DropItem` exception
item_error
-------------
+~~~~~~~~~~
.. signal:: item_error
.. function:: item_error(item, response, spider, failure)
@@ -159,8 +208,11 @@ item_error
:param failure: the exception raised
:type failure: twisted.python.failure.Failure
+Spider signals
+--------------
+
spider_closed
--------------
+~~~~~~~~~~~~~
.. signal:: spider_closed
.. function:: spider_closed(spider, reason)
@@ -183,7 +235,7 @@ spider_closed
:type reason: str
spider_opened
--------------
+~~~~~~~~~~~~~
.. signal:: spider_opened
.. function:: spider_opened(spider)
@@ -198,7 +250,7 @@ spider_opened
:type spider: :class:`~scrapy.spiders.Spider` object
spider_idle
------------
+~~~~~~~~~~~
.. signal:: spider_idle
.. function:: spider_idle(spider)
@@ -228,7 +280,7 @@ spider_idle
due to duplication).
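
A common use of :signal:`spider_idle`, hinted at in the text above, is to feed more requests to an otherwise-finished spider and raise :exc:`~scrapy.exceptions.DontCloseSpider` to keep it open. A sketch (the spider name and URL are illustrative)::

    from scrapy import Request, Spider, signals
    from scrapy.exceptions import DontCloseSpider

    class PollingSpider(Spider):
        name = 'polling'

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super().from_crawler(crawler, *args, **kwargs)
            crawler.signals.connect(spider.spider_idle, signal=signals.spider_idle)
            return spider

        def spider_idle(self, spider):
            # Feed a new request through the engine; note (per the docs
            # above) the scheduler may still filter it as a duplicate.
            self.crawler.engine.crawl(Request('http://example.com/queue'), spider)
            raise DontCloseSpider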
spider_error
-------------
+~~~~~~~~~~~~
.. signal:: spider_error
.. function:: spider_error(failure, response, spider)
@@ -246,8 +298,11 @@ spider_error
:param spider: the spider which raised the exception
:type spider: :class:`~scrapy.spiders.Spider` object
+Request signals
+---------------
+
request_scheduled
------------------
+~~~~~~~~~~~~~~~~~
.. signal:: request_scheduled
.. function:: request_scheduled(request, spider)
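
Request signals are typically consumed from an extension. A sketch (the extension name is illustrative) that counts scheduled requests via :signal:`request_scheduled`::

    from scrapy import signals

    class ScheduleCounter:
        def __init__(self):
            self.count = 0

        @classmethod
        def from_crawler(cls, crawler):
            ext = cls()
            crawler.signals.connect(ext.request_scheduled, signal=signals.request_scheduled)
            return ext

        def request_scheduled(self, request, spider):
            # Called each time the engine schedules a request
            self.count += 1
            spider.logger.debug('Scheduled %d requests so far', self.count)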
@@ -264,7 +319,7 @@ request_scheduled
:type spider: :class:`~scrapy.spiders.Spider` object
request_dropped
----------------
+~~~~~~~~~~~~~~~
.. signal:: request_dropped
.. function:: request_dropped(request, spider)
@@ -281,7 +336,7 @@ request_dropped
:type spider: :class:`~scrapy.spiders.Spider` object
request_reached_downloader
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~
.. signal:: request_reached_downloader
.. function:: request_reached_downloader(request, spider)
@@ -297,7 +352,7 @@ request_reached_downloader
:type spider: :class:`~scrapy.spiders.Spider` object
request_left_downloader
------------------------
+~~~~~~~~~~~~~~~~~~~~~~~
.. signal:: request_left_downloader
.. function:: request_left_downloader(request, spider)
@@ -315,8 +370,11 @@ request_left_downloader
:param spider: the spider that yielded the request
:type spider: :class:`~scrapy.spiders.Spider` object
+Response signals
+----------------
+
response_received
------------------
+~~~~~~~~~~~~~~~~~
.. signal:: response_received
.. function:: response_received(response, request, spider)
@@ -336,7 +394,7 @@ response_received
:type spider: :class:`~scrapy.spiders.Spider` object
response_downloaded
--------------------
+~~~~~~~~~~~~~~~~~~~
.. signal:: response_downloaded
.. function:: response_downloaded(response, request, spider)