mirror of https://github.com/scrapy/scrapy.git
synced 2025-03-14 15:48:38 +00:00
async/deferred signal handlers (#4390)
* [docs] async/deferred signal handlers
* [docs] update deferred signals example
* [docs] add subsections for built-in signals
* docs(signals): update signal handler example
* docs(signals): update signal handler example
This commit is contained in:
parent
bfeb2c8c13
commit
e4750f2fbd
@@ -16,8 +16,7 @@ deliver the arguments that the handler receives.
 
 You can connect to signals (or send your own) through the
 :ref:`topics-api-signals`.
 
-Here is a simple example showing how you can catch signals and perform some action:
-::
+Here is a simple example showing how you can catch signals and perform some action::
 
     from scrapy import signals
     from scrapy import Spider
@@ -52,9 +51,45 @@ Deferred signal handlers
 ========================
 
 Some signals support returning :class:`~twisted.internet.defer.Deferred`
-objects from their handlers, see the :ref:`topics-signals-ref` below to know
-which ones.
+objects from their handlers, allowing you to run asynchronous code that
+does not block Scrapy. If a signal handler returns a
+:class:`~twisted.internet.defer.Deferred`, Scrapy waits for that
+:class:`~twisted.internet.defer.Deferred` to fire.
+
+Let's take an example::
+
+    class SignalSpider(scrapy.Spider):
+        name = 'signals'
+        start_urls = ['http://quotes.toscrape.com/page/1/']
+
+        @classmethod
+        def from_crawler(cls, crawler, *args, **kwargs):
+            spider = super(SignalSpider, cls).from_crawler(crawler, *args, **kwargs)
+            crawler.signals.connect(spider.item_scraped, signal=signals.item_scraped)
+            return spider
+
+        def item_scraped(self, item):
+            # Send the scraped item to the server
+            d = treq.post(
+                'http://example.com/post',
+                json.dumps(item).encode('ascii'),
+                headers={b'Content-Type': [b'application/json']}
+            )
+
+            # The next item will be scraped only after
+            # deferred (d) is fired
+            return d
+
+        def parse(self, response):
+            for quote in response.css('div.quote'):
+                yield {
+                    'text': quote.css('span.text::text').get(),
+                    'author': quote.css('small.author::text').get(),
+                    'tags': quote.css('div.tags a.tag::text').getall(),
+                }
+
+See the :ref:`topics-signals-ref` below to know which signals support
+:class:`~twisted.internet.defer.Deferred`.
 
 .. _topics-signals-ref:
 
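The "Scrapy waits for that Deferred to fire" behaviour that this hunk documents can be sketched with stdlib `asyncio` standing in for Twisted Deferreds. This is an analogy only, with invented names (`item_scraped`, `scrape`): real Scrapy runs on the Twisted reactor, and the `sleep(0)` merely stands in for the `treq.post()` round trip:

```python
import asyncio

# Stdlib analogy for "wait for the handler's Deferred before the next item";
# Twisted Deferreds are replaced by awaitables. Illustration only.
async def item_scraped(item):
    await asyncio.sleep(0)  # stands in for the network round trip
    return f"posted {item['text']}"

async def scrape(items):
    log = []
    for item in items:
        # The engine does not move on to the next item until the
        # handler's awaitable has completed.
        log.append(await item_scraped(item))
    return log

log = asyncio.run(scrape([{'text': 'a'}, {'text': 'b'}]))
print(log)  # ['posted a', 'posted b']
```

The key design point is back-pressure: a slow handler naturally throttles item flow instead of letting unacknowledged work pile up.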
@@ -66,9 +101,12 @@ Built-in signals reference
 
 Here's the list of Scrapy built-in signals and their meaning.
 
-engine_started
+Engine signals
 --------------
+
+engine_started
+~~~~~~~~~~~~~~
 
 .. signal:: engine_started
 .. function:: engine_started()
@@ -81,7 +119,7 @@ engine_started
     getting fired before :signal:`spider_opened`.
 
 engine_stopped
---------------
+~~~~~~~~~~~~~~
 
 .. signal:: engine_stopped
 .. function:: engine_stopped()
@@ -91,9 +129,20 @@ engine_stopped
 
 This signal supports returning deferreds from their handlers.
 
-item_scraped
+Item signals
 ------------
+
+.. note::
+    As at max :setting:`CONCURRENT_ITEMS` items are processed in
+    parallel, many deferreds are fired together using
+    :class:`~twisted.internet.defer.DeferredList`. Hence the next
+    batch waits for the :class:`~twisted.internet.defer.DeferredList`
+    to fire and then runs the respective item signal handler for
+    the next batch of scraped items.
+
+item_scraped
+~~~~~~~~~~~~
 
 .. signal:: item_scraped
 .. function:: item_scraped(item, response, spider)
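The batching described in the note above can be sketched with `asyncio.gather` playing the role of Twisted's `DeferredList`: up to a batch-size worth of handlers run concurrently, and the next batch starts only once the whole batch has fired. The constant and helper names here are invented for the sketch, and `CONCURRENT_ITEMS` is a local stand-in for Scrapy's setting of the same name:

```python
import asyncio

# Sketch of batched handler completion; asyncio.gather stands in for
# twisted.internet.defer.DeferredList. Illustration only.
CONCURRENT_ITEMS = 2  # local stand-in for the Scrapy setting

async def handle(item, finished):
    await asyncio.sleep(0)  # stands in for the handler's deferred work
    finished.append(item)

async def process(items):
    finished = []
    batches = []
    for i in range(0, len(items), CONCURRENT_ITEMS):
        batch = items[i:i + CONCURRENT_ITEMS]
        # The next batch waits here until every handler in this batch fires.
        await asyncio.gather(*(handle(it, finished) for it in batch))
        batches.append(sorted(finished))
    return batches

batches = asyncio.run(process([1, 2, 3]))
print(batches)  # [[1, 2], [1, 2, 3]]
```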
@@ -112,7 +161,7 @@ item_scraped
     :type response: :class:`~scrapy.http.Response` object
 
 item_dropped
-------------
+~~~~~~~~~~~~
 
 .. signal:: item_dropped
 .. function:: item_dropped(item, response, exception, spider)
@@ -137,7 +186,7 @@ item_dropped
     :type exception: :exc:`~scrapy.exceptions.DropItem` exception
 
 item_error
-------------
+~~~~~~~~~~
 
 .. signal:: item_error
 .. function:: item_error(item, response, spider, failure)
@@ -159,8 +208,11 @@ item_error
     :param failure: the exception raised
     :type failure: twisted.python.failure.Failure
 
+Spider signals
+--------------
+
 spider_closed
--------------
+~~~~~~~~~~~~~
 
 .. signal:: spider_closed
 .. function:: spider_closed(spider, reason)
@@ -183,7 +235,7 @@ spider_closed
     :type reason: str
 
 spider_opened
--------------
+~~~~~~~~~~~~~
 
 .. signal:: spider_opened
 .. function:: spider_opened(spider)
@@ -198,7 +250,7 @@ spider_opened
     :type spider: :class:`~scrapy.spiders.Spider` object
 
 spider_idle
------------
+~~~~~~~~~~~
 
 .. signal:: spider_idle
 .. function:: spider_idle(spider)
@@ -228,7 +280,7 @@ spider_idle
     due to duplication).
 
 spider_error
------------- 
+~~~~~~~~~~~~
 
 .. signal:: spider_error
 .. function:: spider_error(failure, response, spider)
@@ -246,8 +298,11 @@ spider_error
     :param spider: the spider which raised the exception
     :type spider: :class:`~scrapy.spiders.Spider` object
 
+Request signals
+---------------
+
 request_scheduled
------------------
+~~~~~~~~~~~~~~~~~
 
 .. signal:: request_scheduled
 .. function:: request_scheduled(request, spider)
@@ -264,7 +319,7 @@ request_scheduled
     :type spider: :class:`~scrapy.spiders.Spider` object
 
 request_dropped
----------------
+~~~~~~~~~~~~~~~
 
 .. signal:: request_dropped
 .. function:: request_dropped(request, spider)
@@ -281,7 +336,7 @@ request_dropped
     :type spider: :class:`~scrapy.spiders.Spider` object
 
 request_reached_downloader
----------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. signal:: request_reached_downloader
 .. function:: request_reached_downloader(request, spider)
@@ -297,7 +352,7 @@ request_reached_downloader
     :type spider: :class:`~scrapy.spiders.Spider` object
 
 request_left_downloader
------------------------
+~~~~~~~~~~~~~~~~~~~~~~~
 
 .. signal:: request_left_downloader
 .. function:: request_left_downloader(request, spider)
@@ -315,8 +370,11 @@ request_left_downloader
     :param spider: the spider that yielded the request
     :type spider: :class:`~scrapy.spiders.Spider` object
 
+Response signals
+----------------
+
 response_received
------------------
+~~~~~~~~~~~~~~~~~
 
 .. signal:: response_received
 .. function:: response_received(response, request, spider)
@@ -336,7 +394,7 @@ response_received
     :type spider: :class:`~scrapy.spiders.Spider` object
 
 response_downloaded
--------------------
+~~~~~~~~~~~~~~~~~~~
 
 .. signal:: response_downloaded
 .. function:: response_downloaded(response, request, spider)