.. _topics-signals:

=======
Signals
=======

Scrapy uses signals extensively to notify when certain events occur. You can
catch some of those signals in your Scrapy project (using an :ref:`extension
<topics-extensions>`, for example) to perform additional tasks or extend Scrapy
to add functionality not provided out of the box.

Even though signals provide several arguments, the handlers that catch them
don't need to accept all of them - the signal dispatching mechanism will only
deliver the arguments that the handler receives.

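The argument-filtering behaviour can be sketched in plain Python. This is only an illustration of the idea, not Scrapy's actual dispatcher; the ``dispatch`` helper name is hypothetical:

```python
import inspect

def dispatch(handler, **named):
    # Deliver only the keyword arguments the handler's signature accepts,
    # so a handler may declare any subset of a signal's arguments.
    accepted = inspect.signature(handler).parameters
    return handler(**{k: v for k, v in named.items() if k in accepted})

def on_closed(spider):  # ignores the 'reason' argument entirely
    return "closed: %s" % spider

print(dispatch(on_closed, spider="dmoz", reason="finished"))  # closed: dmoz
```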
You can connect to signals (or send your own) through the
:ref:`topics-api-signals`.

Here is a simple example showing how you can catch signals and perform some action::

    from scrapy import signals
    from scrapy import Spider


    class DmozSpider(Spider):
        name = "dmoz"
        allowed_domains = ["dmoz.org"]
        start_urls = [
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
        ]

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super(DmozSpider, cls).from_crawler(crawler, *args, **kwargs)
            crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
            return spider

        def spider_closed(self, spider):
            spider.logger.info('Spider closed: %s', spider.name)

        def parse(self, response):
            pass

Deferred signal handlers
========================

Some signals support returning :class:`~twisted.internet.defer.Deferred`
objects from their handlers; see the :ref:`topics-signals-ref` below to know
which ones.

.. _topics-signals-ref:

Built-in signals reference
==========================

.. module:: scrapy.signals
   :synopsis: Signals definitions

Here's the list of Scrapy built-in signals and their meaning.

engine_started
--------------

.. signal:: engine_started
.. function:: engine_started()

    Sent when the Scrapy engine has started crawling.

    This signal supports returning deferreds from its handlers.

    .. note:: This signal may be fired *after* the :signal:`spider_opened` signal,
        depending on how the spider was started. So **don't** rely on this signal
        getting fired before :signal:`spider_opened`.

engine_stopped
--------------

.. signal:: engine_stopped
.. function:: engine_stopped()

    Sent when the Scrapy engine is stopped (for example, when a crawling
    process has finished).

    This signal supports returning deferreds from its handlers.

item_scraped
------------

.. signal:: item_scraped
.. function:: item_scraped(item, response, spider)

    Sent when an item has been scraped, after it has passed all the
    :ref:`topics-item-pipeline` stages (without being dropped).

    This signal supports returning deferreds from its handlers.

    :param item: the item scraped
    :type item: dict or :class:`~scrapy.item.Item` object

    :param spider: the spider which scraped the item
    :type spider: :class:`~scrapy.spiders.Spider` object

    :param response: the response from where the item was scraped
    :type response: :class:`~scrapy.http.Response` object

item_dropped
------------

.. signal:: item_dropped
.. function:: item_dropped(item, response, exception, spider)

    Sent after an item has been dropped from the :ref:`topics-item-pipeline`
    when some stage raised a :exc:`~scrapy.exceptions.DropItem` exception.

    This signal supports returning deferreds from its handlers.

    :param item: the item dropped from the :ref:`topics-item-pipeline`
    :type item: dict or :class:`~scrapy.item.Item` object

    :param spider: the spider which scraped the item
    :type spider: :class:`~scrapy.spiders.Spider` object

    :param response: the response from where the item was dropped
    :type response: :class:`~scrapy.http.Response` object

    :param exception: the exception (which must be a
        :exc:`~scrapy.exceptions.DropItem` subclass) which caused the item
        to be dropped
    :type exception: :exc:`~scrapy.exceptions.DropItem` exception

item_error
----------

.. signal:: item_error
.. function:: item_error(item, response, spider, failure)

    Sent when a :ref:`topics-item-pipeline` stage generates an error (i.e.
    raises an exception), except for the :exc:`~scrapy.exceptions.DropItem`
    exception.

    This signal supports returning deferreds from its handlers.

    :param item: the item that caused the error in the :ref:`topics-item-pipeline`
    :type item: dict or :class:`~scrapy.item.Item` object

    :param response: the response being processed when the exception was raised
    :type response: :class:`~scrapy.http.Response` object

    :param spider: the spider which raised the exception
    :type spider: :class:`~scrapy.spiders.Spider` object

    :param failure: the exception raised
    :type failure: twisted.python.failure.Failure

spider_closed
-------------

.. signal:: spider_closed
.. function:: spider_closed(spider, reason)

    Sent after a spider has been closed. This can be used to release per-spider
    resources reserved on :signal:`spider_opened`.

    This signal supports returning deferreds from its handlers.

    :param spider: the spider which has been closed
    :type spider: :class:`~scrapy.spiders.Spider` object

    :param reason: a string which describes the reason why the spider was closed. If
        it was closed because the spider has completed scraping, the reason
        is ``'finished'``. Otherwise, if the spider was manually closed by
        calling the ``close_spider`` engine method, then the reason is the one
        passed in the ``reason`` argument of that method (which defaults to
        ``'cancelled'``). If the engine was shut down (for example, by hitting
        Ctrl-C to stop it) the reason will be ``'shutdown'``.
    :type reason: str

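A handler can branch on the documented ``reason`` values in plain Python; a minimal sketch (the return messages are illustrative, not part of Scrapy):

```python
def spider_closed(spider, reason):
    # Branch on the documented reason strings ('finished', 'cancelled',
    # 'shutdown'); any other value came from a custom close_spider call.
    if reason == "finished":
        return "spider completed scraping"
    if reason == "shutdown":
        return "engine was shut down (e.g. Ctrl-C)"
    return "closed early: %s" % reason

print(spider_closed(None, "finished"))  # spider completed scraping
```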
spider_opened
-------------

.. signal:: spider_opened
.. function:: spider_opened(spider)

    Sent after a spider has been opened for crawling. This is typically used to
    reserve per-spider resources, but can be used for any task that needs to be
    performed when a spider is opened.

    This signal supports returning deferreds from its handlers.

    :param spider: the spider which has been opened
    :type spider: :class:`~scrapy.spiders.Spider` object

spider_idle
-----------

.. signal:: spider_idle
.. function:: spider_idle(spider)

    Sent when a spider has gone idle, which means the spider has no further:

    * requests waiting to be downloaded
    * requests scheduled
    * items being processed in the item pipeline

    If the idle state persists after all handlers of this signal have finished,
    the engine starts closing the spider. After the spider has finished
    closing, the :signal:`spider_closed` signal is sent.

    You may raise a :exc:`~scrapy.exceptions.DontCloseSpider` exception to
    prevent the spider from being closed.

    This signal does not support returning deferreds from its handlers.

    :param spider: the spider which has gone idle
    :type spider: :class:`~scrapy.spiders.Spider` object

    .. note:: Scheduling some requests in your :signal:`spider_idle` handler does
        **not** guarantee that it can prevent the spider from being closed,
        although it sometimes can. That's because the spider may still remain idle
        if all the scheduled requests are rejected by the scheduler (e.g. filtered
        due to duplication).

spider_error
------------

.. signal:: spider_error
.. function:: spider_error(failure, response, spider)

    Sent when a spider callback generates an error (i.e. raises an exception).

    This signal does not support returning deferreds from its handlers.

    :param failure: the exception raised
    :type failure: twisted.python.failure.Failure

    :param response: the response being processed when the exception was raised
    :type response: :class:`~scrapy.http.Response` object

    :param spider: the spider which raised the exception
    :type spider: :class:`~scrapy.spiders.Spider` object

request_scheduled
-----------------

.. signal:: request_scheduled
.. function:: request_scheduled(request, spider)

    Sent when the engine schedules a :class:`~scrapy.http.Request`, to be
    downloaded later.

    This signal does not support returning deferreds from its handlers.

    :param request: the request that reached the scheduler
    :type request: :class:`~scrapy.http.Request` object

    :param spider: the spider that yielded the request
    :type spider: :class:`~scrapy.spiders.Spider` object

request_dropped
---------------

.. signal:: request_dropped
.. function:: request_dropped(request, spider)

    Sent when a :class:`~scrapy.http.Request`, scheduled by the engine to be
    downloaded later, is rejected by the scheduler.

    This signal does not support returning deferreds from its handlers.

    :param request: the request that reached the scheduler
    :type request: :class:`~scrapy.http.Request` object

    :param spider: the spider that yielded the request
    :type spider: :class:`~scrapy.spiders.Spider` object

request_reached_downloader
--------------------------

.. signal:: request_reached_downloader
.. function:: request_reached_downloader(request, spider)

    Sent when a :class:`~scrapy.http.Request` reaches the downloader.

    This signal does not support returning deferreds from its handlers.

    :param request: the request that reached the downloader
    :type request: :class:`~scrapy.http.Request` object

    :param spider: the spider that yielded the request
    :type spider: :class:`~scrapy.spiders.Spider` object

response_received
-----------------

.. signal:: response_received
.. function:: response_received(response, request, spider)

    Sent when the engine receives a new :class:`~scrapy.http.Response` from the
    downloader.

    This signal does not support returning deferreds from its handlers.

    :param response: the response received
    :type response: :class:`~scrapy.http.Response` object

    :param request: the request that generated the response
    :type request: :class:`~scrapy.http.Request` object

    :param spider: the spider for which the response is intended
    :type spider: :class:`~scrapy.spiders.Spider` object

response_downloaded
-------------------

.. signal:: response_downloaded
.. function:: response_downloaded(response, request, spider)

    Sent by the downloader right after a ``HTTPResponse`` is downloaded.

    This signal does not support returning deferreds from its handlers.

    :param response: the response downloaded
    :type response: :class:`~scrapy.http.Response` object

    :param request: the request that generated the response
    :type request: :class:`~scrapy.http.Request` object

    :param spider: the spider for which the response is intended
    :type spider: :class:`~scrapy.spiders.Spider` object