.. _signals:

.. module:: scrapy.core.signals
   :synopsis: Signals definitions

Available Signals
=================

Scrapy uses signals extensively to notify when certain actions occur. You can
catch some of those signals in your Scrapy project or extension to perform
additional tasks, or to extend Scrapy with functionality not provided out of
the box.

Even though signals provide several arguments, the handlers that catch them
don't have to receive all of them.

For more information about working with signals, see the documentation of
`pydispatcher`_ (the library used to implement signals).

.. _pydispatcher: http://pydispatcher.sourceforge.net/
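
For example, a handler connected to the :signal:`domain_closed` signal can
declare only the arguments it is interested in, and pydispatcher will pass
just those. A minimal sketch, assuming the dispatcher is imported straight
from the `pydispatcher`_ package (Scrapy may bundle it under a different
path) and that the handler name is illustrative::

    from pydispatch import dispatcher

    from scrapy.core import signals

    def handle_domain_closed(domain, status):
        # The signal also provides ``spider``, but pydispatcher only passes
        # the arguments this handler declares.
        print("Domain %s closed (%s)" % (domain, status))

    # Call the handler every time the domain_closed signal is sent.
    dispatcher.connect(handle_domain_closed, signal=signals.domain_closed)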

Here's a list of the signals used in Scrapy and their meaning.

.. signal:: domain_closed
.. function:: domain_closed(domain, spider, status)

    Sent right after a spider/domain has been closed.

    ``domain`` is a string which contains the domain of the spider which has
    been closed

    ``spider`` is the spider which has been closed

    ``status`` is a string which can have two values: ``'finished'`` if the
    domain has finished successfully, or ``'cancelled'`` if the domain was
    cancelled (for example, by hitting Ctrl-C, by calling the engine
    ``stop()`` method or by explicitly closing the domain).

.. signal:: domain_open
.. function:: domain_open(domain, spider)

    Sent right before a spider is opened for crawling.

    ``domain`` is a string which contains the domain of the spider which is
    about to be opened

    ``spider`` is the spider which is about to be opened

.. signal:: domain_opened
.. function:: domain_opened(domain, spider)

    Sent right after a spider has been opened for crawling.

    ``domain`` is a string with the domain of the spider which has been opened

    ``spider`` is the spider which has been opened
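
    A common use of this signal, together with :signal:`domain_closed`, is to
    set up per-domain resources when crawling starts and release them when it
    ends. A minimal sketch; the file naming and handler names are
    illustrative, and the dispatcher import path is an assumption::

        from pydispatch import dispatcher

        from scrapy.core import signals

        files = {}

        def handle_domain_opened(domain):
            # Open one output file per domain when it starts crawling.
            files[domain] = open('%s.log' % domain, 'w')

        def handle_domain_closed(domain):
            # Release the per-domain resource once the domain is closed.
            files.pop(domain).close()

        dispatcher.connect(handle_domain_opened, signal=signals.domain_opened)
        dispatcher.connect(handle_domain_closed, signal=signals.domain_closed)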

.. signal:: domain_idle
.. function:: domain_idle(domain, spider)

    Sent when a domain has no further:

    * requests waiting to be downloaded
    * requests scheduled
    * items being processed in the item pipeline

    ``domain`` is a string with the domain of the spider which has gone idle

    ``spider`` is the spider which has gone idle

    If any handler of this signal raises a :exception:`DontCloseDomain`
    exception, the domain won't be closed at this time and will wait until
    another idle signal is sent. Otherwise (if no handler raises
    :exception:`DontCloseDomain`) the domain will be closed immediately after
    all handlers of ``domain_idle`` have finished, and a
    :signal:`domain_closed` signal will thus be sent.
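
    For example, an extension that still has work queued for a domain can
    keep it open by raising the exception from its handler. A minimal sketch,
    assuming :exception:`DontCloseDomain` is importable from Scrapy's core
    exceptions module (the exact import path is an assumption)::

        from pydispatch import dispatcher

        from scrapy.core import signals
        from scrapy.core.exceptions import DontCloseDomain  # assumed path

        # Illustrative bookkeeping of work still queued per domain.
        pending_work = {'example.com': 3}

        def handle_domain_idle(domain):
            # Keep the domain open while this extension still has work
            # queued for it.
            if pending_work.get(domain):
                raise DontCloseDomain

        dispatcher.connect(handle_domain_idle, signal=signals.domain_idle)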

.. signal:: engine_started
.. function:: engine_started()

    Sent when the Scrapy engine is started (for example, when a crawling
    process has started).

.. signal:: engine_stopped
.. function:: engine_stopped()

    Sent when the Scrapy engine is stopped (for example, when a crawling
    process has finished).
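
    Together with :signal:`engine_started`, this signal can be used to
    measure how long a crawling process took. A minimal sketch; the handler
    names are illustrative and the dispatcher import path is an assumption::

        import time

        from pydispatch import dispatcher

        from scrapy.core import signals

        start_time = []

        def handle_engine_started():
            # Remember when the engine started.
            start_time.append(time.time())

        def handle_engine_stopped():
            # Report the total crawling time once the engine stops.
            elapsed = time.time() - start_time[0]
            print("Crawling process took %.1f seconds" % elapsed)

        dispatcher.connect(handle_engine_started, signal=signals.engine_started)
        dispatcher.connect(handle_engine_stopped, signal=signals.engine_stopped)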

.. signal:: request_received
.. function:: request_received(request, spider, response)

    Sent when the engine receives a :class:`~scrapy.http.Request` from a
    spider.

    ``request`` is the :class:`~scrapy.http.Request` received

    ``spider`` is the spider which generated the request

    ``response`` is the :class:`~scrapy.http.Response` fed to the spider
    which generated the request

.. signal:: request_uploaded
.. function:: request_uploaded(request, spider)

    Sent right after the downloader has sent a :class:`~scrapy.http.Request`.

    ``request`` is the :class:`~scrapy.http.Request` uploaded/sent

    ``spider`` is the spider which generated the request

.. signal:: response_received
.. function:: response_received(response, spider)

    Sent when the engine receives a new :class:`~scrapy.http.Response` from
    the downloader.

    ``response`` is the :class:`~scrapy.http.Response` received

    ``spider`` is the spider for which the response is intended
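
    For example, a handler can log every response handed to the engine. A
    minimal sketch (as above, the dispatcher import path is an assumption)::

        from pydispatch import dispatcher

        from scrapy.core import signals

        def handle_response_received(response):
            # Log the URL of every response the engine receives from the
            # downloader; the other signal arguments are simply not declared.
            print("Response received: %s" % response.url)

        dispatcher.connect(handle_response_received,
                           signal=signals.response_received)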

.. signal:: response_downloaded
.. function:: response_downloaded(response, spider)

    Sent by the downloader right after an ``HTTPResponse`` is downloaded.

    ``response`` is the ``HTTPResponse`` downloaded

    ``spider`` is the spider for which the response is intended

.. signal:: item_scraped
.. function:: item_scraped(item, spider, response)

    Sent when the engine receives a new scraped item from the spider, and
    right before the item is sent to the :ref:`topics-item-pipeline`.

    ``item`` is the item scraped

    ``spider`` is the spider which scraped the item

    ``response`` is the :class:`~scrapy.http.Response` from which the item
    was scraped
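
    For example, a handler can keep a running count of scraped items before
    they enter the pipeline. A minimal sketch; the counter and handler names
    are illustrative, and the dispatcher import path is an assumption::

        from pydispatch import dispatcher

        from scrapy.core import signals

        scraped_count = [0]

        def handle_item_scraped(item):
            # Count every item before it is sent to the item pipeline.
            scraped_count[0] += 1

        dispatcher.connect(handle_item_scraped, signal=signals.item_scraped)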

.. signal:: item_passed
.. function:: item_passed(item, spider, response, pipe_output)

    Sent after an item has passed all the :ref:`topics-item-pipeline` stages
    without being dropped.

    ``item`` is the item which passed the pipeline

    ``spider`` is the spider which scraped the item

    ``response`` is the :class:`~scrapy.http.Response` from which the item
    was scraped

    ``pipe_output`` is the output of the item pipeline. Typically, this
    points to the same ``item`` object, unless some pipeline stage created a
    new item.

.. signal:: item_dropped
.. function:: item_dropped(item, spider, response, exception)

    Sent after an item has been dropped from the :ref:`topics-item-pipeline`
    because some stage raised a :exception:`DropItem` exception.

    ``item`` is the item dropped from the :ref:`topics-item-pipeline`

    ``spider`` is the spider which scraped the item

    ``response`` is the :class:`~scrapy.http.Response` from which the item
    was scraped

    ``exception`` is the (:exception:`DropItem` child) exception that caused
    the item to be dropped
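
    For example, a handler can record which items were dropped and why. A
    minimal sketch; the handler name is illustrative and the dispatcher
    import path is an assumption::

        from pydispatch import dispatcher

        from scrapy.core import signals

        def handle_item_dropped(item, response, exception):
            # Record which item was dropped, where it came from and why.
            print("Dropped item from %s: %s" % (response.url, exception))

        dispatcher.connect(handle_item_dropped, signal=signals.item_dropped)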