.. _topics-extensions:

==========
Extensions
==========

The extensions framework provides a mechanism for inserting your own
custom functionality into Scrapy.

Extensions are just regular classes that are instantiated at Scrapy startup,
when extensions are initialized.

Extension settings
==================

Extensions use the :ref:`Scrapy settings <topics-settings>` to manage their
settings, just like any other Scrapy code.

It is customary for extensions to prefix their settings with their own name, to
avoid collisions with existing (and future) extensions. For example, a
hypothetical extension to handle `Google Sitemaps`_ would use settings like
``GOOGLESITEMAP_ENABLED``, ``GOOGLESITEMAP_DEPTH``, and so on.

.. _Google Sitemaps: http://en.wikipedia.org/wiki/Sitemaps
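
As a sketch of this convention, such a hypothetical extension might read its
prefixed settings at startup using the global settings singleton (the class
and setting names here are illustrative, not a real Scrapy extension)::

    from scrapy.conf import settings

    class GoogleSitemap(object):

        def __init__(self):
            # read this extension's own (prefixed) settings
            self.enabled = settings.getbool('GOOGLESITEMAP_ENABLED')
            self.depth = settings.getint('GOOGLESITEMAP_DEPTH', 0)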

Loading & activating extensions
===============================

Extensions are loaded and activated at startup by instantiating a single
instance of the extension class. Therefore, all the extension initialization
code must be performed in the class constructor (``__init__`` method).

To make an extension available, add it to the :setting:`EXTENSIONS` setting in
your Scrapy settings. In :setting:`EXTENSIONS`, each extension is represented
by a string: the full Python path to the extension's class name. For example::

    EXTENSIONS = {
        'scrapy.contrib.corestats.CoreStats': 500,
        'scrapy.webservice.WebService': 500,
        'scrapy.telnet.TelnetConsole': 500,
    }

As you can see, the :setting:`EXTENSIONS` setting is a dict where the keys are
the extension paths, and their values are the orders, which define the
extension *loading* order. Extension orders are not as important as middleware
orders, though, and they are typically irrelevant, i.e. it doesn't matter in
which order the extensions are loaded because they don't depend on each other
[1].

However, this feature can be exploited if you need to add an extension which
depends on other extensions already loaded.
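
For example, to load a hypothetical extension after another one it depends on,
you could assign it a higher order (both module paths below are illustrative)::

    EXTENSIONS = {
        'myproject.extensions.BaseExtension': 500,
        # loaded after BaseExtension, which it depends on
        'myproject.extensions.DependentExtension': 600,
    }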

[1] This is why the :setting:`EXTENSIONS_BASE` setting in Scrapy (which
contains all built-in extensions enabled by default) defines all the extensions
with the same order (``500``).

Available, enabled and disabled extensions
==========================================

Not all available extensions will be enabled. Some of them usually depend on a
particular setting. For example, the HTTP Cache extension is available by
default but disabled unless the :setting:`HTTPCACHE_ENABLED` setting is set.
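
For example, to enable it you would add this to your project settings::

    HTTPCACHE_ENABLED = True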

Accessing enabled extensions
============================

Even though it's not usually needed, you can access extension objects through
the :ref:`topics-extensions-ref-manager` which is populated when extensions are
loaded. For example, to access the ``WebService`` extension::

    from scrapy.project import extensions
    webservice_extension = extensions.enabled['WebService']

.. seealso::

    :ref:`topics-extensions-ref-manager`, for the complete Extension Manager
    reference.

Writing your own extension
==========================

Writing your own extension is easy. Each extension is a single Python class
which doesn't need to implement any particular method.

All extension initialization code must be performed in the class constructor
(``__init__`` method). If that method raises the
:exc:`~scrapy.exceptions.NotConfigured` exception, the extension will be
disabled. Otherwise, the extension will be enabled.
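
A minimal sketch of this pattern, assuming a hypothetical ``MYEXT_ENABLED``
setting (both the class and the setting name are illustrative)::

    from scrapy.conf import settings
    from scrapy.exceptions import NotConfigured

    class MyExtension(object):

        def __init__(self):
            # stay disabled unless explicitly turned on in the settings
            if not settings.getbool('MYEXT_ENABLED'):
                raise NotConfigured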

Let's take a look at the following example extension which just logs a message
every time a spider is opened and closed::

    from scrapy.xlib.pydispatch import dispatcher
    from scrapy import log, signals

    class SpiderOpenCloseLogging(object):

        def __init__(self):
            # connect handlers to the spider_opened/spider_closed signals
            dispatcher.connect(self.spider_opened, signal=signals.spider_opened)
            dispatcher.connect(self.spider_closed, signal=signals.spider_closed)

        def spider_opened(self, spider):
            log.msg("opened spider %s" % spider.name)

        def spider_closed(self, spider):
            log.msg("closed spider %s" % spider.name)

.. _topics-extensions-ref-manager:

Extension Manager
=================

.. module:: scrapy.extension
   :synopsis: The extension manager

The Extension Manager is responsible for loading and keeping track of installed
extensions. It is configured through the :setting:`EXTENSIONS` setting, which
contains a dictionary of all available extensions and their orders, similar to
how you :ref:`configure the downloader middlewares
<topics-downloader-middleware-setting>`.

.. class:: ExtensionManager

    The Extension Manager is a singleton object, which is instantiated at module
    loading time and can be accessed like this::

        from scrapy.project import extensions

    .. attribute:: loaded

        A boolean which is True if extensions are already loaded or False if
        they're not.

    .. attribute:: enabled

        A dict with the enabled extensions. The keys are the extension class
        names, and the values are the extension objects. Example::

            >>> from scrapy.project import extensions
            >>> extensions.load()
            >>> print extensions.enabled
            {'CoreStats': <scrapy.contrib.corestats.CoreStats object at 0x9e272ac>,
             'WebService': <scrapy.webservice.WebService object at 0xa05670c>,
            ...

    .. attribute:: disabled

        A dict with the disabled extensions. The keys are the extension class
        names, and the values are the extension class paths (because objects
        are never instantiated for disabled extensions). Example::

            >>> from scrapy.project import extensions
            >>> extensions.load()
            >>> print extensions.disabled
            {'MemoryDebugger': 'scrapy.contrib.memdebug.MemoryDebugger',
             'MyExtension': 'myproject.extensions.MyExtension',
            ...

    .. method:: load()

        Load the available extensions configured in the :setting:`EXTENSIONS`
        setting. On a standard run, this method is usually called by the
        Execution Manager, but you may need to call it explicitly if you're
        dealing with code outside Scrapy.
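
        For example, a minimal sketch for code running outside a standard
        Scrapy run (this assumes the project settings are already set up, e.g.
        via the ``SCRAPY_SETTINGS_MODULE`` environment variable)::

            from scrapy.project import extensions

            if not extensions.loaded:
                extensions.load()
            print extensions.enabled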

    .. method:: reload()

        Reload the available extensions. See :meth:`load`.

.. _topics-extensions-ref:

Built-in extensions reference
=============================

General purpose extensions
--------------------------

Log Stats extension
~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.contrib.logstats
   :synopsis: Basic stats logging

.. class:: LogStats

    Log basic stats like crawled pages and scraped items.

Core Stats extension
~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.contrib.corestats
   :synopsis: Core stats collection

.. class:: CoreStats

    Enable the collection of core statistics, provided the stats collection is
    enabled (see :ref:`topics-stats`).

.. _topics-extensions-ref-webservice:

Web service extension
~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.webservice
   :synopsis: Web service

.. class:: scrapy.webservice.WebService

    See :ref:`topics-webservice`.

.. _topics-extensions-ref-telnetconsole:

Telnet console extension
~~~~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.telnet
   :synopsis: Telnet console

.. class:: scrapy.telnet.TelnetConsole

    Provides a telnet console for getting into a Python interpreter inside the
    currently running Scrapy process, which can be very useful for debugging.

    The telnet console must be enabled by the :setting:`TELNETCONSOLE_ENABLED`
    setting, and the server will listen on the port specified in
    :setting:`TELNETCONSOLE_PORT`.
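
    For example, a minimal sketch of the relevant settings (the port range
    below is illustrative)::

        # settings.py
        TELNETCONSOLE_ENABLED = True
        TELNETCONSOLE_PORT = [6023, 6073]   # pick a free port in this range

        # then, from a shell: telnet localhost 6023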

.. _topics-extensions-ref-memusage:

Memory usage extension
~~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.contrib.memusage
   :synopsis: Memory usage extension

.. class:: scrapy.contrib.memusage.MemoryUsage

    .. note:: This extension does not work on Windows.

    Monitors the memory used by a Scrapy process and:

    1. sends a notification e-mail when it exceeds a certain value
    2. terminates the Scrapy process when it exceeds a certain value

    The notification e-mails can be triggered when a certain warning value is
    reached (:setting:`MEMUSAGE_WARNING_MB`) and when the maximum value is
    reached (:setting:`MEMUSAGE_LIMIT_MB`), which will also cause the Scrapy
    process to be terminated.

    This extension is enabled by the :setting:`MEMUSAGE_ENABLED` setting and
    can be configured with the following settings (a sketch of example values
    follows this list):

    * :setting:`MEMUSAGE_LIMIT_MB`
    * :setting:`MEMUSAGE_WARNING_MB`
    * :setting:`MEMUSAGE_NOTIFY_MAIL`
    * :setting:`MEMUSAGE_REPORT`
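
    For example, a minimal sketch of a configuration (the values and address
    are illustrative)::

        # settings.py
        MEMUSAGE_ENABLED = True
        MEMUSAGE_LIMIT_MB = 2048        # terminate the process above 2 GB
        MEMUSAGE_WARNING_MB = 1536      # e-mail a warning above 1.5 GB
        MEMUSAGE_NOTIFY_MAIL = ['ops@example.com']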

Memory debugger extension
~~~~~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.contrib.memdebug
   :synopsis: Memory debugger extension

.. class:: scrapy.contrib.memdebug.MemoryDebugger

    An extension for debugging memory usage. It collects information about:

    * objects uncollected by the Python garbage collector
    * libxml2 memory leaks
    * objects left alive that shouldn't be. For more info, see
      :ref:`topics-leaks-trackrefs`

    To enable this extension, turn on the :setting:`MEMDEBUG_ENABLED` setting.
    The info will be stored in the stats.
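
    For example::

        # settings.py
        MEMDEBUG_ENABLED = True   # collected info is stored in the stats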

Close spider extension
~~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.contrib.closespider
   :synopsis: Close spider extension

.. class:: scrapy.contrib.closespider.CloseSpider

    Closes a spider automatically when some conditions are met, using a
    specific closing reason for each condition.

    The conditions for closing a spider can be configured through the
    following settings:

    * :setting:`CLOSESPIDER_TIMEOUT`
    * :setting:`CLOSESPIDER_ITEMCOUNT`
    * :setting:`CLOSESPIDER_PAGECOUNT`
    * :setting:`CLOSESPIDER_ERRORCOUNT`
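
    For example, a minimal sketch that stops a runaway crawl (the values are
    illustrative)::

        # settings.py -- close the spider after one hour, or after
        # 5000 scraped items, whichever condition is met first
        CLOSESPIDER_TIMEOUT = 3600
        CLOSESPIDER_ITEMCOUNT = 5000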

.. setting:: CLOSESPIDER_TIMEOUT

CLOSESPIDER_TIMEOUT
"""""""""""""""""""

Default: ``0``

An integer which specifies a number of seconds. If the spider remains open for
more than that number of seconds, it will be automatically closed with the
reason ``closespider_timeout``. If zero (or not set), spiders won't be closed
by timeout.

.. setting:: CLOSESPIDER_ITEMCOUNT

CLOSESPIDER_ITEMCOUNT
"""""""""""""""""""""

Default: ``0``

An integer which specifies a number of items. If the spider scrapes more than
that number of items and those items are passed by the item pipeline, the
spider will be closed with the reason ``closespider_itemcount``. If zero (or
not set), spiders won't be closed by number of passed items.

.. setting:: CLOSESPIDER_PAGECOUNT

CLOSESPIDER_PAGECOUNT
"""""""""""""""""""""

.. versionadded:: 0.11

Default: ``0``

An integer which specifies the maximum number of responses to crawl. If the
spider crawls more than that, the spider will be closed with the reason
``closespider_pagecount``. If zero (or not set), spiders won't be closed by
number of crawled responses.

.. setting:: CLOSESPIDER_ERRORCOUNT

CLOSESPIDER_ERRORCOUNT
""""""""""""""""""""""

.. versionadded:: 0.11

Default: ``0``

An integer which specifies the maximum number of errors to receive before
closing the spider. If the spider generates more than that number of errors,
it will be closed with the reason ``closespider_errorcount``. If zero (or not
set), spiders won't be closed by number of errors.

StatsMailer extension
~~~~~~~~~~~~~~~~~~~~~

.. module:: scrapy.contrib.statsmailer
   :synopsis: StatsMailer extension

.. class:: scrapy.contrib.statsmailer.StatsMailer

    This simple extension can be used to send a notification e-mail every time
    a spider has finished scraping, including the Scrapy stats collected. The
    e-mail will be sent to all recipients specified in the
    :setting:`STATSMAILER_RCPTS` setting.
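
    For example (the address is illustrative)::

        # settings.py
        STATSMAILER_RCPTS = ['scrapy@example.com']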

.. module:: scrapy.contrib.debug
   :synopsis: Extensions for debugging Scrapy

Debugging extensions
--------------------

Stack trace dump extension
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. class:: scrapy.contrib.debug.StackTraceDump

    Dumps information about the running process when a `SIGQUIT`_ or
    `SIGUSR2`_ signal is received. The information dumped is the following:

    1. engine status (using ``scrapy.utils.engine.get_engine_status()``)
    2. live references (see :ref:`topics-leaks-trackrefs`)
    3. stack trace of all threads

    After the stack trace and engine status are dumped, the Scrapy process
    continues running normally.

    This extension only works on POSIX-compliant platforms (i.e. not Windows),
    because the `SIGQUIT`_ and `SIGUSR2`_ signals are not available on Windows.

    There are at least two ways to send Scrapy the `SIGQUIT`_ signal:

    1. By pressing Ctrl-\\ while a Scrapy process is running (Linux only?)
    2. By running this command (assuming ``<pid>`` is the process id of the
       Scrapy process)::

           kill -QUIT <pid>

.. _SIGUSR2: http://en.wikipedia.org/wiki/SIGUSR1_and_SIGUSR2
.. _SIGQUIT: http://en.wikipedia.org/wiki/SIGQUIT

Debugger extension
~~~~~~~~~~~~~~~~~~

.. class:: scrapy.contrib.debug.Debugger

    Invokes a `Python debugger`_ inside a running Scrapy process when a
    `SIGUSR2`_ signal is received. After the debugger is exited, the Scrapy
    process continues running normally.
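
    For example, to invoke the debugger (assuming ``<pid>`` is the process id
    of the Scrapy process)::

        kill -USR2 <pid>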

    For more info see `Debugging in Python`_.

    This extension only works on POSIX-compliant platforms (i.e. not Windows).

.. _Python debugger: http://docs.python.org/library/pdb.html
.. _Debugging in Python: http://www.ferg.org/papers/debugging_in_python.html