1
0
mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 19:24:03 +00:00
scrapy/docs/topics/webservice.rst

235 lines
6.8 KiB
ReStructuredText
Raw Normal View History

.. _topics-webservice:
===========
Web Service
===========
Scrapy comes with a built-in web service for monitoring and controlling a
running crawler. The service exposes most resources using the `JSON-RPC 2.0`_
protocol, but there are also other (read-only) resources which just output JSON
data.
Provides an extensible web service for managing a Scrapy process. It's enabled
by the :setting:`WEBSERVICE_ENABLED` setting. The web server will listen in the
port specified in :setting:`WEBSERVICE_PORT`, and will log to the file
specified in :setting:`WEBSERVICE_LOGFILE`.
The web service is a :ref:`built-in Scrapy extension <topics-extensions-ref>`
which comes enabled by default, but you can also disable it if you're running
tight on memory.
.. _topics-webservice-resources:
Web service resources
=====================
The web service contains several resources, defined in the
:setting:`WEBSERVICE_RESOURCES` setting. Each resource provides a different
functionality. See :ref:`topics-webservice-resources-ref` for a list of
resources available by default.
2013-01-22 14:52:18 -08:00
Although you can implement your own resources using any protocol, there are
two kinds of resources bundled with Scrapy:
* Simple JSON resources - which are read-only and just output JSON data
* JSON-RPC resources - which provide direct access to certain Scrapy objects
using the `JSON-RPC 2.0`_ protocol
.. module:: scrapy.contrib.webservice
:synopsis: Built-in web service resources
.. _topics-webservice-resources-ref:
Available JSON-RPC resources
----------------------------
These are the JSON-RPC resources available by default in Scrapy:
.. _topics-webservice-crawler:
Crawler JSON-RPC resource
~~~~~~~~~~~~~~~~~~~~~~~~~
.. module:: scrapy.contrib.webservice.crawler
:synopsis: Crawler JSON-RPC resource
.. class:: CrawlerResource
Provides access to the main Crawler object that controls the Scrapy
process.
Available by default at: http://localhost:6080/crawler
Stats Collector JSON-RPC resource
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. module:: scrapy.contrib.webservice.stats
:synopsis: Stats JSON-RPC resource
.. class:: StatsResource
Provides access to the Stats Collector used by the crawler.
Available by default at: http://localhost:6080/stats
Spider Manager JSON-RPC resource
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can access the spider manager JSON-RPC resource through the
:ref:`topics-webservice-crawler` at: http://localhost:6080/crawler/spiders
Extension Manager JSON-RPC resource
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can access the extension manager JSON-RPC resource through the
:ref:`topics-webservice-crawler` at: http://localhost:6080/crawler/spiders
Available JSON resources
------------------------
These are the JSON resources available by default:
Engine status JSON resource
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. module:: scrapy.contrib.webservice.enginestatus
:synopsis: Engine Status JSON resource
.. class:: EngineStatusResource
Provides access to engine status metrics.
Available by default at: http://localhost:6080/enginestatus
Web service settings
====================
These are the settings that control the web service behaviour:
.. setting:: WEBSERVICE_ENABLED
WEBSERVICE_ENABLED
------------------
Default: ``True``
A boolean which specifies if the web service will be enabled (provided its
extension is also enabled).
.. setting:: WEBSERVICE_LOGFILE
WEBSERVICE_LOGFILE
------------------
Default: ``None``
A file to use for logging HTTP requests made to the web service. If unset web
the log is sent to standard scrapy log.
.. setting:: WEBSERVICE_PORT
WEBSERVICE_PORT
---------------
Default: ``[6080, 7030]``
The port range to use for the web service. If set to ``None`` or ``0``, a
dynamically assigned port is used.
.. setting:: WEBSERVICE_HOST
WEBSERVICE_HOST
---------------
Default: ``'0.0.0.0'``
The interface the web service should listen on
WEBSERVICE_RESOURCES
--------------------
Default: ``{}``
The list of web service resources enabled for your project. See
:ref:`topics-webservice-resources`. These are added to the ones available by
default in Scrapy, defined in the :setting:`WEBSERVICE_RESOURCES_BASE` setting.
WEBSERVICE_RESOURCES_BASE
-------------------------
Default::
{
'scrapy.contrib.webservice.crawler.CrawlerResource': 1,
'scrapy.contrib.webservice.enginestatus.EngineStatusResource': 1,
'scrapy.contrib.webservice.stats.StatsResource': 1,
}
The list of web service resources available by default in Scrapy. You shouldn't
change this setting in your project, change :setting:`WEBSERVICE_RESOURCES`
instead. If you want to disable some resource set its value to ``None`` in
:setting:`WEBSERVICE_RESOURCES`.
Writing a web service resource
==============================
Web service resources are implemented using the Twisted Web API. See this
`Twisted Web guide`_ for more information on Twisted web and Twisted web
resources.
To write a web service resource you should subclass the :class:`JsonResource` or
:class:`JsonRpcResource` classes and implement the :class:`renderGET` method.
.. class:: scrapy.webservice.JsonResource
A subclass of `twisted.web.resource.Resource`_ that implements a JSON web
service resource. See
.. attribute:: ws_name
The name by which the Scrapy web service will known this resource, and
2013-01-22 14:52:18 -08:00
also the path where this resource will listen. For example, assuming
Scrapy web service is listening on http://localhost:6080/ and the
``ws_name`` is ``'resource1'`` the URL for that resource will be:
http://localhost:6080/resource1/
.. class:: scrapy.webservice.JsonRpcResource(crawler, target=None)
This is a subclass of :class:`JsonResource` for implementing JSON-RPC
resources. JSON-RPC resources wrap Python (Scrapy) objects around a
JSON-RPC API. The resource wrapped must be returned by the
:meth:`get_target` method, which returns the target passed in the
constructor by default
.. method:: get_target()
Return the object wrapped by this JSON-RPC resource. By default, it
returns the object passed on the constructor.
Examples of web service resources
=================================
StatsResource (JSON-RPC resource)
---------------------------------
.. literalinclude:: ../../scrapy/contrib/webservice/stats.py
EngineStatusResource (JSON resource)
-------------------------------------
.. literalinclude:: ../../scrapy/contrib/webservice/enginestatus.py
Example of web service client
=============================
scrapy-ws.py script
-------------------
.. literalinclude:: ../../extras/scrapy-ws.py
.. _Twisted Web guide: http://jcalderone.livejournal.com/50562.html
.. _JSON-RPC 2.0: http://www.jsonrpc.org/
.. _twisted.web.resource.Resource: http://twistedmatrix.com/documents/10.0.0/api/twisted.web.resource.Resource.html