1
0
mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 15:24:15 +00:00
scrapy/docs/topics/items.rst

195 lines
6.5 KiB
ReStructuredText

.. _topics-items:
=====
Items
=====
Quick overview
==============
In Scrapy, items are the placeholder to use for the scraped data. They are
represented by a :class:`ScrapedItem` object, or any descendant class instance,
and store the information in instance attributes.
ScrapedItems
============
.. module:: scrapy.item
:synopsis: Objects for storing scraped data
.. class:: ScrapedItem
Methods
-------
.. method:: ScrapedItem.__init__(data=None)
:param data: A dictionary containing attributes and values to be set
after instancing the item.
Instanciates a ``ScrapedItem`` object and sets an attribute and its value
for each key in the given ``data`` dict (if any). These items are the most
basic items available, and the common interface from which any items should
inherit.
Examples
--------
Creating an item and setting some attributes::
>>> from scrapy.item import ScrapedItem
>>> item = ScrapedItem()
>>> item.name = 'John'
>>> item.last_name = 'Smith'
>>> item.age = 23
>>> item
ScrapedItem({'age': 23, 'last_name': 'Smith', 'name': 'John'})
Creating an item and setting its attributes inline::
>>> person = ScrapedItem({'name': 'John', 'age': 23, 'last_name': 'Smith'})
>>> person
ScrapedItem({'age': 23, 'last_name': 'Smith', 'name': 'John'})
RobustScrapedItems
==================
.. warning::
RobustScapedItems are deprecated and will be replaced by the :ref:`New item
API <topics-newitem>` (still in development).
.. module:: scrapy.contrib.item
:synopsis: Objects for storing scraped data
.. class:: RobustScrapedItem
RobustScrapedItems are more complex items (compared to
:class:`ScrapedItem`) and have a few more features available, which
include:
* Attributes dictionary: items that inherit from RobustScrapedItem are
defined with a dictionary of attributes in the class. This allows the
item to have more logic at the moment of handling and setting attributes
than the :class:`ScrapedItem`.
* Adaptors: perhaps the most important of the features these items provide.
The adaptors are a system designed for filtering/modifying data before
setting it to the item, that makes cleansing tasks a lot easier.
* Type checking: RobustScrapedItems come with a built-in type checking
which assures you that no data of the wrong type will get into the items
without raising a warning.
* Versioning: These items also provide versioning by making a unique hash
for each item based on its attributes values.
* ItemDeltas: You can subtract two RobustScrapedItems, which allows you to
know the difference between a pair of items. This difference is
represented by a RobustItemDelta object.
Attributes
----------
.. attribute:: RobustScrapedItem.ATTRIBUTES
This attribute **must** be specified when writing your items, and it's a
dictionary in which the keys are the names of the attributes your item will
have, and their values are the type of those attributes. For multivalued
attributes, you should write the type of the values inside a list, e.g:
``'numbers': [int]``
Methods
-------
.. method:: RobustScrapedItem.__init__(data=None, adaptor_args=None)
:param data: Idem as for ScrapedItems
:param adaptor_args: A dictionary of the kind
``'attribute': [list_of_adaptors]``" for defining adaptors automatically
after instancing the item.
Constructor of RobustScrapedItem objects.
.. method:: RobustScrapedItem.attribute(attrname, value, override=False, add=False, ***kwargs)
Sets the item's ``attrname`` attribute with the given ``value`` filtering
it through the given attribute's adaptor pipeline (if any).
:param attrname: a string containing the name of the attribute you want
to set.
:param value: the value you want to assign, which will be adapted by
the corresponding adaptors for the given attribute (if any).
:param override: if True, makes this method avoid checking if there
was a previous value and sets ``value`` no matter what.
:param add: if True, tries to concatenate the given ``value`` with the one
already set in the item. For multivalued attributes, this will extend
the list of already-set values, with the new ones.
For single valued attributes, the method _add_single_attributes (which
is explained below) will be called.
:param kwargs: any extra parameters will be passed in a dictionary to any
adaptor that receives a parameter called ``adaptor_args``.
Check the :ref:`topics-adaptors` topic for more information.
.. method:: RobustScrapedItem.set_adaptors(adaptors_dict)
Receives a dict containing a list of adaptors for each desired attribute
(key) and sets each of them as their adaptor pipeline.
.. method:: RobustScrapedItem.set_attrib_adaptors(attrib, pipe)
Sets the provided iterable (``pipe``) as the adaptor pipeline for the
given attribute (``attrib``)
.. method:: RobustScrapedItem.add_adaptor(attrib, adaptor, position=None)
Adds an adaptor to an already existing (or not) pipeline.
:param attr: the name of the attribute you're adding adaptors to.
:param adaptor: a callable to be added to the pipeline.
:param position: an integer representing the place where to add the adaptor.
If it's ``None``, the adaptor will be appended at the end of the pipeline.
.. method:: RobustScrapedItem._add_single_attributes(attrname, attrtype, attributes)
This method is the one to be called whenever a single attribute has to be
joined before storing into an item. That is,
every time you have multiple results at the end of your adaptors pipeline,
and you called the ``attribute`` method with the parameter `add=True`.
This method is intended to be overriden by you, since by default it
raises an exception.
:param attrname: the name of the attribute you're setting
:param attrtype: the type of the attribute you're setting
:param attributes: the list of resulting values after the adaptors pipeline
(the one you have to join somehow)
Examples
--------
Creating a pretty basic item with a few attributes::
from scrapy.contrib.item import RobustScrapedItem
class MyItem(RobustScrapedItem):
ATTRIBUTES = {
'name': basestring,
'size': basestring,
'colours': [basestring],
}
Setting some adaptors::
.. note::
More RobustScrapedItem examples are about to come. In the meantime, check the :ref:`topics-adaptors` topic to see a few of them.