=========  ==============================================================
SEP        8
Title      Item Parsers
Author     Pablo Hoffman
Created    2009-08-11
Status     Final (implemented with variations)
Obsoletes  :doc:`sep-001`, :doc:`sep-002`, :doc:`sep-003`, :doc:`sep-005`
=========  ==============================================================

======================
SEP-008 - Item Loaders
======================

Item Parser is the final API proposed for implementing the Item
Builders/Loaders introduced in :doc:`sep-001`.

.. note:: This API was ultimately implemented under the name "Item
   Loaders" instead of "Item Parsers", along with some minor fine-tuning
   of the API methods and semantics.

Dataflow
========

1. ``ItemParser.add_value()``

   1. **input_parser**
   2. store

2. ``ItemParser.add_xpath()`` *(only available in XPathItemParser)*

   1. selector.extract()
   2. **input_parser**
   3. store

3. ``ItemParser.populate_item()`` *(formerly get_item)*

   1. **output_parser**
   2. assign field

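The dataflow above can be illustrated with a minimal sketch. This is not the Scrapy implementation: ``SketchItemParser``, ``strip_all`` and ``take_first`` are hypothetical names, items are modeled as plain dicts, and a single input/output parser pair is assumed for all fields. ``add_xpath()`` is left out because it only adds a ``selector.extract()`` step before the same input path.

```python
# Hypothetical, simplified model of the SEP-008 dataflow; not Scrapy code.
from collections import defaultdict


class SketchItemParser:
    """Toy stand-in for ItemParser: dict items, one parser pair for all fields."""

    def __init__(self, input_parser, output_parser):
        self.input_parser = input_parser
        self.output_parser = output_parser
        self._values = defaultdict(list)  # collected values, keyed by field name

    def add_value(self, field, value):
        # Step 1: run the input parser, then store the result.
        self._values[field].extend(self.input_parser(value))

    def populate_item(self):
        # Step 3: run the output parser on the stored values, then assign the field.
        return {field: self.output_parser(values)
                for field, values in self._values.items()}


def strip_all(values):
    """Illustrative input parser: strip whitespace from every value."""
    return [value.strip() for value in values]


def take_first(values):
    """Illustrative output parser: keep only the first stored value."""
    return values[0] if values else None


parser = SketchItemParser(strip_all, take_first)
parser.add_value("name", [" Widget ", " ignored "])
parser.add_value("price", [" 9.99 "])
item = parser.populate_item()
```

In this sketch the input parser runs once per ``add_value()`` call, while the output parser runs once per field when the item is populated.
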
Modules and classes
===================

- ``scrapy.contrib.itemparser.ItemParser``
- ``scrapy.contrib.itemparser.XPathItemParser``
- ``scrapy.contrib.itemparser.parsers.MapConcat`` *(formerly TreeExpander)*
- ``scrapy.contrib.itemparser.parsers.TakeFirst``
- ``scrapy.contrib.itemparser.parsers.Join``
- ``scrapy.contrib.itemparser.parsers.Identity``

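This SEP names the four parsers but does not define their behavior. The sketch below shows one plausible reading, modeled on the processors that later shipped with Item Loaders. The exact semantics are assumptions for illustration, in particular the flattening done by ``MapConcat`` and the skipping of empty values in ``TakeFirst``.

```python
# Assumed semantics for the parsers listed above; illustrative only.


class Identity:
    """Return the values unchanged."""

    def __call__(self, values):
        return values


class TakeFirst:
    """Return the first value that is neither None nor an empty string."""

    def __call__(self, values):
        for value in values:
            if value is not None and value != "":
                return value
        return None


class Join:
    """Join the values into one string (the separator default is an assumption)."""

    def __init__(self, separator=" "):
        self.separator = separator

    def __call__(self, values):
        return self.separator.join(values)


class MapConcat:
    """Apply each function to every value in turn, concatenating the results."""

    def __init__(self, *functions):
        self.functions = functions

    def __call__(self, values):
        for function in self.functions:
            next_values = []
            for value in values:
                result = function(value)
                if result is None:
                    continue  # assumed: None results are dropped
                if isinstance(result, (list, tuple)):
                    next_values.extend(result)  # flatten one level
                else:
                    next_values.append(result)
            values = next_values
        return list(values)


words = MapConcat(str.strip, str.split)([" a b ", "c "])
joined = Join()(words)
first = TakeFirst()(["", None, "x"])
```

Here ``str.split`` returns a list for each value, so ``MapConcat`` concatenates those lists into a single flat list of words.
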
Public API
==========

- ``ItemParser.add_value()``
- ``ItemParser.replace_value()``
- ``ItemParser.populate_item()`` *(returns the populated item)*

- ``ItemParser.get_collected_values()`` *(note the 's' in values)*
- ``ItemParser.parse_field()``

- ``ItemParser.get_input_parser()``
- ``ItemParser.get_output_parser()``

- ``ItemParser.context``

- ``ItemParser.default_item_class``
- ``ItemParser.default_input_parser``
- ``ItemParser.default_output_parser``
- ``ItemParser.*field*_in``
- ``ItemParser.*field*_out``

Alternative Public API Proposal
===============================

- ``ItemLoader.add_value()``
- ``ItemLoader.replace_value()``
- ``ItemLoader.load_item()`` *(returns loaded item)*

- ``ItemLoader.get_stored_values()`` or ``ItemLoader.get_values()`` *(returns the ItemLoader values)*
- ``ItemLoader.get_output_value()``

- ``ItemLoader.get_input_processor()`` or ``ItemLoader.get_in_processor()`` *(short version)*
- ``ItemLoader.get_output_processor()`` or ``ItemLoader.get_out_processor()`` *(short version)*

- ``ItemLoader.context``

- ``ItemLoader.default_item_class``
- ``ItemLoader.default_input_processor`` or ``ItemLoader.default_in_processor`` *(short version)*
- ``ItemLoader.default_output_processor`` or ``ItemLoader.default_out_processor`` *(short version)*
- ``ItemLoader.*field*_in``
- ``ItemLoader.*field*_out``

Usage example: declaring Item Parsers
=====================================

::

    #!python
    from scrapy.contrib.itemparser import XPathItemParser, parsers

    # removetags and filterx are value-cleaning callables defined elsewhere
    # (not shown here).
    class ProductParser(XPathItemParser):
        # <field>_in: input parser, applied when values are added
        name_in = parsers.MapConcat(removetags, filterx)
        price_in = parsers.MapConcat(...)

        # <field>_out: output parser, applied when the item is populated
        price_out = parsers.TakeFirst()

Usage example: declaring parsers in Fields
==========================================

::

    #!python
    class Product(Item):
        name = Field(output_parser=parsers.Join(), ...)
        price = Field(output_parser=parsers.TakeFirst(), ...)

        description = Field(input_parser=parsers.MapConcat(removetags))

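
The two usage examples declare parsers in different places, and this SEP does not say which one wins when both are present. The following sketch shows one possible resolution order for ``get_input_parser()``: the ``<field>_in`` attribute first, then the field metadata, then ``default_input_parser``. That order is an assumption for illustration. ``SketchParser``, ``field_meta`` and the helper functions are hypothetical names, and a plain dict stands in for ``Field`` metadata.

```python
# Hypothetical resolution order for input parsers; the precedence is assumed.


def identity(values):
    return values


def upper_all(values):
    return [value.upper() for value in values]


def strip_all(values):
    return [value.strip() for value in values]


class SketchParser:
    # staticmethod keeps plain functions from being bound as methods.
    default_input_parser = staticmethod(identity)
    # Stand-in for Field(input_parser=...) metadata on the item class.
    field_meta = {}

    def get_input_parser(self, field):
        # 1. Parser-level attribute such as ``name_in``.
        parser = getattr(self, f"{field}_in", None)
        # 2. Field-level metadata.
        if parser is None:
            parser = self.field_meta.get(field, {}).get("input_parser")
        # 3. Parser-wide default.
        if parser is None:
            parser = self.default_input_parser
        return parser


class ProductSketchParser(SketchParser):
    field_meta = {
        "name": {"input_parser": strip_all},
        "description": {"input_parser": strip_all},
    }
    name_in = staticmethod(upper_all)


product_parser = ProductSketchParser()
name_parser = product_parser.get_input_parser("name")
description_parser = product_parser.get_input_parser("description")
price_parser = product_parser.get_input_parser("price")
```

Under these assumptions ``name`` resolves to the ``name_in`` attribute even though the field metadata also declares a parser, ``description`` falls back to its field metadata, and ``price`` uses the default.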