Mirror of https://github.com/scrapy/scrapy.git (synced 2025-02-23 18:44:05 +00:00)
Moved Item Loader to its final location in scrapy.contrib.loader, and updated doc/tests
--HG-- rename : docs/experimental/itemparser.rst => docs/experimental/loaders.rst rename : scrapy/contrib/itemparser/__init__.py => scrapy/contrib/loader/__init__.py rename : scrapy/contrib/itemparser/common.py => scrapy/contrib/loader/common.py rename : scrapy/contrib/itemparser/parsers.py => scrapy/contrib/loader/processor.py rename : scrapy/tests/test_itemparser.py => scrapy/tests/test_contrib_loader.py
This commit is contained in: parent 7cbbc3ffb0, commit 1dc592882b
.. _topics-itemparser:

============
Item Parsers
============

.. module:: scrapy.contrib.itemparser
   :synopsis: Item Parser class

Item Parsers provide a convenient mechanism for populating scraped :ref:`Items
<topics-newitems>`. Even though Items can be populated using their own
dictionary-like API, Item Parsers provide a much more convenient API for
populating them from a scraping process, by automating some common tasks like
parsing the raw extracted data before assigning it.

In other words, :ref:`Items <topics-newitems>` provide the *container* of
scraped data, while Item Parsers provide the mechanism for *populating* that
container.

Item Parsers are designed to provide a flexible, efficient and easy mechanism
for extending and overriding different field parsing rules, either by spider
or by source format (HTML, XML, etc.), without becoming a nightmare to
maintain.

Using Item Parsers to populate items
====================================

To use an Item Parser, you must first instantiate it. You can either
instantiate it with an Item object or without one, in which case an Item is
automatically instantiated in the Item Parser constructor, using the Item
class specified in the :attr:`ItemParser.default_item_class` attribute.

Then, you start collecting values into the Item Parser, typically using
:ref:`XPath Selectors <topics-selectors>`. You can add more than one value to
the same item field; the Item Parser will know how to "join" those values
later using an appropriate parser function.

Here is a typical Item Parser usage in a :ref:`Spider <topics-spiders>`, using
the :ref:`Product item <topics-newitems-declaring>` declared in the :ref:`Items
chapter <topics-newitems>`::

    from scrapy.contrib.itemparser import XPathItemParser
    from scrapy.xpath import HtmlXPathSelector
    from myproject.items import Product

    def parse(self, response):
        p = XPathItemParser(item=Product(), response=response)
        p.add_xpath('name', '//div[@class="product_name"]')
        p.add_xpath('name', '//div[@class="product_title"]')
        p.add_xpath('price', '//p[@id="price"]')
        p.add_xpath('stock', '//p[@id="stock"]')
        p.add_value('last_updated', 'today') # you can also use literal values
        return p.populate_item()
By quickly looking at that code, we can see the ``name`` field is being
extracted from two different XPath locations in the page:

1. ``//div[@class="product_name"]``
2. ``//div[@class="product_title"]``

In other words, data is being collected by extracting it from two XPath
locations, using the :meth:`~XPathItemParser.add_xpath` method. This is the
data that will be assigned to the ``name`` field later.

Afterwards, similar calls are used for the ``price`` and ``stock`` fields, and
finally the ``last_updated`` field is populated directly with a literal value
(``today``) using a different method: :meth:`~ItemParser.add_value`.

Finally, when all data is collected, the :meth:`ItemParser.populate_item`
method is called, which actually populates and returns the item, populated
with the data previously extracted and collected through the
:meth:`~XPathItemParser.add_xpath` and :meth:`~ItemParser.add_value` calls.

.. _topics-itemparser-parsers:

Input and Output parsers
========================

An Item Parser contains one input parser and one output parser for each (item)
field. The input parser processes the extracted data as soon as it's received
(through the :meth:`~XPathItemParser.add_xpath` or
:meth:`~ItemParser.add_value` methods) and the result of the input parser is
collected and kept inside the ItemParser. After collecting all data, the
:meth:`ItemParser.populate_item` method is called to populate and get the
populated :class:`~scrapy.newitem.Item` object. That's when the output parser
is called with the data previously collected (and processed using the input
parser). The result of the output parser is the final value that gets assigned
to the item.

Let's see an example to illustrate how the input and output parsers are called
for a particular field (the same applies to any other field)::

    p = XPathItemParser(Product(), some_xpath_selector)
    p.add_xpath('name', xpath1) # (1)
    p.add_xpath('name', xpath2) # (2)
    return p.populate_item() # (3)

So what happens is:

1. Data from ``xpath1`` is extracted, and passed through the *input parser* of
   the ``name`` field. The result of the input parser is collected and kept in
   the Item Parser (but not yet assigned to the item).

2. Data from ``xpath2`` is extracted, and passed through the same *input
   parser* used in (1). The result of the input parser is appended to the data
   collected in (1) (if any).

3. The data collected in (1) and (2) is passed through the *output parser* of
   the ``name`` field. The result of the output parser is the value assigned
   to the ``name`` field in the item.
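The three steps above amount to a simple collect-then-flush pattern, which can be sketched in plain Python. This is only an illustration of the mechanism, not the actual ItemParser implementation, and the two parser functions are hypothetical:

```python
def input_parser(values):
    # hypothetical input parser: strip whitespace from each extracted string
    return [v.strip() for v in values]

def output_parser(values):
    # hypothetical output parser: join all collected values with a space
    return u' '.join(values)

collected = []                                   # data kept inside the parser
collected.extend(input_parser([' Plasma TV ']))  # (1) data from xpath1
collected.extend(input_parser([' 42 inches ']))  # (2) data from xpath2
name = output_parser(collected)                  # (3) final value for the field
print(name)  # Plasma TV 42 inches
```

Note how the input parser runs once per ``add_xpath`` call, while the output parser runs once per field, over everything collected.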
It's worth noticing that parsers are just callable objects, which are called
with the data to be parsed, and return a parsed value. So you can use any
function as an input or output parser, provided it accepts a single positional
(required) argument.

The other thing you need to keep in mind is that the values returned by input
parsers are collected internally (in lists) and then passed to output parsers
to populate the fields, so output parsers should expect iterables as input.

Last, but not least, Scrapy comes with some :ref:`commonly used parsers
<topics-itemparser-available-parsers>` built in for convenience.


Declaring Item Parsers
======================

Item Parsers are declared like Items, by using a class definition syntax. Here
is an example::

    from scrapy.contrib.itemparser import ItemParser
    from scrapy.contrib.itemparser.parsers import TakeFirst, ApplyConcat, Join

    class ProductParser(ItemParser):

        default_input_parser = TakeFirst()

        name_in = ApplyConcat(unicode.title)
        name_out = Join()

        price_in = ApplyConcat(unicode.strip)
        price_out = TakeFirst()

        # ...
As you can see, input parsers are declared using the ``_in`` suffix while
output parsers are declared using the ``_out`` suffix. You can also declare
default input/output parsers using the
:attr:`ItemParser.default_input_parser` and
:attr:`ItemParser.default_output_parser` attributes.

.. _topics-itemparser-parsers-declaring:

Declaring Input and Output Parsers
==================================

As seen in the previous section, input and output parsers can be declared in
the Item Parser definition, and it's very common to declare input parsers this
way. However, there is one more place where you can specify the input and
output parsers to use: in the :ref:`Item Field <topics-newitems-fields>`
metadata. Here is an example::

    from scrapy.newitem import Item, Field
    from scrapy.contrib.itemparser.parsers import ApplyConcat, Join, TakeFirst

    from scrapy.utils.markup import remove_entities
    from myproject.utils import filter_prices

    class Product(Item):
        name = Field(
            input_parser=ApplyConcat(remove_entities),
            output_parser=Join(),
        )
        price = Field(
            default=0,
            input_parser=ApplyConcat(remove_entities, filter_prices),
            output_parser=TakeFirst(),
        )

The precedence order, for both input and output parsers, is as follows:

1. Item Parser field-specific attributes: ``field_in`` and ``field_out``
   (highest precedence)
2. Field metadata (the ``input_parser`` and ``output_parser`` keys)
3. Item Parser defaults: :attr:`ItemParser.default_input_parser` and
   :attr:`ItemParser.default_output_parser` (lowest precedence)
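That precedence order can be sketched as a lookup in plain Python. The ``_DemoParser`` class and the lookup function below are illustrative only, not Scrapy's actual internals:

```python
def get_input_parser(parser, field_name):
    # 1. Item Parser field-specific attribute (highest precedence)
    specific = getattr(parser, field_name + '_in', None)
    if specific is not None:
        return specific
    # 2. "input_parser" key in the Field metadata
    field_meta = parser.item.fields.get(field_name, {})
    if 'input_parser' in field_meta:
        return field_meta['input_parser']
    # 3. Item Parser default (lowest precedence)
    return parser.default_input_parser

class _DemoParser(object):
    # hypothetical parser with one field-specific parser declared
    default_input_parser = 'DEFAULT'
    name_in = 'FIELD_SPECIFIC'
    class item:
        fields = {'name': {'input_parser': 'METADATA'},
                  'price': {'input_parser': 'METADATA'},
                  'stock': {}}

p = _DemoParser()
print(get_input_parser(p, 'name'))   # FIELD_SPECIFIC
print(get_input_parser(p, 'price'))  # METADATA
print(get_input_parser(p, 'stock'))  # DEFAULT
```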
See also: :ref:`topics-itemparser-extending`.

.. _topics-itemparser-context:

Item Parser Context
===================

The Item Parser Context is a dict of arbitrary key/values which is shared
among all input and output parsers in the Item Parser. It can be passed when
declaring, instantiating or using an Item Parser, and it is used to modify the
behaviour of the input/output parsers.

For example, suppose you have a function ``parse_length`` which receives a
text value and extracts a length from it::

    def parse_length(text, parser_context):
        unit = parser_context.get('unit', 'm')
        # ... length parsing code goes here ...
        return parsed_length

By accepting a ``parser_context`` argument the function is explicitly telling
the Item Parser that it is able to receive an Item Parser context, so the Item
Parser passes the currently active context when calling it, and the parser
function (``parse_length`` in this case) can thus use it.

There are several ways to modify Item Parser context values:

1. By modifying the currently active Item Parser context (the
   :attr:`ItemParser.context` attribute)::

       parser = ItemParser(product)
       parser.context['unit'] = 'cm'

2. On Item Parser instantiation (the keyword arguments of the Item Parser
   constructor are stored in the Item Parser context)::

       p = ItemParser(product, unit='cm')

3. On Item Parser declaration, for those input/output parsers that support
   instantiating them with an Item Parser context. :class:`ApplyConcat` is one
   of them::

       class ProductParser(ItemParser):
           length_out = ApplyConcat(parse_length, unit='cm')
ItemParser objects
==================

.. class:: ItemParser([item], \**kwargs)

    Return a new Item Parser for populating the given Item. If no item is
    given, one is instantiated automatically using the class in
    :attr:`default_item_class`.

    The item and the remaining keyword arguments are assigned to the Parser
    context (accessible through the :attr:`context` attribute).

    .. method:: add_value(field_name, value)

        Add the given ``value`` for the given field.

        The value is passed through the :ref:`field input parser
        <topics-itemparser-parsers>` and its result appended to the data
        collected for that field. If the field already contains collected
        data, the new data is added.

        Examples::

            parser.add_value('name', u'Color TV')
            parser.add_value('colours', [u'white', u'blue'])
            parser.add_value('length', u'100', default_unit='cm')

    .. method:: replace_value(field_name, value)

        Similar to :meth:`add_value` but replaces the collected data with the
        new value instead of adding to it.

    .. method:: populate_item()

        Populate the item with the data collected so far, and return it. The
        collected data is first passed through the :ref:`field output parsers
        <topics-itemparser-parsers>` to get the final value to assign to each
        item field.

    .. method:: get_collected_values(field_name)

        Return the collected values for the given field.

    .. method:: get_output_value(field_name)

        Return the collected values, parsed using the output parser, for the
        given field. This method doesn't populate or modify the item at all.

    .. method:: get_input_parser(field_name)

        Return the input parser for the given field.

    .. method:: get_output_parser(field_name)

        Return the output parser for the given field.

    .. attribute:: item

        The :class:`~scrapy.newitem.Item` object being parsed by this Item
        Parser.

    .. attribute:: context

        The currently active :ref:`Context <topics-itemparser-context>` of
        this Item Parser.

    .. attribute:: default_item_class

        An Item class (or factory), used to instantiate items when not given
        in the constructor.

    .. attribute:: default_input_parser

        The default input parser to use for those fields which don't specify
        one.

    .. attribute:: default_output_parser

        The default output parser to use for those fields which don't specify
        one.
.. class:: XPathItemParser([item, selector, response], \**kwargs)

    The :class:`XPathItemParser` class extends the :class:`ItemParser` class,
    providing more convenient mechanisms for extracting data from web pages
    using :ref:`XPath selectors <topics-selectors>`.

    :class:`XPathItemParser` objects accept two additional parameters in
    their constructors:

    :param selector: The selector to extract data from, when using the
        :meth:`add_xpath` or :meth:`replace_xpath` method.
    :type selector: :class:`~scrapy.xpath.XPathSelector` object

    :param response: The response used to construct the selector using the
        :attr:`default_selector_class`, unless the selector argument is
        given, in which case this argument is ignored.
    :type response: :class:`~scrapy.http.Response` object

    .. method:: add_xpath(field_name, xpath, re=None)

        Similar to :meth:`ItemParser.add_value` but receives an XPath instead
        of a value, which is used to extract a list of unicode strings from
        the selector associated with this :class:`XPathItemParser`. If the
        ``re`` argument is given, it's used for extracting data from the
        selector using the :meth:`~scrapy.xpath.XPathSelector.re` method.

        :param xpath: the XPath to extract data from
        :type xpath: str

        :param re: a regular expression to use for extracting data from the
            selected XPath region
        :type re: str or compiled regex

        Examples::

            # HTML snippet: <p class="product-name">Color TV</p>
            parser.add_xpath('name', '//p[@class="product-name"]')
            # HTML snippet: <p id="price">the price is $1200</p>
            parser.add_xpath('price', '//p[@id="price"]', re='the price is (.*)')

    .. method:: replace_xpath(field_name, xpath, re=None)

        Similar to :meth:`add_xpath` but replaces collected data instead of
        adding to it.

    .. attribute:: default_selector_class

        The class used to construct the :attr:`selector` of this
        :class:`XPathItemParser`, if only a response is given in the
        constructor. If a selector is given in the constructor, this
        attribute is ignored. This attribute is sometimes overridden in
        subclasses.

    .. attribute:: selector

        The :class:`~scrapy.xpath.XPathSelector` object to extract data from.
        It's either the selector given in the constructor or one created from
        the response given in the constructor, using the
        :attr:`default_selector_class`. This attribute is meant to be
        read-only.
.. _topics-itemparser-extending:

Reusing and extending Item Parsers
==================================

As your project grows bigger and acquires more and more spiders, maintenance
becomes a fundamental problem, especially when you have to deal with many
different parsing rules for each spider, with a lot of exceptions, but also
want to reuse the common parsers.

Item Parsers are designed to ease the maintenance burden of parsing rules,
without losing flexibility and, at the same time, providing a convenient
mechanism for extending and overriding them. For this reason Item Parsers
support traditional Python class inheritance for dealing with differences
between specific spiders (or groups of spiders).

Suppose, for example, that some particular site encloses its product names in
three dashes (e.g. ``---Plasma TV---``) and you don't want to end up scraping
those dashes in the final product names.

Here's how you can remove those dashes by reusing and extending the default
Product Item Parser (``ProductParser``)::

    from scrapy.contrib.itemparser.parsers import ApplyConcat
    from myproject.itemparsers import ProductParser

    def strip_dashes(x):
        return x.strip('-')

    class SiteSpecificParser(ProductParser):
        name_in = ApplyConcat(ProductParser.name_in, strip_dashes)

Another case where extending Item Parsers can be very helpful is when you have
multiple source formats, for example XML and HTML. In the XML version you may
want to remove ``CDATA`` occurrences. Here's an example of how to do it::

    from scrapy.contrib.itemparser.parsers import ApplyConcat
    from myproject.itemparsers import ProductParser
    from myproject.utils.xml import remove_cdata

    class XmlProductParser(ProductParser):
        name_in = ApplyConcat(remove_cdata, ProductParser.name_in)

And that's how you typically extend input parsers.

As for output parsers, it is more common to declare them in the field
metadata, as they usually depend only on the field and not on each specific
site's parsing rules (as input parsers do). See also:
:ref:`topics-itemparser-parsers-declaring`.

There are many other possible ways to extend, inherit and override your Item
Parsers, and different Item Parser hierarchies may fit different projects
better. Scrapy only provides the mechanism; it doesn't impose any specific
organization of your Parsers collection - that's up to you and your project's
needs.
.. _topics-itemparser-available-parsers:

Available built-in parsers
==========================

Even though you can use any callable function as an input or output parser,
Scrapy provides some commonly used parsers, which are described below. Some of
them, like :class:`ApplyConcat` (which is typically used as an input parser),
compose the output of several functions executed in order, to produce the
final parsed value.

Here is a list of all built-in parsers:

.. _topics-itemparser-Applyconcat:

ApplyConcat parser
------------------

The ApplyConcat parser is the recommended parser to use if you want to
concatenate the processing of several functions in a pipeline.

.. module:: scrapy.contrib.itemparser.parsers
   :synopsis: Parser functions to use with Item Parsers

.. class:: ApplyConcat(\*functions, \**default_parser_context)

    A parser which applies the given functions consecutively, in order,
    concatenating their results before each subsequent function call. Each
    function returns a list of values (though it could also return ``None``
    or a single value), and the next function is called once for each of
    those values, receiving one of them as input each time. The output of
    each function call (for each input value) is concatenated, and each value
    of the concatenation is used to call the next function; the process
    repeats until there are no functions left.
    Each function can optionally receive a ``parser_context`` parameter,
    which will contain the currently active :ref:`Item Parser context
    <topics-itemparser-context>`.

    The keyword arguments passed in the constructor are used as the default
    Item Parser context values passed on each function call. However, the
    final Item Parser context values passed to functions get overridden by
    the currently active Item Parser context, accessible through the
    :attr:`ItemParser.context` attribute.

    Example::

        >>> def filter_world(x):
        ...     return None if x == 'world' else x
        ...
        >>> from scrapy.contrib.itemparser.parsers import ApplyConcat
        >>> parser = ApplyConcat(filter_world, str.upper)
        >>> parser(['hello', 'world', 'this', 'is', 'scrapy'])
        ['HELLO', 'THIS', 'IS', 'SCRAPY']
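The apply-and-concatenate behaviour described above can be sketched in a few lines of plain Python. This is a simplified stand-in for the real ``ApplyConcat``, ignoring parser-context handling, intended only to make the flattening rules concrete:

```python
def apply_concat(*functions):
    # Simplified stand-in for ApplyConcat: each function is called once per
    # value; list results are flattened, None results are dropped, and the
    # concatenated list feeds the next function.
    def parse(values):
        for func in functions:
            next_values = []
            for value in values:
                result = func(value)
                if result is None:                    # None results are dropped
                    continue
                if isinstance(result, (list, tuple)):
                    next_values.extend(result)        # lists are concatenated
                else:
                    next_values.append(result)        # single values are appended
            values = next_values
        return values
    return parse

def filter_world(x):
    return None if x == 'world' else x

parser = apply_concat(filter_world, str.upper)
print(parser(['hello', 'world', 'this', 'is', 'scrapy']))
# ['HELLO', 'THIS', 'IS', 'SCRAPY']
```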
.. class:: TakeFirst

    Return the first non-null/non-empty value from the values received, so
    it's typically used as the output parser of single-valued fields. It
    doesn't receive any constructor arguments, nor accept an Item Parser
    context.

    Example::

        >>> from scrapy.contrib.itemparser.parsers import TakeFirst
        >>> parser = TakeFirst()
        >>> parser(['', 'one', 'two', 'three'])
        'one'

.. class:: Identity

    Return the original values unchanged. It doesn't receive any constructor
    arguments nor accept an Item Parser context.

    Example::

        >>> from scrapy.contrib.itemparser.parsers import Identity
        >>> parser = Identity()
        >>> parser(['one', 'two', 'three'])
        ['one', 'two', 'three']

.. class:: Join(separator=u' ')

    Return the values joined with the separator given in the constructor,
    which defaults to ``u' '``. It doesn't accept an Item Parser context.

    When using the default separator, this parser is equivalent to the
    function ``u' '.join``.

    Examples::

        >>> from scrapy.contrib.itemparser.parsers import Join
        >>> parser = Join()
        >>> parser(['one', 'two', 'three'])
        u'one two three'
        >>> parser = Join('<br>')
        >>> parser(['one', 'two', 'three'])
        u'one<br>two<br>three'
527
docs/experimental/loaders.rst
Normal file
527
docs/experimental/loaders.rst
Normal file
@ -0,0 +1,527 @@
|
||||
.. _topics-loaders:
|
||||
|
||||
============
|
||||
Item Loaders
|
||||
============
|
||||
|
||||
.. module:: scrapy.contrib.loader
|
||||
:synopsis: Item Loader class
|
||||
|
||||
Item Loaders provide a convenient mechanism for populating scraped :ref:`Items
|
||||
<topics-newitems>`. Even though Items can be populated using their own
|
||||
dictionary-like API, the Item Loaders provide a much more convenient API for
|
||||
populating them from a scraping process, by automating some common tasks like
|
||||
parsing the raw extracted data before assigning it.
|
||||
|
||||
In other words, :ref:`Items <topics-newitems>` provide the *container* of
|
||||
scraped data, while Item Loaders provide the mechanism for *populating* that
|
||||
container.
|
||||
|
||||
Item Loaders are designed to provide a flexible, efficient and easy mechanism
|
||||
for extending and overriding different field parsing rules, either by spider,
|
||||
or by source format (HTML, XML, etc) without becoming a nightmare to maintain.
|
||||
|
||||
Using Item Loaders to populate items
|
||||
====================================
|
||||
|
||||
To use an Item Loader, you must first instantiate it. You can either
|
||||
instantiate it with an Item object or without one, in which case an Item is
|
||||
automatically instantiated in the Item Loader constructor using the Item class
|
||||
specified in the :attr:`ItemLoader.default_item_class` attribute.
|
||||
|
||||
Then, you start collecting values into the Item Loader, typically using using
|
||||
:ref:`XPath Selectors <topics-selectors>`. You can add more than one value to
|
||||
the same item field, the Item Loader will know how to "join" those values later
|
||||
using a proper processing function.
|
||||
|
||||
Here is a typical Item Loader usage in a :ref:`Spider <topics-spiders>`, using
|
||||
the :ref:`Product item <topics-newitems-declaring>` declared in the :ref:`Items
|
||||
chapter <topics-newitems>`::
|
||||
|
||||
from scrapy.contrib.loader import XPathItemLoader
|
||||
from scrapy.xpath import HtmlXPathSelector
|
||||
from myproject.items import Product
|
||||
|
||||
def parse(self, response):
|
||||
p = XPathItemLoader(item=Product(), response=response)
|
||||
p.add_xpath('name', '//div[@class="product_name"]')
|
||||
p.add_xpath('name', '//div[@class="product_title"]')
|
||||
p.add_xpath('price', '//p[@id="price"]')
|
||||
p.add_xpath('stock', '//p[@id="stock"]')
|
||||
p.add_value('last_updated', 'today') # you can also use literal values
|
||||
return p.populate_item()
|
||||
|
||||
By quickly looking at that code we can see the ``name`` field is being
|
||||
extracted from two different XPath locations in the page:
|
||||
|
||||
1. ``//div[@class="product_name"]``
|
||||
2. ``//div[@class="product_title"]``
|
||||
|
||||
In other words, data is being collected by extracting it from two XPath
|
||||
locations, using the :meth:`~XPathItemLoader.add_xpath` method. This is the data
|
||||
that will be assigned to the ``name`` field later.
|
||||
|
||||
Afterwards, similar calls are used for ``price`` and ``stock`` fields, and
|
||||
finally the ``last_update`` field is populated directly with a literal value
|
||||
(``today``) using a different method: :meth:`~ItemLoader.add_value`.
|
||||
|
||||
Finally, when all data is collected, the :meth:`ItemLoader.populate_item`
|
||||
method is called which actually populates and returns the item populated with
|
||||
the data previously extracted and collected with the
|
||||
:meth:`~XPathItemLoader.add_xpath` and :meth:`~ItemLoader.add_value` calls.
|
||||
|
||||
.. _topics-loaders-processors:
|
||||
|
||||
Input and Output processors
|
||||
===========================
|
||||
|
||||
An Item Loader contains one input processor and one output processor for each
|
||||
(item) field. The input processor processes the extracted data as soon as it's
|
||||
received (through the :meth:`~XPathItemLoader.add_xpath` or
|
||||
:meth:`~ItemLoader.add_value` methods) and the result of the input processor is
|
||||
collected and kept inside the ItemLoader. After collecting all data, the
|
||||
:meth:`ItemLoader.populate_item` method is called to populate and get the
|
||||
populated :class:`~scrapy.newitem.Item` object. That's when the output processor
|
||||
is called with the data previously collected (and processed using the input
|
||||
processor). The result of the output processor is the final value that gets assigned
|
||||
to the item.
|
||||
|
||||
Let's see an example to illustrate how this input and output processors are
|
||||
called for a particular field (the same applies for any other field)::
|
||||
|
||||
p = XPathItemLoader(Product(), some_xpath_selector)
|
||||
p.add_xpath('name', xpath1) # (1)
|
||||
p.add_xpath('name', xpath2) # (2)
|
||||
return p.populate_item() # (3)
|
||||
|
||||
So what happens is:
|
||||
|
||||
1. Data from ``xpath1`` is extracted, and passed through the *input processor* of
|
||||
the ``name`` field. The result of the input processor is collected and kept in
|
||||
the Item Loader (but not yet assigned to the item).
|
||||
|
||||
2. Data from ``xpath2`` is extracted, and passed through the same *input
|
||||
processor* used in (1). The result of the input processor is appended to the
|
||||
data collected in (1) (if any).
|
||||
|
||||
3. The data collected in (1) and (2) is passed through the *output processor* of
|
||||
the ``name`` field. The result of the output processor is the value assigned to
|
||||
the ``name`` field in the item.
|
||||
|
||||
It's worth noticing that processors are just callable objects, which are called
|
||||
with the data to be parsed, and return a parsed value. So you can use any
|
||||
function as input or output processor, provided they can receive only one
|
||||
positional (required) argument.
|
||||
|
||||
The other thing you need to keep in mind is that the values returned by input
|
||||
processors are collected internally (in lists) and then passed to output
|
||||
processors to populate the fields, so output processors should expect iterables as
|
||||
input.
|
||||
|
||||
Last, but not least, Scrapy comes with some :ref:`commonly used processors
|
||||
<topics-loaders-available-processors>` built-in for convenience.
|
||||
|
||||
|
||||
Declaring Item Loaders
|
||||
======================
|
||||
|
||||
Item Loaders are declared like Items, by using a class definition syntax. Here
|
||||
is an example::
|
||||
|
||||
from scrapy.contrib.loader import ItemLoader
|
||||
from scrapy.contrib.loader.processor import TakeFirst, ApplyConcat, Join
|
||||
|
||||
class ProductLoader(ItemLoader):
|
||||
|
||||
default_input_processor = TakeFirst()
|
||||
|
||||
name_in = ApplyConcat(unicode.title)
|
||||
name_out = Join()
|
||||
|
||||
price_in = ApplyConcat(unicode.strip)
|
||||
price_out = TakeFirst()
|
||||
|
||||
# ...
|
||||
|
||||
As you can see, input processors are declared using the ``_in`` suffix while
|
||||
output processors are declared using the ``_out`` suffix. And you can also
|
||||
declare a default input/output processors using the
|
||||
:attr:`ItemLoader.default_input_processor` and
|
||||
:attr:`ItemLoader.default_output_processor` attributes.
|
||||
|
||||
.. _topics-loaders-processors-declaring:
|
||||
|
||||
Declaring Input and Output Processors
|
||||
=====================================
|
||||
|
||||
As seen in the previous section, input and output processors can be declared in
|
||||
the Item Loader definition, and it's very common to declare input processors
|
||||
this way. However, there is one more place where you can specify the input and
|
||||
output processors to use: in the :ref:`Item Field <topics-newitems-fields>`
|
||||
metadata. Here is an example::
|
||||
|
||||
from scrapy.newitem import Item, Field
|
||||
from scrapy.contrib.loader.processor import ApplyConcat, Join, TakeFirst
|
||||
|
||||
from scrapy.utils.markup import remove_entities
|
||||
from myproject.utils import filter_prices
|
||||
|
||||
class Product(Item):
|
||||
name = Field(
|
||||
input_processor=ApplyConcat(remove_entities),
|
||||
output_processor=Join(),
|
||||
)
|
||||
price = Field(
|
||||
default=0,
|
||||
input_processor=ApplyConcat(remove_entities, filter_prices),
|
||||
output_processor=TakeFirst(),
|
||||
)
|
||||
|
||||
The precedence order, for both input and output processors, is as follows:

1. Item Loader field-specific attributes: ``field_in`` and ``field_out``
   (highest precedence)
2. Field metadata (``input_processor`` and ``output_processor`` keys)
3. Item Loader defaults: :attr:`ItemLoader.default_input_processor` and
   :attr:`ItemLoader.default_output_processor` (lowest precedence)

See also: :ref:`topics-loaders-extending`.
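
The lookup order above can be sketched in plain Python. This is a standalone
illustration with made-up names (``DemoLoader``, ``item_fields``, the
``*_proc`` functions), not Scrapy's actual code:

```python
# Resolve a field's input processor using the precedence rules above:
# 1. a `<field>_in` attribute on the loader, 2. field metadata,
# 3. the loader-wide default.
def default_proc(values):       # rule 3: loader default
    return values

def strip_proc(values):         # rule 2: declared in field metadata
    return [v.strip() for v in values]

def title_proc(values):         # rule 1: declared as a loader attribute
    return [v.title() for v in values]

def get_input_processor(loader, field_name):
    proc = getattr(loader, '%s_in' % field_name, None)      # rule 1
    if proc is None:
        proc = loader.item_fields.get(field_name, {}).get(
            'input_processor',                              # rule 2
            loader.default_input_processor)                 # rule 3
    return proc

class DemoLoader:
    default_input_processor = staticmethod(default_proc)
    item_fields = {'name': {'input_processor': strip_proc},
                   'price': {'input_processor': strip_proc}}
    name_in = staticmethod(title_proc)   # overrides the field metadata

loader = DemoLoader()
print(get_input_processor(loader, 'name') is title_proc)    # True (rule 1)
print(get_input_processor(loader, 'price') is strip_proc)   # True (rule 2)
print(get_input_processor(loader, 'stock') is default_proc) # True (rule 3)
```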

.. _topics-loaders-context:

Item Loader Context
===================

The Item Loader Context is a dict of arbitrary key/values which is shared among
all input and output processors in the Item Loader. It can be passed when
declaring, instantiating or using an Item Loader, and is used to modify the
behaviour of the input/output processors.

For example, suppose you have a function ``parse_length`` which receives a text
value and extracts a length from it::

    def parse_length(text, loader_context):
        unit = loader_context.get('unit', 'm')
        # ... length parsing code goes here ...
        return parsed_length

By accepting a ``loader_context`` argument the function is explicitly telling
the Item Loader that it is able to receive an Item Loader context, so the Item
Loader passes the currently active context when calling it, and the processor
function (``parse_length`` in this case) can thus use it.
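
The mechanism can be sketched in standalone Python using ``inspect`` and
``functools.partial``; this is an illustration of the idea described above,
not Scrapy's exact implementation, and ``parse_length`` here is a toy version:

```python
import inspect
from functools import partial

def wrap_loader_context(function, context):
    # If `function` declares a `loader_context` parameter, pre-load it with
    # the active context; otherwise hand the function back unchanged.
    if 'loader_context' in inspect.signature(function).parameters:
        return partial(function, loader_context=context)
    return function

def parse_length(text, loader_context=None):
    # toy "length parsing": strip the text and append the configured unit
    unit = (loader_context or {}).get('unit', 'm')
    return '%s %s' % (text.strip(), unit)

def plain_upper(text):
    return text.upper()

wrapped = wrap_loader_context(parse_length, {'unit': 'cm'})
print(wrapped(' 100 '))        # the context's unit is applied: '100 cm'
print(wrap_loader_context(plain_upper, {}) is plain_upper)  # True: unchanged
```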

There are several ways to modify Item Loader context values:

1. By modifying the currently active Item Loader context (the
   :attr:`ItemLoader.context` attribute)::

       loader = ItemLoader(product)
       loader.context['unit'] = 'cm'

2. On Item Loader instantiation (the keyword arguments of the Item Loader
   constructor are stored in the Item Loader context)::

       loader = ItemLoader(product, unit='cm')

3. On Item Loader declaration, for those input/output processors that support
   instantiating them with an Item Loader context. :class:`ApplyConcat` is
   one of them::

       class ProductLoader(ItemLoader):
           length_out = ApplyConcat(parse_length, unit='cm')

ItemLoader objects
==================

.. class:: ItemLoader([item], \**kwargs)

    Return a new Item Loader for populating the given Item. If no item is
    given, one is instantiated automatically using the class in
    :attr:`default_item_class`.

    The item and the remaining keyword arguments are assigned to the Loader
    context (accessible through the :attr:`context` attribute).

    .. method:: add_value(field_name, value)

        Add the given ``value`` for the given field.

        The value is passed through the :ref:`field input processor
        <topics-loaders-processors>` and its result appended to the data
        collected for that field. If the field already contains collected
        data, the new data is added.

        Examples::

            loader.add_value('name', u'Color TV')
            loader.add_value('colours', [u'white', u'blue'])
            loader.add_value('length', u'100', default_unit='cm')

    .. method:: replace_value(field_name, value)

        Similar to :meth:`add_value` but replaces the collected data with the
        new value instead of adding it.

    .. method:: populate_item()

        Populate the item with the data collected so far, and return it. The
        data collected is first passed through the :ref:`field output
        processors <topics-loaders-processors>` to get the final value to
        assign to each item field.
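
The collect/populate flow can be sketched with a minimal standalone class
(``MiniLoader`` and its processor dicts are illustrative inventions, not
Scrapy's code): ``add_value`` runs the input processor and appends to the
collected data, and ``populate_item`` runs the output processor per field and
assigns the result:

```python
class MiniLoader:
    def __init__(self, item=None):
        self.item = item if item is not None else {}
        self._values = {}
        # hypothetical per-field processors for the demo
        self.input_processors = {'name': lambda vs: [v.title() for v in vs]}
        self.output_processors = {'name': ' '.join}

    def add_value(self, field_name, value):
        # input processor runs on add, result is appended to collected data
        values = value if isinstance(value, list) else [value]
        proc = self.input_processors.get(field_name, lambda vs: vs)
        self._values.setdefault(field_name, []).extend(proc(values))

    def populate_item(self):
        # output processor runs once per field when the item is populated
        for field_name, values in self._values.items():
            proc = self.output_processors.get(field_name, lambda vs: vs)
            self.item[field_name] = proc(values)
        return self.item

loader = MiniLoader()
loader.add_value('name', 'plasma')
loader.add_value('name', 'tv')       # appended, not replaced
print(loader.populate_item())        # -> {'name': 'Plasma Tv'}
```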

    .. method:: get_collected_values(field_name)

        Return the collected values for the given field.

    .. method:: get_output_value(field_name)

        Return the collected values parsed using the output processor, for
        the given field. This method doesn't populate or modify the item at
        all.

    .. method:: get_input_processor(field_name)

        Return the input processor for the given field.

    .. method:: get_output_processor(field_name)

        Return the output processor for the given field.

    .. attribute:: item

        The :class:`~scrapy.newitem.Item` object being parsed by this Item
        Loader.

    .. attribute:: context

        The currently active :ref:`Context <topics-loaders-context>` of this
        Item Loader.

    .. attribute:: default_item_class

        An Item class (or factory), used to instantiate items when not given
        in the constructor.

    .. attribute:: default_input_processor

        The default input processor to use for those fields which don't
        specify one.

    .. attribute:: default_output_processor

        The default output processor to use for those fields which don't
        specify one.

.. class:: XPathItemLoader([item, selector, response], \**kwargs)

    The :class:`XPathItemLoader` class extends the :class:`ItemLoader` class
    providing more convenient mechanisms for extracting data from web pages
    using :ref:`XPath selectors <topics-selectors>`.

    :class:`XPathItemLoader` objects accept two additional parameters in
    their constructors:

    :param selector: The selector to extract data from, when using the
        :meth:`add_xpath` or :meth:`replace_xpath` method.
    :type selector: :class:`~scrapy.xpath.XPathSelector` object

    :param response: The response used to construct the selector using the
        :attr:`default_selector_class`, unless the selector argument is
        given, in which case this argument is ignored.
    :type response: :class:`~scrapy.http.Response` object

    .. method:: add_xpath(field_name, xpath, re=None)

        Similar to :meth:`ItemLoader.add_value` but receives an XPath
        instead of a value, which is used to extract a list of unicode
        strings from the selector associated with this
        :class:`XPathItemLoader`. If the ``re`` argument is given, it's used
        for extracting data from the selector using the
        :meth:`~scrapy.xpath.XPathSelector.re` method.

        :param xpath: the XPath to extract data from
        :type xpath: str

        :param re: a regular expression to use for extracting data from the
            selected XPath region
        :type re: str or compiled regex

        Examples::

            # HTML snippet: <p class="product-name">Color TV</p>
            loader.add_xpath('name', '//p[@class="product-name"]')
            # HTML snippet: <p id="price">the price is $1200</p>
            loader.add_xpath('price', '//p[@id="price"]', re='the price is (.*)')

    .. method:: replace_xpath(field_name, xpath, re=None)

        Similar to :meth:`add_xpath` but replaces collected data instead of
        adding it.

    .. attribute:: default_selector_class

        The class used to construct the :attr:`selector` of this
        :class:`XPathItemLoader`, if only a response is given in the
        constructor. If a selector is given in the constructor this
        attribute is ignored. This attribute is sometimes overridden in
        subclasses.

    .. attribute:: selector

        The :class:`~scrapy.xpath.XPathSelector` object to extract data
        from. It's either the selector given in the constructor or one
        created from the response given in the constructor using the
        :attr:`default_selector_class`. This attribute is meant to be
        read-only.

.. _topics-loaders-extending:

Reusing and extending Item Loaders
==================================

As your project grows bigger and acquires more and more spiders, maintenance
becomes a fundamental problem, especially when you have to deal with many
different parsing rules for each spider, with a lot of exceptions, but also
want to reuse the common processors.

Item Loaders are designed to ease the maintenance burden of parsing rules,
without losing flexibility and, at the same time, providing a convenient
mechanism for extending and overriding them. For this reason Item Loaders
support traditional Python class inheritance for dealing with differences of
specific spiders (or groups of spiders).

Suppose, for example, that some particular site encloses their product names
in three dashes (e.g. ``---Plasma TV---``) and you don't want to end up
scraping those dashes in the final product names.

Here's how you can remove those dashes by reusing and extending the default
Product Item Loader (``ProductLoader``)::

    from scrapy.contrib.loader.processor import ApplyConcat
    from myproject.ItemLoaders import ProductLoader

    def strip_dashes(x):
        return x.strip('-')

    class SiteSpecificLoader(ProductLoader):
        name_in = ApplyConcat(ProductLoader.name_in, strip_dashes)

Another case where extending Item Loaders can be very helpful is when you have
multiple source formats, for example XML and HTML. In the XML version you may
want to remove ``CDATA`` occurrences. Here's an example of how to do it::

    from scrapy.contrib.loader.processor import ApplyConcat
    from myproject.ItemLoaders import ProductLoader
    from myproject.utils.xml import remove_cdata

    class XmlProductLoader(ProductLoader):
        name_in = ApplyConcat(remove_cdata, ProductLoader.name_in)

And that's how you typically extend input processors.

As for output processors, it is more common to declare them in the field
metadata, as they usually depend only on the field and not on each specific
site parsing rule (as input processors do). See also:
:ref:`topics-loaders-processors-declaring`.

There are many other possible ways to extend, inherit and override your Item
Loaders, and different Item Loader hierarchies may fit better for different
projects. Scrapy only provides the mechanism; it doesn't impose any specific
organization of your Loaders collection - that's up to you and your project
needs.
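
The extension pattern can be sketched in standalone Python. This is a
simplified illustration of composing a parent loader's processor with an
extra cleanup function (``compose``, ``title_case`` and the loader classes
here are inventions for the demo; real code would subclass Scrapy's
``ItemLoader``):

```python
# Build one processor out of several per-value functions, applied in order.
def compose(*functions):
    def composed(values):
        for func in functions:
            values = [func(v) for v in values]
        return values
    return composed

def title_case(x):
    return x.title()

def strip_dashes(x):
    return x.strip('-')

class ProductLoader:
    name_in = staticmethod(compose(title_case))

class SiteSpecificLoader(ProductLoader):
    # reuse the parent's processing, then strip the site's '---' wrapping
    name_in = staticmethod(compose(title_case, strip_dashes))

print(SiteSpecificLoader.name_in(['---plasma tv---']))  # -> ['Plasma Tv']
```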
.. _topics-loaders-available-processors:

Available built-in processors
=============================

Even though you can use any callable function as input and output processors,
Scrapy provides some commonly used processors, which are described below.
Some of them, like :class:`ApplyConcat` (which is typically used as an input
processor), compose the output of several functions executed in order to
produce the final parsed value.

Here is a list of all built-in processors:

.. _topics-loaders-applyconcat:

ApplyConcat processor
---------------------

The ApplyConcat processor is the recommended processor to use if you want to
concatenate the processing of several functions in a pipeline.

.. module:: scrapy.contrib.loader.processor
   :synopsis: A collection of processors to use with Item Loaders

.. class:: ApplyConcat(\*functions, \**default_loader_context)

    A processor which applies the given functions consecutively, in order,
    concatenating their results before the next function call. Each function
    returns a list of values (though it could also return ``None`` or a
    single value), and the next function is called once for each of those
    values, receiving one of them as input each time. The outputs of each
    function call are concatenated, and each value of the concatenation is
    used to call the next function; the process repeats until there are no
    functions left.
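
The flatten-and-apply behaviour just described can be sketched as a plain
function (an illustration, not Scrapy's exact implementation):

```python
# Apply each function to every current value; drop None results, splice
# list results, and feed the concatenated output to the next function.
def apply_concat(functions, values):
    for func in functions:
        next_values = []
        for v in values:
            result = func(v)
            if result is None:
                continue                      # None results are dropped
            if isinstance(result, (list, tuple)):
                next_values.extend(result)    # lists are concatenated
            else:
                next_values.append(result)    # single values are appended
        values = next_values
    return values

def filter_world(x):
    return None if x == 'world' else x

print(apply_concat([filter_world, str.upper],
                   ['hello', 'world', 'scrapy']))   # -> ['HELLO', 'SCRAPY']
```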

    Each function can optionally receive a ``loader_context`` parameter,
    which will contain the currently active :ref:`Item Loader context
    <topics-loaders-context>`.

    The keyword arguments passed in the constructor are used as the default
    Item Loader context values passed on each function call. However, the
    final Item Loader context values passed to functions get overridden by
    the currently active Item Loader context, accessible through the
    :attr:`ItemLoader.context` attribute.

    Example::

        >>> def filter_world(x):
        ...     return None if x == 'world' else x
        ...
        >>> from scrapy.contrib.loader.processor import ApplyConcat
        >>> proc = ApplyConcat(filter_world, str.upper)
        >>> proc(['hello', 'world', 'this', 'is', 'scrapy'])
        ['HELLO', 'THIS', 'IS', 'SCRAPY']

.. class:: TakeFirst

    Return the first non-null/non-empty value from the values received, so
    it's typically used as an output processor of single-valued fields. It
    doesn't receive any constructor arguments, nor accept an Item Loader
    context.

    Example::

        >>> from scrapy.contrib.loader.processor import TakeFirst
        >>> proc = TakeFirst()
        >>> proc(['', 'one', 'two', 'three'])
        'one'

.. class:: Identity

    Return the original values unchanged. It doesn't receive any constructor
    arguments nor accept an Item Loader context.

    Example::

        >>> from scrapy.contrib.loader.processor import Identity
        >>> proc = Identity()
        >>> proc(['one', 'two', 'three'])
        ['one', 'two', 'three']

.. class:: Join(separator=u' ')

    Return the values joined with the separator given in the constructor,
    which defaults to ``u' '``. It doesn't accept an Item Loader context.

    When using the default separator, this processor is equivalent to the
    function: ``u' '.join``

    Examples::

        >>> from scrapy.contrib.loader.processor import Join
        >>> proc = Join()
        >>> proc(['one', 'two', 'three'])
        u'one two three'
        >>> proc = Join('<br>')
        >>> proc(['one', 'two', 'three'])
        u'one<br>two<br>three'
@@ -1,13 +0,0 @@
-"""Common functions used in Item Parsers code"""
-
-from functools import partial
-from scrapy.utils.python import get_func_args
-
-def wrap_parser_context(function, context):
-    """Wrap functions that receive parser_context to contain those parser
-    arguments pre-loaded and expose a interface that receives only one argument
-    """
-    if 'parser_context' in get_func_args(function):
-        return partial(function, parser_context=context)
-    else:
-        return function
@@ -1,7 +1,7 @@
 """
-Item Parser
+Item Loader

-See documentation in docs/topics/itemparser.rst
+See documentation in docs/topics/loaders.rst
 """

 from collections import defaultdict
@@ -9,14 +9,14 @@ from collections import defaultdict
 from scrapy.newitem import Item
 from scrapy.xpath import HtmlXPathSelector
 from scrapy.utils.misc import arg_to_iter
-from .common import wrap_parser_context
-from .parsers import Identity
+from .common import wrap_loader_context
+from .processor import Identity

-class ItemParser(object):
+class ItemLoader(object):

     default_item_class = Item
-    default_input_parser = Identity()
-    default_output_parser = Identity()
+    default_input_processor = Identity()
+    default_output_processor = Identity()

     def __init__(self, item=None, **context):
         if item is None:
@@ -40,34 +40,34 @@
         return item

     def get_output_value(self, field_name):
-        parser = self.get_output_parser(field_name)
-        parser = wrap_parser_context(parser, self.context)
-        return parser(self._values[field_name])
+        proc = self.get_output_processor(field_name)
+        proc = wrap_loader_context(proc, self.context)
+        return proc(self._values[field_name])

     def get_collected_values(self, field_name):
         return self._values[field_name]

-    def get_input_parser(self, field_name):
-        parser = getattr(self, '%s_in' % field_name, None)
-        if not parser:
-            parser = self.item.fields[field_name].get('input_parser', \
-                self.default_input_parser)
-        return parser
+    def get_input_processor(self, field_name):
+        proc = getattr(self, '%s_in' % field_name, None)
+        if not proc:
+            proc = self.item.fields[field_name].get('input_processor', \
+                self.default_input_processor)
+        return proc

-    def get_output_parser(self, field_name):
-        parser = getattr(self, '%s_out' % field_name, None)
-        if not parser:
-            parser = self.item.fields[field_name].get('output_parser', \
-                self.default_output_parser)
-        return parser
+    def get_output_processor(self, field_name):
+        proc = getattr(self, '%s_out' % field_name, None)
+        if not proc:
+            proc = self.item.fields[field_name].get('output_processor', \
+                self.default_output_processor)
+        return proc

     def _parse_input_value(self, field_name, value):
-        parser = self.get_input_parser(field_name)
-        parser = wrap_parser_context(parser, self.context)
-        return parser(value)
+        proc = self.get_input_processor(field_name)
+        proc = wrap_loader_context(proc, self.context)
+        return proc(value)


-class XPathItemParser(ItemParser):
+class XPathItemLoader(ItemLoader):

     default_selector_class = HtmlXPathSelector

@@ -79,7 +79,7 @@ class XPathItemParser(ItemParser):
             selector = self.default_selector_class(response)
         self.selector = selector
         context.update(selector=selector, response=response)
-        super(XPathItemParser, self).__init__(item, **context)
+        super(XPathItemLoader, self).__init__(item, **context)

     def add_xpath(self, field_name, xpath, re=None):
         self.add_value(field_name, self._get_values(field_name, xpath, re))
13  scrapy/contrib/loader/common.py  Normal file
@@ -0,0 +1,13 @@
+"""Common functions used in Item Loaders code"""
+
+from functools import partial
+from scrapy.utils.python import get_func_args
+
+def wrap_loader_context(function, context):
+    """Wrap functions that receive loader_context to contain the context
+    "pre-loaded" and expose a interface that receives only one argument
+    """
+    if 'loader_context' in get_func_args(function):
+        return partial(function, loader_context=context)
+    else:
+        return function
@@ -1,26 +1,26 @@
 """
-This module provides some commonly used parser functions for Item Parsers.
+This module provides some commonly used processors for Item Loaders.

-See documentation in docs/topics/itemparser.rst
+See documentation in docs/topics/loaders.rst
 """

 from scrapy.utils.misc import arg_to_iter
 from scrapy.utils.datatypes import MergeDict
-from .common import wrap_parser_context
+from .common import wrap_loader_context

 class ApplyConcat(object):

-    def __init__(self, *functions, **default_parser_context):
+    def __init__(self, *functions, **default_loader_context):
         self.functions = functions
-        self.default_parser_context = default_parser_context
+        self.default_loader_context = default_loader_context

-    def __call__(self, value, parser_context=None):
+    def __call__(self, value, loader_context=None):
         values = arg_to_iter(value)
-        if parser_context:
-            context = MergeDict(parser_context, self.default_parser_context)
+        if loader_context:
+            context = MergeDict(loader_context, self.default_loader_context)
         else:
-            context = self.default_parser_context
-        wrapped_funcs = [wrap_parser_context(f, context) for f in self.functions]
+            context = self.default_loader_context
+        wrapped_funcs = [wrap_loader_context(f, context) for f in self.functions]
         for func in wrapped_funcs:
             next_values = []
             for v in values:
@@ -1,7 +1,7 @@
 import unittest

-from scrapy.contrib.itemparser import ItemParser, XPathItemParser
-from scrapy.contrib.itemparser.parsers import ApplyConcat, Join, Identity
+from scrapy.contrib.loader import ItemLoader, XPathItemLoader
+from scrapy.contrib.loader.processor import ApplyConcat, Join, Identity
 from scrapy.newitem import Item, Field
 from scrapy.xpath import HtmlXPathSelector
 from scrapy.http import HtmlResponse
@@ -15,30 +15,30 @@ class TestItem(NameItem):
     url = Field()
     summary = Field()

-# test item parsers
+# test item loaders

-class NameItemParser(ItemParser):
+class NameItemLoader(ItemLoader):
     default_item_class = TestItem

-class TestItemParser(NameItemParser):
+class TestItemLoader(NameItemLoader):
     name_in = ApplyConcat(lambda v: v.title())

-class DefaultedItemParser(NameItemParser):
-    default_input_parser = ApplyConcat(lambda v: v[:-1])
+class DefaultedItemLoader(NameItemLoader):
+    default_input_processor = ApplyConcat(lambda v: v[:-1])

-# test parsers
+# test processors

-def parser_with_args(value, other=None, parser_context=None):
-    if 'key' in parser_context:
-        return parser_context['key']
+def processor_with_args(value, other=None, loader_context=None):
+    if 'key' in loader_context:
+        return loader_context['key']
     return value

-class ItemParserTest(unittest.TestCase):
+class ItemLoaderTest(unittest.TestCase):

     def test_populate_item_using_default_loader(self):
         i = TestItem()
         i['summary'] = u'lala'
-        ip = ItemParser(item=i)
+        ip = ItemLoader(item=i)
         ip.add_value('name', u'marta')
         item = ip.populate_item()
         assert item is i
@@ -46,13 +46,13 @@ class ItemParserTest(unittest.TestCase):
         self.assertEqual(item['name'], [u'marta'])

     def test_populate_item_using_custom_loader(self):
-        ip = TestItemParser()
+        ip = TestItemLoader()
         ip.add_value('name', u'marta')
         item = ip.populate_item()
         self.assertEqual(item['name'], [u'Marta'])

     def test_add_value(self):
-        ip = TestItemParser()
+        ip = TestItemLoader()
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_collected_values('name'), [u'Marta'])
         self.assertEqual(ip.get_output_value('name'), [u'Marta'])
@@ -61,7 +61,7 @@
         self.assertEqual(ip.get_output_value('name'), [u'Marta', u'Pepe'])

     def test_replace_value(self):
-        ip = TestItemParser()
+        ip = TestItemLoader()
         ip.replace_value('name', u'marta')
         self.assertEqual(ip.get_collected_values('name'), [u'Marta'])
         self.assertEqual(ip.get_output_value('name'), [u'Marta'])
@@ -69,208 +69,208 @@
         self.assertEqual(ip.get_collected_values('name'), [u'Pepe'])
         self.assertEqual(ip.get_output_value('name'), [u'Pepe'])

-    def test_map_concat_filter(self):
+    def test_apply_concat_filter(self):
         def filter_world(x):
             return None if x == 'world' else x

-        parser = ApplyConcat(filter_world, str.upper)
-        self.assertEqual(parser(['hello', 'world', 'this', 'is', 'scrapy']),
+        proc = ApplyConcat(filter_world, str.upper)
+        self.assertEqual(proc(['hello', 'world', 'this', 'is', 'scrapy']),
             ['HELLO', 'THIS', 'IS', 'SCRAPY'])

     def test_map_concat_filter_multiple_functions(self):
-        class TestItemParser(NameItemParser):
+        class TestItemLoader(NameItemLoader):
             name_in = ApplyConcat(lambda v: v.title(), lambda v: v[:-1])

-        ip = TestItemParser()
+        ip = TestItemLoader()
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_output_value('name'), [u'Mart'])
         item = ip.populate_item()
         self.assertEqual(item['name'], [u'Mart'])

-    def test_default_input_parser(self):
-        ip = DefaultedItemParser()
+    def test_default_input_processor(self):
+        ip = DefaultedItemLoader()
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_output_value('name'), [u'mart'])

-    def test_inherited_default_input_parser(self):
-        class InheritDefaultedItemParser(DefaultedItemParser):
+    def test_inherited_default_input_processor(self):
+        class InheritDefaultedItemLoader(DefaultedItemLoader):
             pass

-        ip = InheritDefaultedItemParser()
+        ip = InheritDefaultedItemLoader()
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_output_value('name'), [u'mart'])

-    def test_input_parser_inheritance(self):
-        class ChildItemParser(TestItemParser):
+    def test_input_processor_inheritance(self):
+        class ChildItemLoader(TestItemLoader):
             url_in = ApplyConcat(lambda v: v.lower())

-        ip = ChildItemParser()
+        ip = ChildItemLoader()
         ip.add_value('url', u'HTTP://scrapy.ORG')
         self.assertEqual(ip.get_output_value('url'), [u'http://scrapy.org'])
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_output_value('name'), [u'Marta'])

-        class ChildChildItemParser(ChildItemParser):
+        class ChildChildItemLoader(ChildItemLoader):
             url_in = ApplyConcat(lambda v: v.upper())
             summary_in = ApplyConcat(lambda v: v)

-        ip = ChildChildItemParser()
+        ip = ChildChildItemLoader()
         ip.add_value('url', u'http://scrapy.org')
         self.assertEqual(ip.get_output_value('url'), [u'HTTP://SCRAPY.ORG'])
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_output_value('name'), [u'Marta'])

     def test_empty_map_concat(self):
-        class IdentityDefaultedItemParser(DefaultedItemParser):
+        class IdentityDefaultedItemLoader(DefaultedItemLoader):
             name_in = ApplyConcat()

-        ip = IdentityDefaultedItemParser()
+        ip = IdentityDefaultedItemLoader()
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_output_value('name'), [u'marta'])

-    def test_identity_input_parser(self):
-        class IdentityDefaultedItemParser(DefaultedItemParser):
+    def test_identity_input_processor(self):
+        class IdentityDefaultedItemLoader(DefaultedItemLoader):
             name_in = Identity()

-        ip = IdentityDefaultedItemParser()
+        ip = IdentityDefaultedItemLoader()
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_output_value('name'), [u'marta'])

-    def test_extend_custom_input_parsers(self):
-        class ChildItemParser(TestItemParser):
-            name_in = ApplyConcat(TestItemParser.name_in, unicode.swapcase)
+    def test_extend_custom_input_processors(self):
+        class ChildItemLoader(TestItemLoader):
+            name_in = ApplyConcat(TestItemLoader.name_in, unicode.swapcase)

-        ip = ChildItemParser()
+        ip = ChildItemLoader()
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_output_value('name'), [u'mARTA'])

-    def test_extend_default_input_parsers(self):
-        class ChildDefaultedItemParser(DefaultedItemParser):
-            name_in = ApplyConcat(DefaultedItemParser.default_input_parser, unicode.swapcase)
+    def test_extend_default_input_processors(self):
+        class ChildDefaultedItemLoader(DefaultedItemLoader):
+            name_in = ApplyConcat(DefaultedItemLoader.default_input_processor, unicode.swapcase)

-        ip = ChildDefaultedItemParser()
+        ip = ChildDefaultedItemLoader()
         ip.add_value('name', u'marta')
         self.assertEqual(ip.get_output_value('name'), [u'MART'])

-    def test_output_parser_using_function(self):
-        ip = TestItemParser()
+    def test_output_processor_using_function(self):
+        ip = TestItemLoader()
         ip.add_value('name', [u'mar', u'ta'])
         self.assertEqual(ip.get_output_value('name'), [u'Mar', u'Ta'])

-        class TakeFirstItemParser(TestItemParser):
+        class TakeFirstItemLoader(TestItemLoader):
             name_out = u" ".join

-        ip = TakeFirstItemParser()
+        ip = TakeFirstItemLoader()
         ip.add_value('name', [u'mar', u'ta'])
         self.assertEqual(ip.get_output_value('name'), u'Mar Ta')

-    def test_output_parser_using_classes(self):
-        ip = TestItemParser()
+    def test_output_processor_using_classes(self):
+        ip = TestItemLoader()
         ip.add_value('name', [u'mar', u'ta'])
         self.assertEqual(ip.get_output_value('name'), [u'Mar', u'Ta'])

-        class TakeFirstItemParser(TestItemParser):
+        class TakeFirstItemLoader(TestItemLoader):
             name_out = Join()

-        ip = TakeFirstItemParser()
+        ip = TakeFirstItemLoader()
         ip.add_value('name', [u'mar', u'ta'])
         self.assertEqual(ip.get_output_value('name'), u'Mar Ta')

-        class TakeFirstItemParser(TestItemParser):
+        class TakeFirstItemLoader(TestItemLoader):
             name_out = Join("<br>")

-        ip = TakeFirstItemParser()
+        ip = TakeFirstItemLoader()
         ip.add_value('name', [u'mar', u'ta'])
         self.assertEqual(ip.get_output_value('name'), u'Mar<br>Ta')

-    def test_default_output_parser(self):
-        ip = TestItemParser()
+    def test_default_output_processor(self):
+        ip = TestItemLoader()
         ip.add_value('name', [u'mar', u'ta'])
         self.assertEqual(ip.get_output_value('name'), [u'Mar', u'Ta'])

-        class LalaItemParser(TestItemParser):
-            default_output_parser = Identity()
+        class LalaItemLoader(TestItemLoader):
+            default_output_processor = Identity()

-        ip = LalaItemParser()
+        ip = LalaItemLoader()
         ip.add_value('name', [u'mar', u'ta'])
         self.assertEqual(ip.get_output_value('name'), [u'Mar', u'Ta'])

-    def test_parser_context_on_declaration(self):
-        class ChildItemParser(TestItemParser):
-            url_in = ApplyConcat(parser_with_args, key=u'val')
+    def test_loader_context_on_declaration(self):
+        class ChildItemLoader(TestItemLoader):
+            url_in = ApplyConcat(processor_with_args, key=u'val')

-        ip = ChildItemParser()
+        ip = ChildItemLoader()
         ip.add_value('url', u'text')
         self.assertEqual(ip.get_output_value('url'), ['val'])
         ip.replace_value('url', u'text2')
         self.assertEqual(ip.get_output_value('url'), ['val'])

-    def test_parser_context_on_instantiation(self):
-        class ChildItemParser(TestItemParser):
-            url_in = ApplyConcat(parser_with_args)
+    def test_loader_context_on_instantiation(self):
+        class ChildItemLoader(TestItemLoader):
+            url_in = ApplyConcat(processor_with_args)

-        ip = ChildItemParser(key=u'val')
+        ip = ChildItemLoader(key=u'val')
         ip.add_value('url', u'text')
         self.assertEqual(ip.get_output_value('url'), ['val'])
         ip.replace_value('url', u'text2')
         self.assertEqual(ip.get_output_value('url'), ['val'])

-    def test_parser_context_on_assign(self):
-        class ChildItemParser(TestItemParser):
-            url_in = ApplyConcat(parser_with_args)
+    def test_loader_context_on_assign(self):
+        class ChildItemLoader(TestItemLoader):
+            url_in = ApplyConcat(processor_with_args)

-        ip = ChildItemParser()
+        ip = ChildItemLoader()
         ip.context['key'] = u'val'
         ip.add_value('url', u'text')
         self.assertEqual(ip.get_output_value('url'), ['val'])
         ip.replace_value('url', u'text2')
         self.assertEqual(ip.get_output_value('url'), ['val'])

-    def test_item_passed_to_input_parser_functions(self):
-        def parser(value, parser_context):
-            return parser_context['item']['name']
+    def test_item_passed_to_input_processor_functions(self):
+        def processor(value, loader_context):
+            return loader_context['item']['name']

-        class ChildItemParser(TestItemParser):
-            url_in = ApplyConcat(parser)
+        class ChildItemLoader(TestItemLoader):
+            url_in = ApplyConcat(processor)

         it = TestItem(name='marta')
-        ip = ChildItemParser(item=it)
+        ip = ChildItemLoader(item=it)
         ip.add_value('url', u'text')
         self.assertEqual(ip.get_output_value('url'), ['marta'])
         ip.replace_value('url', u'text2')
         self.assertEqual(ip.get_output_value('url'), ['marta'])

     def test_add_value_on_unknown_field(self):
-        ip = TestItemParser()
+        ip = TestItemLoader()
         self.assertRaises(KeyError, ip.add_value, 'wrong_field', [u'lala', u'lolo'])


-class TestXPathItemParser(XPathItemParser):
+class TestXPathItemLoader(XPathItemLoader):
     default_item_class = TestItem
     name_in = ApplyConcat(lambda v: v.title())

-class XPathItemParserTest(unittest.TestCase):
+class XPathItemLoaderTest(unittest.TestCase):

     def test_constructor_errors(self):
-        self.assertRaises(RuntimeError, XPathItemParser)
+        self.assertRaises(RuntimeError, XPathItemLoader)

     def test_constructor_with_selector(self):
         sel = HtmlXPathSelector(text=u"<html><body><div>marta</div></body></html>")
-        l = TestXPathItemParser(selector=sel)
+        l = TestXPathItemLoader(selector=sel)
|
||||
self.assert_(l.selector is sel)
|
||||
l.add_xpath('name', '//div/text()')
|
||||
self.assertEqual(l.get_output_value('name'), [u'Marta'])
|
||||
|
||||
def test_constructor_with_response(self):
|
||||
response = HtmlResponse(url="", body="<html><body><div>marta</div></body></html>")
|
||||
l = TestXPathItemParser(response=response)
|
||||
l = TestXPathItemLoader(response=response)
|
||||
self.assert_(l.selector)
|
||||
l.add_xpath('name', '//div/text()')
|
||||
self.assertEqual(l.get_output_value('name'), [u'Marta'])
|
||||
|
||||
def test_add_xpath_re(self):
|
||||
response = HtmlResponse(url="", body="<html><body><div>marta</div></body></html>")
|
||||
l = TestXPathItemParser(response=response)
|
||||
l = TestXPathItemLoader(response=response)
|
||||
l.add_xpath('name', '//div/text()', re='ma')
|
||||
self.assertEqual(l.get_output_value('name'), [u'Ma'])
|
||||
|