
Merge pull request #426 from scrapy/selectors-unified

[MRG] Selectors unified API
Daniel Graña 2013-10-16 12:58:12 -07:00
commit 289688e39e
39 changed files with 1035 additions and 1074 deletions

View File

@ -143,13 +143,12 @@ Finally, here's the spider code::
rules = [Rule(SgmlLinkExtractor(allow=['/tor/\d+']), 'parse_torrent')]
def parse_torrent(self, response):
x = HtmlXPathSelector(response)
sel = Selector(response)
torrent = TorrentItem()
torrent['url'] = response.url
torrent['name'] = x.select("//h1/text()").extract()
torrent['description'] = x.select("//div[@id='description']").extract()
torrent['size'] = x.select("//div[@id='info-left']/p[2]/text()[2]").extract()
torrent['name'] = sel.xpath("//h1/text()").extract()
torrent['description'] = sel.xpath("//div[@id='description']").extract()
torrent['size'] = sel.xpath("//div[@id='info-left']/p[2]/text()[2]").extract()
return torrent
For brevity's sake, we intentionally left out the import statements. The

View File

@ -183,11 +183,12 @@ Introduction to Selectors
^^^^^^^^^^^^^^^^^^^^^^^^^
There are several ways to extract data from web pages. Scrapy uses a mechanism
based on `XPath`_ expressions called :ref:`XPath selectors <topics-selectors>`.
For more information about selectors and other extraction mechanisms see the
:ref:`XPath selectors documentation <topics-selectors>`.
based on `XPath`_ or `CSS`_ expressions called :ref:`Scrapy Selectors
<topics-selectors>`. For more information about selectors and other extraction
mechanisms see the :ref:`Selectors documentation <topics-selectors>`.
.. _XPath: http://www.w3.org/TR/xpath
.. _CSS: http://www.w3.org/TR/selectors
Here are some examples of XPath expressions and their meanings:
@ -206,27 +207,28 @@ These are just a couple of simple examples of what you can do with XPath, but
XPath expressions are indeed much more powerful. To learn more about XPath we
recommend `this XPath tutorial <http://www.w3schools.com/XPath/default.asp>`_.
For working with XPaths, Scrapy provides a :class:`~scrapy.selector.XPathSelector`
class, which comes in two flavours, :class:`~scrapy.selector.HtmlXPathSelector`
(for HTML data) and :class:`~scrapy.selector.XmlXPathSelector` (for XML data). In
order to use them you must instantiate the desired class with a
:class:`~scrapy.http.Response` object.
For working with XPaths, Scrapy provides a :class:`~scrapy.selector.Selector`
class, which is instantiated with an :class:`~scrapy.http.HtmlResponse` or
:class:`~scrapy.http.XmlResponse` object as its first argument.
You can see selectors as objects that represent nodes in the document
structure. So, the first instantiated selectors are associated with the root
node, or the entire document.
Selectors have three methods (click on the method to see the complete API
Selectors have four basic methods (click on the method to see the complete API
documentation).
* :meth:`~scrapy.selector.XPathSelector.select`: returns a list of selectors, each of
* :meth:`~scrapy.selector.Selector.xpath`: returns a list of selectors, each of
them representing the nodes selected by the xpath expression given as
argument.
argument.
* :meth:`~scrapy.selector.XPathSelector.extract`: returns a unicode string with
the data selected by the XPath selector.
* :meth:`~scrapy.selector.Selector.css`: returns a list of selectors, each of
them representing the nodes selected by the CSS expression given as argument.
* :meth:`~scrapy.selector.XPathSelector.re`: returns a list of unicode strings
* :meth:`~scrapy.selector.Selector.extract`: returns a unicode string with the
selected data.
* :meth:`~scrapy.selector.Selector.re`: returns a list of unicode strings
extracted by applying the regular expression given as argument.
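Putting the four methods together, here is a minimal sketch (assuming a
``response`` object is already available, as in the shell session below)::

    from scrapy.selector import Selector

    sel = Selector(response)
    sel.xpath('//title/text()').extract()      # XPath query, then extract text
    sel.css('title::text').extract()           # equivalent CSS query
    sel.xpath('//title/text()').re(r'(\w+):')  # regex applied to the selected text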
@ -253,12 +255,11 @@ This is what the shell looks like::
[s] Available Scrapy objects:
[s] 2010-08-19 21:45:59-0300 [default] INFO: Spider closed (finished)
[s] hxs <HtmlXPathSelector (http://www.dmoz.org/Computers/Programming/Languages/Python/Books/) xpath=None>
[s] sel <Selector (http://www.dmoz.org/Computers/Programming/Languages/Python/Books/) xpath=None>
[s] item Item()
[s] request <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
[s] response <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
[s] spider <BaseSpider 'default' at 0x1b6c2d0>
[s] xxs <XmlXPathSelector (http://www.dmoz.org/Computers/Programming/Languages/Python/Books/) xpath=None>
[s] Useful shortcuts:
[s] shelp() Print this help
[s] fetch(req_or_url) Fetch a new request or URL and update shell objects
@ -270,23 +271,25 @@ After the shell loads, you will have the response fetched in a local
``response`` variable, so if you type ``response.body`` you will see the body
of the response, or you can type ``response.headers`` to see its headers.
The shell also instantiates two selectors, one for HTML (in the ``hxs``
variable) and one for XML (in the ``xxs`` variable) with this response. So let's
try them::
The shell also pre-instantiates a selector for this response in the ``sel``
variable; the selector automatically chooses the best parsing rules (XML vs
HTML) based on the response's type.
In [1]: hxs.select('//title')
Out[1]: [<HtmlXPathSelector (title) xpath=//title>]
So let's try it::
In [2]: hxs.select('//title').extract()
In [1]: sel.xpath('//title')
Out[1]: [<Selector (title) xpath=//title>]
In [2]: sel.xpath('//title').extract()
Out[2]: [u'<title>Open Directory - Computers: Programming: Languages: Python: Books</title>']
In [3]: hxs.select('//title/text()')
Out[3]: [<HtmlXPathSelector (text) xpath=//title/text()>]
In [3]: sel.xpath('//title/text()')
Out[3]: [<Selector (text) xpath=//title/text()>]
In [4]: hxs.select('//title/text()').extract()
In [4]: sel.xpath('//title/text()').extract()
Out[4]: [u'Open Directory - Computers: Programming: Languages: Python: Books']
In [5]: hxs.select('//title/text()').re('(\w+):')
In [5]: sel.xpath('//title/text()').re('(\w+):')
Out[5]: [u'Computers', u'Programming', u'Languages', u'Python']
Extracting the data
@ -306,29 +309,29 @@ is inside a ``<ul>`` element, in fact the *second* ``<ul>`` element.
So we can select each ``<li>`` element belonging to the sites list with this
code::
hxs.select('//ul/li')
sel.xpath('//ul/li')
And from them, the sites descriptions::
hxs.select('//ul/li/text()').extract()
sel.xpath('//ul/li/text()').extract()
The sites titles::
hxs.select('//ul/li/a/text()').extract()
sel.xpath('//ul/li/a/text()').extract()
And the sites links::
hxs.select('//ul/li/a/@href').extract()
sel.xpath('//ul/li/a/@href').extract()
As we said before, each ``select()`` call returns a list of selectors, so we can
concatenate further ``select()`` calls to dig deeper into a node. We are going to use
As we said before, each ``.xpath()`` call returns a list of selectors, so we can
concatenate further ``.xpath()`` calls to dig deeper into a node. We are going to use
that property here, so::
sites = hxs.select('//ul/li')
sites = sel.xpath('//ul/li')
for site in sites:
title = site.select('a/text()').extract()
link = site.select('a/@href').extract()
desc = site.select('text()').extract()
title = site.xpath('a/text()').extract()
link = site.xpath('a/@href').extract()
desc = site.xpath('text()').extract()
print title, link, desc
.. note::
@ -341,7 +344,7 @@ that property here, so::
Let's add this code to our spider::
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.selector import Selector
class DmozSpider(BaseSpider):
name = "dmoz"
@ -352,12 +355,12 @@ Let's add this code to our spider::
]
def parse(self, response):
hxs = HtmlXPathSelector(response)
sites = hxs.select('//ul/li')
sel = Selector(response)
sites = sel.xpath('//ul/li')
for site in sites:
title = site.select('a/text()').extract()
link = site.select('a/@href').extract()
desc = site.select('text()').extract()
title = site.xpath('a/text()').extract()
link = site.xpath('a/@href').extract()
desc = site.xpath('text()').extract()
print title, link, desc
Now try crawling the dmoz.org domain again and you'll see sites being printed
@ -382,7 +385,7 @@ Spiders are expected to return their scraped data inside
scraped so far, the final code for our Spider would be like this::
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.selector import Selector
from tutorial.items import DmozItem
@ -395,14 +398,14 @@ scraped so far, the final code for our Spider would be like this::
]
def parse(self, response):
hxs = HtmlXPathSelector(response)
sites = hxs.select('//ul/li')
sel = Selector(response)
sites = sel.xpath('//ul/li')
items = []
for site in sites:
item = DmozItem()
item['title'] = site.select('a/text()').extract()
item['link'] = site.select('a/@href').extract()
item['desc'] = site.select('text()').extract()
item['title'] = site.xpath('a/text()').extract()
item['link'] = site.xpath('a/@href').extract()
item['desc'] = site.xpath('text()').extract()
items.append(item)
return items

View File

@ -9,6 +9,9 @@ Release notes
- Request/Response url/body attributes are now immutable (modifying them had
been deprecated for a long time)
- :setting:`ITEM_PIPELINES` is now defined as a dict (instead of a list)
- Dropped libxml2 selectors backend
- Dropped support for multiple selectors backends, sticking to lxml only
- Selector Unified API with support for CSS expressions (:issue:`395` and :issue:`426`)
0.18.4 (released 2013-10-10)
----------------------------

View File

@ -248,7 +248,6 @@ Memory debugger extension
An extension for debugging memory usage. It collects information about:
* objects uncollected by the Python garbage collector
* libxml2 memory leaks
* objects left alive that shouldn't. For more info, see :ref:`topics-leaks-trackrefs`
To enable this extension, turn on the :setting:`MEMDEBUG_ENABLED` setting. The

View File

@ -107,7 +107,7 @@ Now we're going to write the code to extract data from those pages.
With the help of Firebug, we'll take a look at some page containing links to
websites (say http://directory.google.com/Top/Arts/Awards/) and find out how we can
extract those links using :ref:`XPath selectors <topics-selectors>`. We'll also
extract those links using :ref:`Selectors <topics-selectors>`. We'll also
use the :ref:`Scrapy shell <topics-shell>` to test those XPaths and make sure
they work as we expect.
@ -146,16 +146,16 @@ that have that grey colour of the links,
Finally, we can write our ``parse_category()`` method::
def parse_category(self, response):
hxs = HtmlXPathSelector(response)
sel = Selector(response)
# The path to website links in directory page
links = hxs.select('//td[descendant::a[contains(@href, "#pagerank")]]/following-sibling::td/font')
links = sel.xpath('//td[descendant::a[contains(@href, "#pagerank")]]/following-sibling::td/font')
for link in links:
item = DirectoryItem()
item['name'] = link.select('a/text()').extract()
item['url'] = link.select('a/@href').extract()
item['description'] = link.select('font[2]/text()').extract()
item['name'] = link.xpath('a/text()').extract()
item['url'] = link.xpath('a/@href').extract()
item['description'] = link.xpath('font[2]/text()').extract()
yield item

View File

@ -67,7 +67,7 @@ alias to the :func:`~scrapy.utils.trackref.print_live_refs` function::
ExampleSpider 1 oldest: 15s ago
HtmlResponse 10 oldest: 1s ago
XPathSelector 2 oldest: 0s ago
Selector 2 oldest: 0s ago
FormRequest 878 oldest: 7s ago
As you can see, that report also shows the "age" of the oldest object in each
@ -87,9 +87,8 @@ subclasses):
* ``scrapy.http.Request``
* ``scrapy.http.Response``
* ``scrapy.item.Item``
* ``scrapy.selector.XPathSelector``
* ``scrapy.selector.Selector``
* ``scrapy.spider.BaseSpider``
* ``scrapy.selector.document.Libxml2Document``
A real example
--------------
@ -117,7 +116,7 @@ references::
SomenastySpider 1 oldest: 15s ago
HtmlResponse 3890 oldest: 265s ago
XPathSelector 2 oldest: 0s ago
Selector 2 oldest: 0s ago
Request 3878 oldest: 250s ago
The fact that there are so many live responses (and that they're so old) is

View File

@ -31,7 +31,7 @@ using the Item class specified in the :attr:`ItemLoader.default_item_class`
attribute.
Then, you start collecting values into the Item Loader, typically using
:ref:`XPath Selectors <topics-selectors>`. You can add more than one value to
:ref:`Selectors <topics-selectors>`. You can add more than one value to
the same item field; the Item Loader will know how to "join" those values later
using a proper processing function.
@ -352,14 +352,14 @@ ItemLoader objects
The :class:`XPathItemLoader` class extends the :class:`ItemLoader` class
providing more convenient mechanisms for extracting data from web pages
using :ref:`XPath selectors <topics-selectors>`.
using :ref:`selectors <topics-selectors>`.
:class:`XPathItemLoader` objects accept two additional parameters in
their constructors:
:param selector: The selector to extract data from, when using the
:meth:`add_xpath` or :meth:`replace_xpath` method.
:type selector: :class:`~scrapy.selector.XPathSelector` object
:type selector: :class:`~scrapy.selector.Selector` object
:param response: The response used to construct the selector using the
:attr:`default_selector_class`, unless the selector argument is given,
@ -418,7 +418,7 @@ ItemLoader objects
.. attribute:: selector
The :class:`~scrapy.selector.XPathSelector` object to extract data from.
The :class:`~scrapy.selector.Selector` object to extract data from.
It's either the selector given in the constructor or one created from
the response given in the constructor using the
:attr:`default_selector_class`. This attribute is meant to be
@ -592,7 +592,7 @@ Here is a list of all built-in processors:
work with single values (instead of iterables). For this reason the
:class:`MapCompose` processor is typically used as input processor, since
data is often extracted using the
:meth:`~scrapy.selector.XPathSelector.extract` method of :ref:`selectors
:meth:`~scrapy.selector.Selector.extract` method of :ref:`selectors
<topics-selectors>`, which returns a list of unicode strings.
The example below should clarify how it works::

View File

@ -6,39 +6,43 @@ Selectors
When you're scraping web pages, the most common task you need to perform is
to extract data from the HTML source. There are several libraries available to
achieve this:
achieve this:
* `BeautifulSoup`_ is a very popular screen scraping library among Python
programmers which constructs a Python object based on the
structure of the HTML code and also deals with bad markup reasonably well,
but it has one drawback: it's slow.
programmers which constructs a Python object based on the structure of the
HTML code and also deals with bad markup reasonably well, but it has one
drawback: it's slow.
* `lxml`_ is an XML parsing library (which also parses HTML) with a pythonic
API based on `ElementTree`_ (which is not part of the Python standard
library).
Scrapy comes with its own mechanism for extracting data. They're called XPath
selectors (or just "selectors", for short) because they "select" certain parts
of the HTML document specified by `XPath`_ expressions.
Scrapy comes with its own mechanism for extracting data. They're called
selectors because they "select" certain parts of the HTML document specified
either by `XPath`_ or `CSS`_ expressions.
`XPath`_ is a language for selecting nodes in XML documents, which can also be used with HTML.
`XPath`_ is a language for selecting nodes in XML documents, which can also be
used with HTML. `CSS`_ is a language for applying styles to HTML documents. It
defines selectors to associate those styles with specific HTML elements.
Both `lxml`_ and Scrapy Selectors are built over the `libxml2`_ library, which
means they're very similar in speed and parsing accuracy.
Scrapy selectors are built over the `lxml`_ library, which means they're very
similar in speed and parsing accuracy.
This page explains how selectors work and describes their API which is very
small and simple, unlike the `lxml`_ API which is much bigger because the
`lxml`_ library can be used for many other tasks, besides selecting markup
documents.
For a complete reference of the selectors API see the :ref:`XPath selector
reference <topics-selectors-ref>`.
For a complete reference of the selectors API see the
:ref:`Selector reference <topics-selectors-ref>`.
.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
.. _lxml: http://codespeak.net/lxml/
.. _ElementTree: http://docs.python.org/library/xml.etree.elementtree.html
.. _libxml2: http://xmlsoft.org/
.. _cssselect: https://pypi.python.org/pypi/cssselect/
.. _XPath: http://www.w3.org/TR/xpath
.. _CSS: http://www.w3.org/TR/selectors
Using selectors
===============
@ -46,24 +50,29 @@ Using selectors
Constructing selectors
----------------------
There are two types of selectors bundled with Scrapy. Those are:
* :class:`~scrapy.selector.HtmlXPathSelector` - for working with HTML documents
* :class:`~scrapy.selector.XmlXPathSelector` - for working with XML documents
.. highlight:: python
Both share the same selector API, and are constructed with a Response object as
their first parameter. This is the Response they're going to be "selecting".
Scrapy selectors are instances of the :class:`~scrapy.selector.Selector` class,
constructed by passing a `Response` object as the first argument; the response's
body is what they're going to be "selecting"::
Example::
from scrapy.spider import BaseSpider
from scrapy.selector import Selector
hxs = HtmlXPathSelector(response) # a HTML selector
xxs = XmlXPathSelector(response) # a XML selector
class MySpider(BaseSpider):
# ...
def parse(self, response):
sel = Selector(response)
# Using XPath query
print sel.xpath('//p')
# Using CSS query
print sel.css('p')
# Nesting queries
print sel.xpath('//div[@foo="bar"]').css('span#bold')
Using selectors with XPaths
---------------------------
Using selectors
---------------
To explain how to use the selectors we'll use the `Scrapy shell` (which
provides interactive testing) and an example page located in the Scrapy
@ -84,78 +93,82 @@ First, let's open the shell::
scrapy shell http://doc.scrapy.org/en/latest/_static/selectors-sample1.html
Then, after the shell loads, you'll have some selectors already instantiated and
ready to use.
Then, after the shell loads, you'll have a selector already instantiated and
ready to use in the ``sel`` shell variable.
Since we're dealing with HTML, we'll be using the
:class:`~scrapy.selector.HtmlXPathSelector` object which is found, by default, in
the ``hxs`` shell variable.
Since we're dealing with HTML, the selector will automatically use an HTML parser.
.. highlight:: python
So, by looking at the :ref:`HTML code <topics-selectors-htmlcode>` of that page,
let's construct an XPath (using an HTML selector) for selecting the text inside
the title tag::
So, by looking at the :ref:`HTML code <topics-selectors-htmlcode>` of that
page, let's construct an XPath (using an HTML selector) for selecting the text
inside the title tag::
>>> hxs.select('//title/text()')
[<HtmlXPathSelector (text) xpath=//title/text()>]
>>> sel.xpath('//title/text()')
[<Selector (text) xpath=//title/text()>]
As you can see, the select() method returns an XPathSelectorList, which is a list of
new selectors. This API can be used quickly for extracting nested data.
As you can see, the ``.xpath()`` method returns an
:class:`~scrapy.selector.SelectorList` instance, which is a list of new
selectors. This API can be used quickly for extracting nested data.
To actually extract the textual data, you must call the selector ``extract()``
To actually extract the textual data, you must call the selector ``.extract()``
method, as follows::
>>> hxs.select('//title/text()').extract()
>>> sel.xpath('//title/text()').extract()
[u'Example website']
Notice that CSS selectors can select text or attribute nodes using CSS3
pseudo-elements::
>>> sel.css('title::text').extract()
[u'Example website']
Now we're going to get the base URL and some image links::
>>> hxs.select('//base/@href').extract()
>>> sel.xpath('//base/@href').extract()
[u'http://example.com/']
>>> hxs.select('//a[contains(@href, "image")]/@href').extract()
>>> sel.css('base::attr(href)').extract()
[u'http://example.com/']
>>> sel.xpath('//a[contains(@href, "image")]/@href').extract()
[u'image1.html',
u'image2.html',
u'image3.html',
u'image4.html',
u'image5.html']
>>> hxs.select('//a[contains(@href, "image")]/img/@src').extract()
>>> sel.css('a[href*=image]::attr(href)').extract()
[u'image1.html',
u'image2.html',
u'image3.html',
u'image4.html',
u'image5.html']
>>> sel.xpath('//a[contains(@href, "image")]/img/@src').extract()
[u'image1_thumb.jpg',
u'image2_thumb.jpg',
u'image3_thumb.jpg',
u'image4_thumb.jpg',
u'image5_thumb.jpg']
Using selectors with regular expressions
----------------------------------------
Selectors also have a ``re()`` method for extracting data using regular
expressions. However, unlike using the ``select()`` method, the ``re()`` method
does not return a list of :class:`~scrapy.selector.XPathSelector` objects, so you
can't construct nested ``.re()`` calls.
Here's an example used to extract images names from the :ref:`HTML code
<topics-selectors-htmlcode>` above::
>>> hxs.select('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
[u'My image 1',
u'My image 2',
u'My image 3',
u'My image 4',
u'My image 5']
>>> sel.css('a[href*=image] img::attr(src)').extract()
[u'image1_thumb.jpg',
u'image2_thumb.jpg',
u'image3_thumb.jpg',
u'image4_thumb.jpg',
u'image5_thumb.jpg']
.. _topics-selectors-nesting-selectors:
Nesting selectors
-----------------
The ``select()`` selector method returns a list of selectors, so you can call the
``select()`` for those selectors too. Here's an example::
The selection methods (``.xpath()`` or ``.css()``) return a list of selectors
of the same type, so you can call the selection methods for those selectors
too. Here's an example::
>>> links = hxs.select('//a[contains(@href, "image")]')
>>> links = sel.xpath('//a[contains(@href, "image")]')
>>> links.extract()
[u'<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>',
u'<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>',
@ -164,7 +177,7 @@ The ``select()`` selector method returns a list of selectors, so you can call th
u'<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']
>>> for index, link in enumerate(links):
args = (index, link.select('@href').extract(), link.select('img/@src').extract())
args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
print 'Link number %d points to url %s and image %s' % args
Link number 0 points to url [u'image1.html'] and image [u'image1_thumb.jpg']
@ -173,35 +186,53 @@ The ``select()`` selector method returns a list of selectors, so you can call th
Link number 3 points to url [u'image4.html'] and image [u'image4_thumb.jpg']
Link number 4 points to url [u'image5.html'] and image [u'image5_thumb.jpg']
Using selectors with regular expressions
----------------------------------------
:class:`~scrapy.selector.Selector` also has a ``.re()`` method for extracting
data using regular expressions. However, unlike the ``.xpath()`` or ``.css()``
methods, ``.re()`` returns a list of unicode strings, so you can't construct
nested ``.re()`` calls.
Here's an example used to extract image names from the :ref:`HTML code
<topics-selectors-htmlcode>` above::
>>> sel.xpath('//a[contains(@href, "image")]/text()').re(r'Name:\s*(.*)')
[u'My image 1',
u'My image 2',
u'My image 3',
u'My image 4',
u'My image 5']
.. _topics-selectors-relative-xpaths:
Working with relative XPaths
----------------------------
Keep in mind that if you are nesting XPathSelectors and use an XPath that
starts with ``/``, that XPath will be absolute to the document and not relative
to the ``XPathSelector`` you're calling it from.
Keep in mind that if you are nesting selectors and use an XPath that starts
with ``/``, that XPath will be absolute to the document and not relative to the
``Selector`` you're calling it from.
For example, suppose you want to extract all ``<p>`` elements inside ``<div>``
elements. First, you would get all ``<div>`` elements::
>>> divs = hxs.select('//div')
>>> divs = sel.xpath('//div')
At first, you may be tempted to use the following approach, which is wrong, as
it actually extracts all ``<p>`` elements from the document, not only those
inside ``<div>`` elements::
>>> for p in divs.select('//p') # this is wrong - gets all <p> from the whole document
>>> for p in divs.xpath('//p') # this is wrong - gets all <p> from the whole document
>>> print p.extract()
This is the proper way to do it (note the dot prefixing the ``.//p`` XPath)::
>>> for p in divs.select('.//p') # extracts all <p> inside
>>> for p in divs.xpath('.//p') # extracts all <p> inside
>>> print p.extract()
Another common case would be to extract all direct ``<p>`` children::
>>> for p in divs.select('p')
>>> for p in divs.xpath('p')
>>> print p.extract()
For more details about relative XPaths see the `Location Paths`_ section in the
@ -212,175 +243,170 @@ XPath specification.
.. _topics-selectors-ref:
Built-in XPath Selectors reference
==================================
Built-in Selectors reference
============================
.. module:: scrapy.selector
:synopsis: XPath selectors classes
:synopsis: Selector class
There are two types of selectors bundled with Scrapy:
:class:`HtmlXPathSelector` and :class:`XmlXPathSelector`. Both of them
implement the same :class:`XPathSelector` interface. The only different is that
one is used to process HTML data and the other XML data.
.. class:: Selector(response=None, text=None, type=None)
XPathSelector objects
---------------------
An instance of :class:`Selector` is a wrapper over the response to select
certain parts of its content.
.. class:: XPathSelector(response)
``response`` is a :class:`~scrapy.http.HtmlResponse` or
:class:`~scrapy.http.XmlResponse` object that will be used for selecting and
extracting data.
A :class:`XPathSelector` object is a wrapper over response to select
certain parts of its content.
``text`` is a unicode string or utf-8 encoded text for cases when a
``response`` isn't available. Using ``text`` and ``response`` together is
undefined behavior.
``response`` is a :class:`~scrapy.http.Response` object that will be used
for selecting and extracting data
``type`` defines the selector type; it can be ``"html"``, ``"xml"`` or ``None`` (default).
.. method:: select(xpath)
If ``type`` is ``None``, the selector automatically chooses the best type
based on ``response`` type (see below), or defaults to ``"html"`` in case it
is used together with ``text``.
Apply the given XPath relative to this XPathSelector and return a list
of :class:`XPathSelector` objects (ie. a :class:`XPathSelectorList`) with
the result.
If ``type`` is ``None`` and a ``response`` is passed, the selector type is
inferred from the response type as follows:
``xpath`` is a string containing the XPath to apply
* ``"html"`` for :class:`~scrapy.http.HtmlResponse` type
* ``"xml"`` for :class:`~scrapy.http.XmlResponse` type
* ``"html"`` for anything else
.. method:: re(regex)
Otherwise, if ``type`` is set, the selector type will be forced and no
detection will occur.
Apply the given regex and return a list of unicode strings with the
matches.
.. method:: xpath(query)
``regex`` can be either a compiled regular expression or a string which
will be compiled to a regular expression using ``re.compile(regex)``
Find nodes matching the xpath ``query`` and return the result as a
:class:`SelectorList` instance with all elements flattened. List
elements implement the :class:`Selector` interface too.
``query`` is a string containing the XPath query to apply.
.. method:: css(query)
Apply the given CSS selector and return a :class:`SelectorList` instance.
``query`` is a string containing the CSS selector to apply.
In the background, CSS queries are translated into XPath queries using the
`cssselect`_ library and run through the ``.xpath()`` method.
.. method:: extract()
Serialize and return the matched nodes as a list of unicode strings.
Percent encoded content is unquoted.
.. method:: re(regex)
Apply the given regex and return a list of unicode strings with the
matches.
``regex`` can be either a compiled regular expression or a string which
will be compiled to a regular expression using ``re.compile(regex)``
.. method:: register_namespace(prefix, uri)
Register the given namespace to be used in this :class:`Selector`.
Without registering namespaces you can't select or extract data from
non-standard namespaces. See examples below.
.. method:: remove_namespaces()
Remove all namespaces, allowing to traverse the document using
namespace-less xpaths. See example below.
.. method:: __nonzero__()
Returns ``True`` if there is any real content selected or ``False``
otherwise. In other words, the boolean value of a :class:`Selector` is
given by the contents it selects.
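As a brief illustration of the constructor arguments described above, here is a
minimal sketch (the HTML snippet is hypothetical) building a :class:`Selector`
directly from text with an explicit ``type``::

    from scrapy.selector import Selector

    body = u'<html><body><span>good</span></body></html>'
    sel = Selector(text=body, type='html')   # type could be omitted: "html" is the default for text
    sel.xpath('//span/text()').extract()     # [u'good']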
SelectorList objects
--------------------
.. class:: SelectorList
The :class:`SelectorList` class is a subclass of the builtin ``list``
class, which provides a few additional methods.
.. method:: xpath(query)
Call the ``.xpath()`` method for each element in this list and return
their results flattened as another :class:`SelectorList`.
``query`` is the same argument as the one in :meth:`Selector.xpath`
.. method:: css(query)
Call the ``.css()`` method for each element in this list and return
their results flattened as another :class:`SelectorList`.
``query`` is the same argument as the one in :meth:`Selector.css`
.. method:: extract()
Return a unicode string with the content of this :class:`XPathSelector`
object.
Call the ``.extract()`` method for each element in this list and return
their results flattened, as a list of unicode strings.
.. method:: register_namespace(prefix, uri)
.. method:: re()
Register the given namespace to be used in this :class:`XPathSelector`.
Without registering namespaces you can't select or extract data from
non-standard namespaces. See examples below.
.. method:: remove_namespaces()
Remove all namespaces, allowing to traverse the document using
namespace-less xpaths. See example below.
Call the ``.re()`` method for each element in this list and return
their results flattened, as a list of unicode strings.
.. method:: __nonzero__()
Returns ``True`` if there is any real content selected by this
:class:`XPathSelector` or ``False`` otherwise. In other words, the boolean
value of an XPathSelector is given by the contents it selects.
Returns ``True`` if the list is not empty, ``False`` otherwise.
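Since ``.xpath()`` and ``.css()`` on a :class:`SelectorList` return another
flattened :class:`SelectorList`, calls can be chained before the final
``.extract()`` or ``.re()``; a short sketch (assuming ``sel`` wraps an HTML
response with the usual link markup)::

    # .xpath()/.css() run on every element and flatten; .extract()/.re() end the chain
    sel.xpath('//ul/li').css('a::attr(href)').extract()
    sel.xpath('//ul/li/a/text()').re(r'Name:\s*(.*)')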
XPathSelectorList objects
-------------------------
.. class:: XPathSelectorList
Selector examples on HTML response
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The :class:`XPathSelectorList` class is subclass of the builtin ``list``
class, which provides a few additional methods.
Here's a couple of :class:`Selector` examples to illustrate several concepts.
In all cases, we assume there is already a :class:`Selector` instantiated with
a :class:`~scrapy.http.HtmlResponse` object like this::
.. method:: select(xpath)
Call the :meth:`XPathSelector.select` method for all :class:`XPathSelector`
objects in this list and return their results flattened, as a new
:class:`XPathSelectorList`.
``xpath`` is the same argument as the one in :meth:`XPathSelector.select`
.. method:: re(regex)
Call the :meth:`XPathSelector.re` method for all :class:`XPathSelector`
objects in this list and return their results flattened, as a list of
unicode strings.
``regex`` is the same argument as the one in :meth:`XPathSelector.re`
.. method:: extract()
Call the :meth:`XPathSelector.extract` method for all :class:`XPathSelector`
objects in this list and return their results flattened, as a list of
unicode strings.
.. method:: extract_unquoted()
Call the :meth:`XPathSelector.extract_unoquoted` method for all
:class:`XPathSelector` objects in this list and return their results
flattened, as a list of unicode strings. This method should not be applied
to all kinds of XPathSelectors. For more info see
:meth:`XPathSelector.extract_unoquoted`.
HtmlXPathSelector objects
-------------------------
.. class:: HtmlXPathSelector(response)
A subclass of :class:`XPathSelector` for working with HTML content. It uses
the `libxml2`_ HTML parser. See the :class:`XPathSelector` API for more info.
.. _libxml2: http://xmlsoft.org/
HtmlXPathSelector examples
~~~~~~~~~~~~~~~~~~~~~~~~~~
Here's a couple of :class:`HtmlXPathSelector` examples to illustrate several
concepts. In all cases, we assume there is already an :class:`HtmlPathSelector`
instantiated with a :class:`~scrapy.http.Response` object like this::
x = HtmlXPathSelector(html_response)
x = Selector(html_response)
1. Select all ``<h1>`` elements from a HTML response body, returning a list of
:class:`XPathSelector` objects (ie. a :class:`XPathSelectorList` object)::
:class:`Selector` objects (ie. a :class:`SelectorList` object)::
x.select("//h1")
x.xpath("//h1")
2. Extract the text of all ``<h1>`` elements from a HTML response body,
returning a list of unicode strings::
x.select("//h1").extract() # this includes the h1 tag
x.select("//h1/text()").extract() # this excludes the h1 tag
x.xpath("//h1").extract() # this includes the h1 tag
x.xpath("//h1/text()").extract() # this excludes the h1 tag
3. Iterate over all ``<p>`` tags and print their class attribute::
for node in x.select("//p"):
... print node.select("@href")
for node in x.xpath("//p"):
... print node.xpath("@class").extract()
4. Extract textual data from all ``<p>`` tags without entities, as a list of
unicode strings::
Selector examples on XML response
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
x.select("//p/text()").extract_unquoted()
Here's a couple of examples to illustrate several concepts. In both cases we
assume there is already a :class:`Selector` instantiated with a
:class:`~scrapy.http.XmlResponse` object like this::
# the following line is wrong. extract_unquoted() should only be used
# with textual XPathSelectors
x.select("//p").extract_unquoted() # it may work but output is unpredictable
x = Selector(xml_response)
XmlXPathSelector objects
------------------------
1. Select all ``<product>`` elements from a XML response body, returning a list
of :class:`Selector` objects (ie. a :class:`SelectorList` object)::
.. class:: XmlXPathSelector(response)
A subclass of :class:`XPathSelector` for working with XML content. It uses
the `libxml2`_ XML parser. See the :class:`XPathSelector` API for more info.
XmlXPathSelector examples
~~~~~~~~~~~~~~~~~~~~~~~~~
Here's a couple of :class:`XmlXPathSelector` examples to illustrate several
concepts. In both cases we assume there is already an :class:`XmlXPathSelector`
instantiated with a :class:`~scrapy.http.Response` object like this::
x = XmlXPathSelector(xml_response)
1. Select all ``<product>`` elements from a XML response body, returning a list of
:class:`XPathSelector` objects (ie. a :class:`XPathSelectorList` object)::
x.select("//product")
x.xpath("//product")
2. Extract all prices from a `Google Base XML feed`_ which requires registering
a namespace::
x.register_namespace("g", "http://base.google.com/ns/1.0")
x.select("//g:price").extract()
x.xpath("//g:price").extract()
.. _removing-namespaces:
@ -390,7 +416,7 @@ Removing namespaces
When dealing with scraping projects, it is often quite convenient to get rid of
namespaces altogether and just work with element names, to write more
simple/convenient XPaths. You can use the
:meth:`XPathSelector.remove_namespaces` method for that.
:meth:`Selector.remove_namespaces` method for that.
Let's show an example that illustrates this with the GitHub blog atom feed.
@ -401,27 +427,27 @@ First, we open the shell with the url we want to scrape::
Once in the shell we can try selecting all ``<link>`` objects and see that it
doesn't work (because the Atom XML namespace is obfuscating those nodes)::
>>> xxs.select("//link")
>>> xxs.xpath("//link")
[]
But once we call the :meth:`XPathSelector.remove_namespaces` method, all
But once we call the :meth:`Selector.remove_namespaces` method, all
nodes can be accessed directly by their names::
>>> xxs.remove_namespaces()
>>> xxs.select("//link")
[<XmlXPathSelector xpath='//link' data=u'<link xmlns="http://www.w3.org/2005/Atom'>,
<XmlXPathSelector xpath='//link' data=u'<link xmlns="http://www.w3.org/2005/Atom'>,
>>> xxs.xpath("//link")
[<Selector xpath='//link' data=u'<link xmlns="http://www.w3.org/2005/Atom'>,
<Selector xpath='//link' data=u'<link xmlns="http://www.w3.org/2005/Atom'>,
...
If you wonder why the namespace removal procedure is not always called
automatically, instead of having to call it manually, it is because of two
reasons which, in order of relevance, are:
1. removing namespaces requires to iterate and modify all nodes in the
1. Removing namespaces requires iterating over and modifying all nodes in the
document, which is a reasonably expensive operation to perform for all
documents crawled by Scrapy
2. there could be some cases where using namespaces is actually required, in
2. There could be some cases where using namespaces is actually required, in
case some element names clash between namespaces. These cases are very rare
though.

View File

@ -9,10 +9,10 @@ scraping code very quickly, without having to run the spider. It's meant to be
used for testing data extraction code, but you can actually use it for testing
any kind of code as it is also a regular Python shell.
The shell is used for testing XPath expressions and see how they work and what
data they extract from the web pages you're trying to scrape. It allows you to
interactively test your XPaths while you're writing your spider, without having
to run the spider to test every change.
The shell is used for testing XPath or CSS expressions and seeing how they work
and what data they extract from the web pages you're trying to scrape. It
allows you to interactively test your expressions while you're writing your
spider, without having to run the spider to test every change.
Once you get familiarized with the Scrapy shell, you'll see that it's an
invaluable tool for developing and debugging your spiders.
@ -66,7 +66,7 @@ Available Scrapy objects
The Scrapy shell automatically creates some convenient objects from the
downloaded page, like the :class:`~scrapy.http.Response` object and the
:class:`~scrapy.selector.XPathSelector` objects (for both HTML and XML
:class:`~scrapy.selector.Selector` objects (for both HTML and XML
content).
Those objects are:
@ -83,10 +83,7 @@ Those objects are:
* ``response`` - a :class:`~scrapy.http.Response` object containing the last
fetched page
* ``hxs`` - a :class:`~scrapy.selector.HtmlXPathSelector` object constructed
with the last response fetched
* ``xxs`` - a :class:`~scrapy.selector.XmlXPathSelector` object constructed
* ``sel`` - a :class:`~scrapy.selector.Selector` object constructed
with the last response fetched
* ``settings`` - the current :ref:`Scrapy settings <topics-settings>`
@ -114,13 +111,12 @@ list of available objects and useful shortcuts (you'll notice that these lines
all start with the ``[s]`` prefix)::
[s] Available objects
[s] hxs <HtmlXPathSelector (http://scrapy.org) xpath=None>
[s] sel <Selector (http://scrapy.org) xpath=None>
[s] item Item()
[s] request <http://scrapy.org>
[s] response <http://scrapy.org>
[s] settings <Settings 'mybot.settings'>
[s] spider <scrapy.spider.models.BaseSpider object at 0x2bed9d0>
[s] xxs <XmlXPathSelector (http://scrapy.org) xpath=None>
[s] Useful shortcuts:
[s] shelp() Prints this help.
[s] fetch(req_or_url) Fetch a new request or URL and update objects
@ -130,24 +126,23 @@ all start with the ``[s]`` prefix)::
After that, we can start playing with the objects::
>>> hxs.select("//h2/text()").extract()[0]
>>> sel.xpath("//h2/text()").extract()[0]
u'Welcome to Scrapy'
>>> fetch("http://slashdot.org")
[s] Available Scrapy objects:
[s] hxs <HtmlXPathSelector (http://slashdot.org) xpath=None>
[s] sel <Selector (http://slashdot.org) xpath=None>
[s] item JobItem()
[s] request <GET http://slashdot.org>
[s] response <200 http://slashdot.org>
[s] settings <Settings 'jobsbot.settings'>
[s] spider <BaseSpider 'default' at 0x3c44a10>
[s] xxs <XmlXPathSelector (http://slashdot.org) xpath=None>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
>>> hxs.select("//h2/text()").extract()
>>> sel.xpath("//h2/text()").extract()
[u'News for nerds, stuff that matters']
>>> request = request.replace(method="POST")
@ -185,7 +180,7 @@ When you run the spider, you will get something similar to this::
2009-08-27 19:15:25-0300 [example.com] DEBUG: Crawled <http://www.example.com/> (referer: <None>)
2009-08-27 19:15:26-0300 [example.com] DEBUG: Crawled <http://www.example.com/products.php> (referer: <http://www.example.com/>)
[s] Available objects
[s] hxs <HtmlXPathSelector (http://www.example.com/products.php) xpath=None>
[s] sel <Selector (http://www.example.com/products.php) xpath=None>
...
>>> response.url
@ -193,7 +188,7 @@ When you run the spider, you will get something similar to this::
Then, you can check if the extraction code is working::
>>> hxs.select('//h1')
>>> sel.xpath('//h1')
[]
Nope, it doesn't. So you can open the response in your web browser and see if

View File

@ -216,7 +216,7 @@ Let's see an example::
Another example returning multiples Requests and Items from a single callback::
from scrapy.selector import HtmlXPathSelector
from scrapy.selector import Selector
from scrapy.spider import BaseSpider
from scrapy.http import Request
from myproject.items import MyItem
@ -231,11 +231,11 @@ Another example returning multiples Requests and Items from a single callback::
]
def parse(self, response):
hxs = HtmlXPathSelector(response)
for h3 in hxs.select('//h3').extract():
sel = Selector(response)
for h3 in sel.xpath('//h3').extract():
yield MyItem(title=h3)
for url in hxs.select('//a/@href').extract():
for url in sel.xpath('//a/@href').extract():
yield Request(url, callback=self.parse)
.. module:: scrapy.contrib.spiders
@ -314,7 +314,7 @@ Let's now take a look at an example CrawlSpider with rules::
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.selector import Selector
from scrapy.item import Item
class MySpider(CrawlSpider):
@ -334,11 +334,11 @@ Let's now take a look at an example CrawlSpider with rules::
def parse_item(self, response):
self.log('Hi, this is an item page! %s' % response.url)
hxs = HtmlXPathSelector(response)
sel = Selector(response)
item = Item()
item['id'] = hxs.select('//td[@id="item_id"]/text()').re(r'ID: (\d+)')
item['name'] = hxs.select('//td[@id="item_name"]/text()').extract()
item['description'] = hxs.select('//td[@id="item_description"]/text()').extract()
item['id'] = sel.xpath('//td[@id="item_id"]/text()').re(r'ID: (\d+)')
item['name'] = sel.xpath('//td[@id="item_name"]/text()').extract()
item['description'] = sel.xpath('//td[@id="item_description"]/text()').extract()
return item
@ -366,15 +366,15 @@ XMLFeedSpider
A string which defines the iterator to use. It can be either:
- ``'iternodes'`` - a fast iterator based on regular expressions
- ``'iternodes'`` - a fast iterator based on regular expressions
- ``'html'`` - an iterator which uses HtmlXPathSelector. Keep in mind
this uses DOM parsing and must load all DOM in memory which could be a
problem for big feeds
- ``'html'`` - an iterator which uses :class:`~scrapy.selector.Selector`.
Keep in mind this uses DOM parsing and must load all DOM in memory
which could be a problem for big feeds
- ``'xml'`` - an iterator which uses XmlXPathSelector. Keep in mind
this uses DOM parsing and must load all DOM in memory which could be a
problem for big feeds
- ``'xml'`` - an iterator which uses :class:`~scrapy.selector.Selector`.
Keep in mind this uses DOM parsing and must load all DOM in memory
which could be a problem for big feeds
It defaults to: ``'iternodes'``.
@ -390,7 +390,7 @@ XMLFeedSpider
available in that document that will be processed with this spider. The
``prefix`` and ``uri`` will be used to automatically register
namespaces using the
:meth:`~scrapy.selector.XPathSelector.register_namespace` method.
:meth:`~scrapy.selector.Selector.register_namespace` method.
You can then specify nodes with namespaces in the :attr:`itertag`
attribute.
@ -416,9 +416,10 @@ XMLFeedSpider
.. method:: parse_node(response, selector)
This method is called for the nodes matching the provided tag name
(``itertag``). Receives the response and an XPathSelector for each node.
Overriding this method is mandatory. Otherwise, you spider won't work.
This method must return either a :class:`~scrapy.item.Item` object, a
(``itertag``). Receives the response and a
:class:`~scrapy.selector.Selector` for each node. Overriding this
method is mandatory. Otherwise, your spider won't work. This method
must return either a :class:`~scrapy.item.Item` object, a
:class:`~scrapy.http.Request` object, or an iterable containing any of
them.
@ -451,9 +452,9 @@ These spiders are pretty easy to use, let's have a look at one example::
log.msg('Hi, this is a <%s> node!: %s' % (self.itertag, ''.join(node.extract())))
item = Item()
item['id'] = node.select('@id').extract()
item['name'] = node.select('name').extract()
item['description'] = node.select('description').extract()
item['id'] = node.xpath('@id').extract()
item['name'] = node.xpath('name').extract()
item['description'] = node.xpath('description').extract()
return item
Basically what we did up there was to create a spider that downloads a feed from

View File

@ -30,13 +30,6 @@ except ImportError:
else:
optional_features.add('boto')
try:
import libxml2
except ImportError:
pass
else:
optional_features.add('libxml2')
try:
import django
except ImportError:

View File

@ -6,6 +6,7 @@ import twisted
import scrapy
from scrapy.command import ScrapyCommand
class Command(ScrapyCommand):
def syntax(self):
@ -21,13 +22,9 @@ class Command(ScrapyCommand):
def run(self, args, opts):
if opts.verbose:
try:
import lxml.etree
except ImportError:
lxml_version = libxml2_version = "(lxml not available)"
else:
lxml_version = ".".join(map(str, lxml.etree.LXML_VERSION))
libxml2_version = ".".join(map(str, lxml.etree.LIBXML_VERSION))
import lxml.etree
lxml_version = ".".join(map(str, lxml.etree.LXML_VERSION))
libxml2_version = ".".join(map(str, lxml.etree.LIBXML_VERSION))
print "Scrapy : %s" % scrapy.__version__
print "lxml : %s" % lxml_version
print "libxml2 : %s" % libxml2_version

View File

@ -4,7 +4,7 @@ SGMLParser-based Link extractors
import re
from urlparse import urlparse, urljoin
from w3lib.url import safe_url_string
from scrapy.selector import HtmlXPathSelector
from scrapy.selector import Selector
from scrapy.link import Link
from scrapy.linkextractor import IGNORED_EXTENSIONS
from scrapy.utils.misc import arg_to_iter
@ -116,11 +116,11 @@ class SgmlLinkExtractor(BaseSgmlLinkExtractor):
def extract_links(self, response):
base_url = None
if self.restrict_xpaths:
hxs = HtmlXPathSelector(response)
sel = Selector(response)
base_url = get_base_url(response)
body = u''.join(f
for x in self.restrict_xpaths
for f in hxs.select(x).extract()
for f in sel.xpath(x).extract()
).encode(response.encoding)
else:
body = response.body
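For context, here is a sketch of how ``restrict_xpaths`` might be used from a
crawl rule (the spider and XPath are hypothetical); only links found inside the
matching regions are extracted::

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

    class ExampleSpider(CrawlSpider):
        name = 'example'
        start_urls = ['http://www.example.com/']
        rules = [
            # restrict link extraction to the main content area of each page
            Rule(SgmlLinkExtractor(restrict_xpaths=('//div[@id="content"]',)),
                 callback='parse_item'),
        ]

        def parse_item(self, response):
            self.log('Visited %s' % response.url)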

View File

@ -8,7 +8,7 @@ from collections import defaultdict
import re
from scrapy.item import Item
from scrapy.selector import HtmlXPathSelector
from scrapy.selector import Selector
from scrapy.utils.misc import arg_to_iter, extract_regex
from scrapy.utils.python import flatten
from .common import wrap_loader_context
@ -116,7 +116,7 @@ class ItemLoader(object):
class XPathItemLoader(ItemLoader):
default_selector_class = HtmlXPathSelector
default_selector_class = Selector
def __init__(self, item=None, selector=None, response=None, **context):
if selector is None and response is None:
@ -142,5 +142,4 @@ class XPathItemLoader(ItemLoader):
def _get_values(self, xpaths, **kw):
xpaths = arg_to_iter(xpaths)
return flatten([self.selector.select(xpath).extract() for xpath in xpaths])
return flatten([self.selector.xpath(xpath).extract() for xpath in xpaths])
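A short usage sketch for the loader after this change (``ProductItem`` and the
XPaths are hypothetical); values added with ``add_xpath()`` end up going through
``Selector.xpath()`` via ``_get_values()``::

    from scrapy.item import Item, Field
    from scrapy.contrib.loader import XPathItemLoader

    class ProductItem(Item):
        name = Field()
        price = Field()

    # inside a spider callback:
    def parse(self, response):
        loader = XPathItemLoader(item=ProductItem(), response=response)
        loader.add_xpath('name', '//h1/text()')
        loader.add_xpath('price', '//p[@class="price"]/text()')
        return loader.load_item()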

View File

@ -10,14 +10,10 @@ from scrapy import signals
from scrapy.exceptions import NotConfigured
from scrapy.utils.trackref import live_refs
class MemoryDebugger(object):
def __init__(self, stats):
try:
import libxml2
self.libxml2 = libxml2
except ImportError:
self.libxml2 = None
self.stats = stats
@classmethod
@ -25,18 +21,10 @@ class MemoryDebugger(object):
if not crawler.settings.getbool('MEMDEBUG_ENABLED'):
raise NotConfigured
o = cls(crawler.stats)
crawler.signals.connect(o.engine_started, signals.engine_started)
crawler.signals.connect(o.engine_stopped, signals.engine_stopped)
return o
def engine_started(self):
if self.libxml2:
self.libxml2.debugMemory(1)
def engine_stopped(self):
if self.libxml2:
self.libxml2.cleanupParser()
self.stats.set_value('memdebug/libxml2_leaked_bytes', self.libxml2.debugMemory(1))
gc.collect()
self.stats.set_value('memdebug/gc_garbage_count', len(gc.garbage))
for cls, wdict in live_refs.iteritems():

View File

@ -9,7 +9,7 @@ from scrapy.item import BaseItem
from scrapy.http import Request
from scrapy.utils.iterators import xmliter, csviter
from scrapy.utils.spider import iterate_spider_output
from scrapy.selector import XmlXPathSelector, HtmlXPathSelector
from scrapy.selector import Selector
from scrapy.exceptions import NotConfigured, NotSupported
@ -52,7 +52,7 @@ class XMLFeedSpider(BaseSpider):
def parse_nodes(self, response, nodes):
"""This method is called for the nodes matching the provided tag name
(itertag). Receives the response and an XPathSelector for each node.
(itertag). Receives the response and a Selector for each node.
Overriding this method is mandatory. Otherwise, your spider won't work.
This method must return either a BaseItem, a Request, or a list
containing any of them.
@ -71,13 +71,13 @@ class XMLFeedSpider(BaseSpider):
if self.iterator == 'iternodes':
nodes = self._iternodes(response)
elif self.iterator == 'xml':
selector = XmlXPathSelector(response)
selector = Selector(response, type='xml')
self._register_namespaces(selector)
nodes = selector.select('//%s' % self.itertag)
nodes = selector.xpath('//%s' % self.itertag)
elif self.iterator == 'html':
selector = HtmlXPathSelector(response)
selector = Selector(response, type='html')
self._register_namespaces(selector)
nodes = selector.select('//%s' % self.itertag)
nodes = selector.xpath('//%s' % self.itertag)
else:
raise NotSupported('Unsupported node iterator')
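A sketch of a spider opting into the Selector-backed XML iterator (the feed URL
and fields are hypothetical)::

    from scrapy.contrib.spiders import XMLFeedSpider
    from scrapy.item import Item, Field

    class ProductItem(Item):
        id = Field()
        name = Field()

    class FeedSpider(XMLFeedSpider):
        name = 'feedspider'
        start_urls = ['http://www.example.com/products.xml']
        iterator = 'xml'      # parse the whole feed with Selector(type='xml')
        itertag = 'product'

        def parse_node(self, response, node):
            item = ProductItem()
            item['id'] = node.xpath('@id').extract()
            item['name'] = node.xpath('name/text()').extract()
            return item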

View File

@ -1,5 +1,5 @@
from scrapy.http import Response
from scrapy.selector import XmlXPathSelector
from scrapy.selector import Selector
def xmliter_lxml(obj, nodename, namespace=None):
@ -11,10 +11,10 @@ def xmliter_lxml(obj, nodename, namespace=None):
for _, node in iterable:
nodetext = etree.tostring(node)
node.clear()
xs = XmlXPathSelector(text=nodetext)
xs = Selector(text=nodetext, type='xml')
if namespace:
xs.register_namespace('x', namespace)
yield xs.select(selxpath)[0]
yield xs.xpath(selxpath)[0]
class _StreamReader(object):

View File

@ -1,26 +1,5 @@
"""
XPath selectors
To select the backend explicitly use the SCRAPY_SELECTORS_BACKEND environment
variable.
Two backends are currently available: lxml (default) and libxml2.
Selectors
"""
import os
backend = os.environ.get('SCRAPY_SELECTORS_BACKEND')
if backend == 'libxml2':
from scrapy.selector.libxml2sel import *
elif backend == 'lxml':
from scrapy.selector.lxmlsel import *
else:
try:
import lxml
except ImportError:
import libxml2
from scrapy.selector.libxml2sel import *
else:
from scrapy.selector.lxmlsel import *
from scrapy.selector.unified import *
from scrapy.selector.lxmlsel import *

View File

@ -0,0 +1,88 @@
from cssselect import GenericTranslator, HTMLTranslator
from cssselect.xpath import _unicode_safe_getattr, XPathExpr, ExpressionError
from cssselect.parser import FunctionalPseudoElement
class ScrapyXPathExpr(XPathExpr):
textnode = False
attribute = None
@classmethod
def from_xpath(cls, xpath, textnode=False, attribute=None):
x = cls(path=xpath.path, element=xpath.element, condition=xpath.condition)
x.textnode = textnode
x.attribute = attribute
return x
def __str__(self):
path = super(ScrapyXPathExpr, self).__str__()
if self.textnode:
if path == '*':
path = 'text()'
elif path.endswith('::*/*'):
path = path[:-3] + 'text()'
else:
path += '/text()'
if self.attribute is not None:
if path.endswith('::*/*'):
path = path[:-2]
path += '/@%s' % self.attribute
return path
def join(self, combiner, other):
super(ScrapyXPathExpr, self).join(combiner, other)
self.textnode = other.textnode
self.attribute = other.attribute
return self
class TranslatorMixin(object):
def xpath_element(self, selector):
xpath = super(TranslatorMixin, self).xpath_element(selector)
return ScrapyXPathExpr.from_xpath(xpath)
def xpath_pseudo_element(self, xpath, pseudo_element):
if isinstance(pseudo_element, FunctionalPseudoElement):
method = 'xpath_%s_functional_pseudo_element' % (
pseudo_element.name.replace('-', '_'))
method = _unicode_safe_getattr(self, method, None)
if not method:
raise ExpressionError(
"The functional pseudo-element ::%s() is unknown"
% pseudo_element.name)
xpath = method(xpath, pseudo_element)
else:
method = 'xpath_%s_simple_pseudo_element' % (
pseudo_element.replace('-', '_'))
method = _unicode_safe_getattr(self, method, None)
if not method:
raise ExpressionError(
"The pseudo-element ::%s is unknown"
% pseudo_element)
xpath = method(xpath)
return xpath
def xpath_attr_functional_pseudo_element(self, xpath, function):
if function.argument_types() not in (['STRING'], ['IDENT']):
raise ExpressionError(
"Expected a single string or ident for ::attr(), got %r"
% function.arguments)
return ScrapyXPathExpr.from_xpath(xpath,
attribute=function.arguments[0].value)
def xpath_text_simple_pseudo_element(self, xpath):
"""Support selecting text nodes using ::text pseudo-element"""
return ScrapyXPathExpr.from_xpath(xpath, textnode=True)
class ScrapyGenericTranslator(TranslatorMixin, GenericTranslator):
pass
class ScrapyHTMLTranslator(TranslatorMixin, HTMLTranslator):
pass
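A quick usage sketch for the translator classes above (assuming the new module
is importable as ``scrapy.selector.csstranslator``; the XPath shown in the
comments is approximate). ``Selector.css()`` relies on the ``css_to_xpath()``
method inherited from cssselect::

    from scrapy.selector.csstranslator import ScrapyHTMLTranslator

    translator = ScrapyHTMLTranslator()
    # ::attr(...) adds an attribute step, ::text adds a text() step
    print translator.css_to_xpath('a::attr(href)')   # descendant-or-self::a/@href
    print translator.css_to_xpath('p::text')         # descendant-or-self::p/text()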

View File

@ -1,82 +0,0 @@
"""
This module contains a simple class (Libxml2Document) which provides cache and
garbage collection to libxml2 documents (xmlDoc).
"""
import weakref
from scrapy.utils.trackref import object_ref
from scrapy import optional_features
if 'libxml2' in optional_features:
import libxml2
xml_parser_options = libxml2.XML_PARSE_RECOVER + \
libxml2.XML_PARSE_NOERROR + \
libxml2.XML_PARSE_NOWARNING
html_parser_options = libxml2.HTML_PARSE_RECOVER + \
libxml2.HTML_PARSE_NOERROR + \
libxml2.HTML_PARSE_NOWARNING
_UTF8_ENCODINGS = set(('utf-8', 'UTF-8', 'utf8', 'UTF8'))
def _body_as_utf8(response):
if response.encoding in _UTF8_ENCODINGS:
return response.body
else:
return response.body_as_unicode().encode('utf-8')
def xmlDoc_from_html(response):
"""Return libxml2 doc for HTMLs"""
utf8body = _body_as_utf8(response) or ' '
try:
lxdoc = libxml2.htmlReadDoc(utf8body, response.url, 'utf-8', \
html_parser_options)
except TypeError: # libxml2 doesn't parse text with null bytes
lxdoc = libxml2.htmlReadDoc(utf8body.replace("\x00", ""), response.url, \
'utf-8', html_parser_options)
return lxdoc
def xmlDoc_from_xml(response):
"""Return libxml2 doc for XMLs"""
utf8body = _body_as_utf8(response) or ' '
try:
lxdoc = libxml2.readDoc(utf8body, response.url, 'utf-8', \
xml_parser_options)
except TypeError: # libxml2 doesn't parse text with null bytes
lxdoc = libxml2.readDoc(utf8body.replace("\x00", ""), response.url, \
'utf-8', xml_parser_options)
return lxdoc
class Libxml2Document(object_ref):
cache = weakref.WeakKeyDictionary()
__slots__ = ['xmlDoc', 'xpathContext', '__weakref__']
def __new__(cls, response, factory=xmlDoc_from_html):
cache = cls.cache.setdefault(response, {})
if factory not in cache:
obj = object_ref.__new__(cls)
obj.xmlDoc = factory(response)
obj.xpathContext = obj.xmlDoc.xpathNewContext()
cache[factory] = obj
return cache[factory]
def __del__(self):
# we must call both cleanup functions, so we try/except all exceptions
# to make sure one doesn't prevent the other from being called
# this call sometimes raises a "NoneType is not callable" TypeError
# so the try/except block silences them
try:
self.xmlDoc.freeDoc()
except:
pass
try:
self.xpathContext.xpathFreeContext()
except:
pass
def __str__(self):
return "<Libxml2Document %s>" % self.xmlDoc.name

View File

@ -1,117 +0,0 @@
"""
XPath selectors based on libxml2
"""
from scrapy import optional_features
if 'libxml2' in optional_features:
import libxml2
from scrapy.http import TextResponse
from scrapy.utils.python import unicode_to_str
from scrapy.utils.misc import extract_regex
from scrapy.utils.trackref import object_ref
from scrapy.utils.decorator import deprecated
from .libxml2document import Libxml2Document, xmlDoc_from_html, xmlDoc_from_xml
from .list import XPathSelectorList
__all__ = ['HtmlXPathSelector', 'XmlXPathSelector', 'XPathSelector', \
'XPathSelectorList']
class XPathSelector(object_ref):
__slots__ = ['doc', 'xmlNode', 'expr', '__weakref__']
def __init__(self, response=None, text=None, node=None, parent=None, expr=None):
if parent is not None:
self.doc = parent.doc
self.xmlNode = node
elif response:
self.doc = Libxml2Document(response, factory=self._get_libxml2_doc)
self.xmlNode = self.doc.xmlDoc
elif text:
response = TextResponse(url='about:blank', \
body=unicode_to_str(text, 'utf-8'), encoding='utf-8')
self.doc = Libxml2Document(response, factory=self._get_libxml2_doc)
self.xmlNode = self.doc.xmlDoc
self.expr = expr
def select(self, xpath):
if hasattr(self.xmlNode, 'xpathEval'):
self.doc.xpathContext.setContextNode(self.xmlNode)
xpath = unicode_to_str(xpath, 'utf-8')
try:
xpath_result = self.doc.xpathContext.xpathEval(xpath)
except libxml2.xpathError:
raise ValueError("Invalid XPath: %s" % xpath)
if hasattr(xpath_result, '__iter__'):
return XPathSelectorList([self.__class__(node=node, parent=self, \
expr=xpath) for node in xpath_result])
else:
return XPathSelectorList([self.__class__(node=xpath_result, \
parent=self, expr=xpath)])
else:
return XPathSelectorList([])
def re(self, regex):
return extract_regex(regex, self.extract())
def extract(self):
if isinstance(self.xmlNode, basestring):
text = unicode(self.xmlNode, 'utf-8', errors='ignore')
elif hasattr(self.xmlNode, 'serialize'):
if isinstance(self.xmlNode, libxml2.xmlDoc):
data = self.xmlNode.getRootElement().serialize('utf-8')
text = unicode(data, 'utf-8', errors='ignore') if data else u''
elif isinstance(self.xmlNode, libxml2.xmlAttr):
# serialization doesn't work sometimes for xmlAttr types
text = unicode(self.xmlNode.content, 'utf-8', errors='ignore')
else:
data = self.xmlNode.serialize('utf-8')
text = unicode(data, 'utf-8', errors='ignore') if data else u''
else:
try:
text = unicode(self.xmlNode, 'utf-8', errors='ignore')
except TypeError: # caught when self.xmlNode is a float - see tests
text = unicode(self.xmlNode)
return text
def extract_unquoted(self):
"""Get unescaped contents from the text node (no entities, no CDATA)"""
# TODO: this function should be deprecated, but what would be used instead?
if self.select('self::text()'):
return unicode(self.xmlNode.getContent(), 'utf-8', errors='ignore')
else:
return u''
def register_namespace(self, prefix, uri):
self.doc.xpathContext.xpathRegisterNs(prefix, uri)
def _get_libxml2_doc(self, response):
return xmlDoc_from_html(response)
def __nonzero__(self):
return bool(self.extract())
def __str__(self):
data = repr(self.extract()[:40])
return "<%s xpath=%r data=%s>" % (type(self).__name__, self.expr, data)
__repr__ = __str__
@deprecated(use_instead='XPathSelector.select')
def __call__(self, xpath):
return self.select(xpath)
@deprecated(use_instead='XPathSelector.select')
def x(self, xpath):
return self.select(xpath)
class XmlXPathSelector(XPathSelector):
__slots__ = ()
_get_libxml2_doc = staticmethod(xmlDoc_from_xml)
class HtmlXPathSelector(XPathSelector):
__slots__ = ()
_get_libxml2_doc = staticmethod(xmlDoc_from_html)

View File

@ -1,23 +0,0 @@
from scrapy.utils.python import flatten
from scrapy.utils.decorator import deprecated
class XPathSelectorList(list):
def __getslice__(self, i, j):
return self.__class__(list.__getslice__(self, i, j))
def select(self, xpath):
return self.__class__(flatten([x.select(xpath) for x in self]))
def re(self, regex):
return flatten([x.re(regex) for x in self])
def extract(self):
return [x.extract() for x in self]
def extract_unquoted(self):
return [x.extract_unquoted() for x in self]
@deprecated(use_instead='XPathSelectorList.select')
def x(self, xpath):
return self.select(xpath)

View File

@ -1,109 +1,47 @@
"""
XPath selectors based on lxml
"""
from lxml import etree
from scrapy.utils.misc import extract_regex
from scrapy.utils.trackref import object_ref
from scrapy.utils.python import unicode_to_str
from scrapy.utils.decorator import deprecated
from scrapy.http import TextResponse
from .lxmldocument import LxmlDocument
from .list import XPathSelectorList
from .unified import Selector, SelectorList
__all__ = ['HtmlXPathSelector', 'XmlXPathSelector', 'XPathSelector', \
__all__ = ['HtmlXPathSelector', 'XmlXPathSelector', 'XPathSelector',
'XPathSelectorList']
class XPathSelector(object_ref):
class XPathSelector(Selector):
__slots__ = ()
_default_type = 'html'
__slots__ = ['response', 'text', 'namespaces', '_expr', '_root', '__weakref__']
_parser = etree.HTMLParser
_tostring_method = 'html'
def __init__(self, *a, **kw):
import warnings
from scrapy.exceptions import ScrapyDeprecationWarning
warnings.warn('%s is deprecated, instantiate scrapy.selector.Selector '
'instead' % type(self).__name__,
category=ScrapyDeprecationWarning, stacklevel=1)
super(XPathSelector, self).__init__(*a, **kw)
def __init__(self, response=None, text=None, namespaces=None, _root=None, _expr=None):
if text is not None:
response = TextResponse(url='about:blank', \
body=unicode_to_str(text, 'utf-8'), encoding='utf-8')
if response is not None:
_root = LxmlDocument(response, self._parser)
self.namespaces = namespaces
self.response = response
self._root = _root
self._expr = _expr
def select(self, xpath):
try:
xpathev = self._root.xpath
except AttributeError:
return XPathSelectorList([])
try:
result = xpathev(xpath, namespaces=self.namespaces)
except etree.XPathError:
raise ValueError("Invalid XPath: %s" % xpath)
if type(result) is not list:
result = [result]
result = [self.__class__(_root=x, _expr=xpath, namespaces=self.namespaces)
for x in result]
return XPathSelectorList(result)
def re(self, regex):
return extract_regex(regex, self.extract())
def extract(self):
try:
return etree.tostring(self._root, method=self._tostring_method, \
encoding=unicode, with_tail=False)
except (AttributeError, TypeError):
if self._root is True:
return u'1'
elif self._root is False:
return u'0'
else:
return unicode(self._root)
def register_namespace(self, prefix, uri):
if self.namespaces is None:
self.namespaces = {}
self.namespaces[prefix] = uri
def remove_namespaces(self):
for el in self._root.iter('*'):
if el.tag.startswith('{'):
el.tag = el.tag.split('}', 1)[1]
# loop on element attributes also
for an in el.attrib.keys():
if an.startswith('{'):
el.attrib[an.split('}', 1)[1]] = el.attrib.pop(an)
def __nonzero__(self):
return bool(self.extract())
def __str__(self):
data = repr(self.extract()[:40])
return "<%s xpath=%r data=%s>" % (type(self).__name__, self._expr, data)
__repr__ = __str__
@deprecated(use_instead='XPathSelector.extract')
def extract_unquoted(self):
return self.extract()
def css(self, *a, **kw):
raise RuntimeError('.css() method not available for %s, '
'instantiate scrapy.selector.Selector '
'instead' % type(self).__name__)
class XmlXPathSelector(XPathSelector):
__slots__ = ()
_parser = etree.XMLParser
_tostring_method = 'xml'
_default_type = 'xml'
class HtmlXPathSelector(XPathSelector):
__slots__ = ()
_parser = etree.HTMLParser
_tostring_method = 'html'
_default_type = 'html'
class XPathSelectorList(SelectorList):
def __init__(self, *a, **kw):
import warnings
from scrapy.exceptions import ScrapyDeprecationWarning
warnings.warn('XPathSelectorList is deprecated, instantiate '
'scrapy.selector.SelectorList instead',
category=ScrapyDeprecationWarning, stacklevel=1)
super(XPathSelectorList, self).__init__(*a, **kw)
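The deprecated classes above are now thin shims over the new Selector. A brief sketch of the intended behavior, mirroring the deprecation tests later in this diff (the markup is only an example):

import warnings
from scrapy.selector.lxmlsel import HtmlXPathSelector

with warnings.catch_warnings(record=True) as w:
    warnings.simplefilter('always')
    hxs = HtmlXPathSelector(text=u'<div><p>Hello</p></div>')
    print(hxs.select('//p/text()').extract())  # [u'Hello'], with ScrapyDeprecationWarning entries collected in w
# hxs.css('p') raises RuntimeError and points users at scrapy.selector.Selector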

scrapy/selector/unified.py (new file, +170 lines)
View File

@ -0,0 +1,170 @@
"""
XPath selectors based on lxml
"""
from lxml import etree
from scrapy.utils.misc import extract_regex
from scrapy.utils.trackref import object_ref
from scrapy.utils.python import unicode_to_str, flatten
from scrapy.utils.decorator import deprecated
from scrapy.http import HtmlResponse, XmlResponse
from .lxmldocument import LxmlDocument
from .csstranslator import ScrapyHTMLTranslator, ScrapyGenericTranslator
__all__ = ['Selector', 'SelectorList']
_ctgroup = {
'html': {'_parser': etree.HTMLParser,
'_csstranslator': ScrapyHTMLTranslator(),
'_tostring_method': 'html'},
'xml': {'_parser': etree.XMLParser,
'_csstranslator': ScrapyGenericTranslator(),
'_tostring_method': 'xml'},
}
def _st(response, st):
if st is None:
return 'xml' if isinstance(response, XmlResponse) else 'html'
elif st in ('xml', 'html'):
return st
else:
raise ValueError('Invalid type: %s' % st)
def _response_from_text(text, st):
rt = XmlResponse if st == 'xml' else HtmlResponse
return rt(url='about:blank', encoding='utf-8',
body=unicode_to_str(text, 'utf-8'))
class Selector(object_ref):
__slots__ = ['response', 'text', 'namespaces', 'type', '_expr', '_root',
'__weakref__', '_parser', '_csstranslator', '_tostring_method']
_default_type = None
def __init__(self, response=None, text=None, type=None, namespaces=None,
_root=None, _expr=None):
self.type = st = _st(response, type or self._default_type)
self._parser = _ctgroup[st]['_parser']
self._csstranslator = _ctgroup[st]['_csstranslator']
self._tostring_method = _ctgroup[st]['_tostring_method']
if text is not None:
response = _response_from_text(text, st)
if response is not None:
_root = LxmlDocument(response, self._parser)
self.response = response
self.namespaces = namespaces
self._root = _root
self._expr = _expr
def xpath(self, query):
try:
xpathev = self._root.xpath
except AttributeError:
return SelectorList([])
try:
result = xpathev(query, namespaces=self.namespaces)
except etree.XPathError:
raise ValueError("Invalid XPath: %s" % query)
if type(result) is not list:
result = [result]
result = [self.__class__(_root=x, _expr=query,
namespaces=self.namespaces,
type=self.type)
for x in result]
return SelectorList(result)
def css(self, query):
return self.xpath(self._css2xpath(query))
def _css2xpath(self, query):
return self._csstranslator.css_to_xpath(query)
def re(self, regex):
return extract_regex(regex, self.extract())
def extract(self):
try:
return etree.tostring(self._root,
method=self._tostring_method,
encoding=unicode,
with_tail=False)
except (AttributeError, TypeError):
if self._root is True:
return u'1'
elif self._root is False:
return u'0'
else:
return unicode(self._root)
def register_namespace(self, prefix, uri):
if self.namespaces is None:
self.namespaces = {}
self.namespaces[prefix] = uri
def remove_namespaces(self):
for el in self._root.iter('*'):
if el.tag.startswith('{'):
el.tag = el.tag.split('}', 1)[1]
# loop on element attributes also
for an in el.attrib.keys():
if an.startswith('{'):
el.attrib[an.split('}', 1)[1]] = el.attrib.pop(an)
def __nonzero__(self):
return bool(self.extract())
def __str__(self):
data = repr(self.extract()[:40])
return "<%s xpath=%r data=%s>" % (type(self).__name__, self._expr, data)
__repr__ = __str__
# Deprecated api
@deprecated(use_instead='.xpath()')
def select(self, xpath):
return self.xpath(xpath)
@deprecated(use_instead='.extract()')
def extract_unquoted(self):
return self.extract()
class SelectorList(list):
def __getslice__(self, i, j):
return self.__class__(list.__getslice__(self, i, j))
def xpath(self, xpath):
return self.__class__(flatten([x.xpath(xpath) for x in self]))
def css(self, xpath):
return self.__class__(flatten([x.css(xpath) for x in self]))
def re(self, regex):
return flatten([x.re(regex) for x in self])
def extract(self):
return [x.extract() for x in self]
@deprecated(use_instead='.extract()')
def extract_unquoted(self):
return [x.extract_unquoted() for x in self]
@deprecated(use_instead='.xpath()')
def x(self, xpath):
return self.select(xpath)
@deprecated(use_instead='.xpath()')
def select(self, xpath):
return self.xpath(xpath)
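Putting the new class to work, a minimal usage sketch; the expressions are taken from the selector tests later in this diff, so the markup and expected results are illustrative:

from scrapy.selector import Selector

sel = Selector(text=u"<body><div id='1'>not<span>me</span></div></body>")
print(sel.xpath('//div[@id="1"]').css('span::text').extract())  # [u'me']
print(sel.css('#1').xpath('./span/text()').extract())           # [u'me']

# type='xml' (or passing an XmlResponse) switches to the XML parser flavor
xml = Selector(text=u'<root>lala</root>', type='xml')
print(xml.xpath('.').extract())  # [u'<root>lala</root>']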

View File

@ -11,20 +11,20 @@ from w3lib.url import any_to_uri
from scrapy.item import BaseItem
from scrapy.spider import BaseSpider
from scrapy.selector import XPathSelector, XmlXPathSelector, HtmlXPathSelector
from scrapy.selector import Selector
from scrapy.utils.spider import create_spider_for_request
from scrapy.utils.misc import load_object
from scrapy.utils.response import open_in_browser
from scrapy.utils.console import start_python_console
from scrapy.settings import Settings
from scrapy.http import Request, Response, HtmlResponse, XmlResponse
from scrapy.http import Request, Response
from scrapy.exceptions import IgnoreRequest
class Shell(object):
relevant_classes = (BaseSpider, Request, Response, BaseItem,
XPathSelector, Settings)
Selector, Settings)
def __init__(self, crawler, update_vars=None, code=None):
self.crawler = crawler
@ -95,10 +95,7 @@ class Shell(object):
self.vars['spider'] = spider
self.vars['request'] = request
self.vars['response'] = response
self.vars['xxs'] = XmlXPathSelector(response) \
if isinstance(response, XmlResponse) else None
self.vars['hxs'] = HtmlXPathSelector(response) \
if isinstance(response, HtmlResponse) else None
self.vars['sel'] = Selector(response)
if self.inthread:
self.vars['fetch'] = self.fetch
self.vars['view'] = open_in_browser
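With this change the shell exposes a single `sel` variable instead of the old `hxs`/`xxs` pair, so a session looks roughly like this (hypothetical page, expressions only illustrative):

# inside `scrapy shell <url>`
>>> sel.xpath("//h1/text()").extract()
>>> sel.css("title::text").extract()
>>> fetch('http://example.com/other-page')  # re-binds sel to the newly fetched response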

View File

@ -31,7 +31,7 @@ class ShellTest(ProcessTest, SiteTest, unittest.TestCase):
@defer.inlineCallbacks
def test_response_selector_html(self):
xpath = 'hxs.select("//p[@class=\'one\']/text()").extract()[0]'
xpath = 'sel.xpath("//p[@class=\'one\']/text()").extract()[0]'
_, out, _ = yield self.execute([self.url('/html'), '-c', xpath])
self.assertEqual(out.strip(), 'Works')

View File

@ -4,7 +4,7 @@ from scrapy.contrib.loader import ItemLoader, XPathItemLoader
from scrapy.contrib.loader.processor import Join, Identity, TakeFirst, \
Compose, MapCompose
from scrapy.item import Item, Field
from scrapy.selector import HtmlXPathSelector
from scrapy.selector import Selector
from scrapy.http import HtmlResponse
@ -379,7 +379,7 @@ class XPathItemLoaderTest(unittest.TestCase):
self.assertRaises(RuntimeError, XPathItemLoader)
def test_constructor_with_selector(self):
sel = HtmlXPathSelector(text=u"<html><body><div>marta</div></body></html>")
sel = Selector(text=u"<html><body><div>marta</div></body></html>")
l = TestXPathItemLoader(selector=sel)
self.assert_(l.selector is sel)
l.add_xpath('name', '//div/text()')

View File

@ -1,20 +0,0 @@
from twisted.trial import unittest
from scrapy.utils.test import libxml2debug
from scrapy import optional_features
class Libxml2Test(unittest.TestCase):
skip = 'libxml2' not in optional_features
@libxml2debug
def test_libxml2_bug_2_6_27(self):
# this test will fail in version 2.6.27 but passes on 2.6.29+
import libxml2
html = "<td>1<b>2</b>3</td>"
node = libxml2.htmlParseDoc(html, 'utf-8')
result = [str(r) for r in node.xpathEval('//text()')]
self.assertEquals(result, ['1', '2', '3'])
node.freeDoc()

View File

@ -1,87 +1,85 @@
"""
Selectors tests, common for all backends
"""
import re
import warnings
import weakref
from twisted.trial import unittest
from scrapy.exceptions import ScrapyDeprecationWarning
from scrapy.http import TextResponse, HtmlResponse, XmlResponse
from scrapy.selector import XmlXPathSelector, HtmlXPathSelector, \
XPathSelector
from scrapy.utils.test import libxml2debug
from scrapy.selector import Selector
from scrapy.selector.lxmlsel import XmlXPathSelector, HtmlXPathSelector, XPathSelector
class XPathSelectorTestCase(unittest.TestCase):
xs_cls = XPathSelector
hxs_cls = HtmlXPathSelector
xxs_cls = XmlXPathSelector
class SelectorTestCase(unittest.TestCase):
@libxml2debug
def test_selector_simple(self):
sscls = Selector
def test_simple_selection(self):
"""Simple selector tests"""
body = "<p><input name='a'value='1'/><input name='b'value='2'/></p>"
response = TextResponse(url="http://example.com", body=body)
xpath = self.hxs_cls(response)
sel = self.sscls(response)
xl = xpath.select('//input')
xl = sel.xpath('//input')
self.assertEqual(2, len(xl))
for x in xl:
assert isinstance(x, self.hxs_cls)
assert isinstance(x, self.sscls)
self.assertEqual(xpath.select('//input').extract(),
[x.extract() for x in xpath.select('//input')])
self.assertEqual(sel.xpath('//input').extract(),
[x.extract() for x in sel.xpath('//input')])
self.assertEqual([x.extract() for x in xpath.select("//input[@name='a']/@name")],
self.assertEqual([x.extract() for x in sel.xpath("//input[@name='a']/@name")],
[u'a'])
self.assertEqual([x.extract() for x in xpath.select("number(concat(//input[@name='a']/@value, //input[@name='b']/@value))")],
self.assertEqual([x.extract() for x in sel.xpath("number(concat(//input[@name='a']/@value, //input[@name='b']/@value))")],
[u'12.0'])
self.assertEqual(xpath.select("concat('xpath', 'rules')").extract(),
self.assertEqual(sel.xpath("concat('xpath', 'rules')").extract(),
[u'xpathrules'])
self.assertEqual([x.extract() for x in xpath.select("concat(//input[@name='a']/@value, //input[@name='b']/@value)")],
self.assertEqual([x.extract() for x in sel.xpath("concat(//input[@name='a']/@value, //input[@name='b']/@value)")],
[u'12'])
@libxml2debug
def test_selector_unicode_query(self):
def test_select_unicode_query(self):
body = u"<p><input name='\xa9' value='1'/></p>"
response = TextResponse(url="http://example.com", body=body, encoding='utf8')
xpath = self.hxs_cls(response)
self.assertEqual(xpath.select(u'//input[@name="\xa9"]/@value').extract(), [u'1'])
sel = self.sscls(response)
self.assertEqual(sel.xpath(u'//input[@name="\xa9"]/@value').extract(), [u'1'])
@libxml2debug
def test_selector_same_type(self):
"""Test XPathSelector returning the same type in x() method"""
def test_list_elements_type(self):
"""Test Selector returning the same type in selection methods"""
text = '<p>test<p>'
assert isinstance(self.xxs_cls(text=text).select("//p")[0],
self.xxs_cls)
assert isinstance(self.hxs_cls(text=text).select("//p")[0],
self.hxs_cls)
assert isinstance(self.sscls(text=text).xpath("//p")[0], self.sscls)
assert isinstance(self.sscls(text=text).css("p")[0], self.sscls)
@libxml2debug
def test_selector_boolean_result(self):
def test_boolean_result(self):
body = "<p><input name='a'value='1'/><input name='b'value='2'/></p>"
response = TextResponse(url="http://example.com", body=body)
xs = self.hxs_cls(response)
self.assertEquals(xs.select("//input[@name='a']/@name='a'").extract(), [u'1'])
self.assertEquals(xs.select("//input[@name='a']/@name='n'").extract(), [u'0'])
@libxml2debug
def test_selector_xml_html(self):
"""Test that XML and HTML XPathSelector's behave differently"""
xs = self.sscls(response)
self.assertEquals(xs.xpath("//input[@name='a']/@name='a'").extract(), [u'1'])
self.assertEquals(xs.xpath("//input[@name='a']/@name='n'").extract(), [u'0'])
def test_differences_parsing_xml_vs_html(self):
"""Test that XML and HTML Selector's behave differently"""
# some text which is parsed differently by XML and HTML flavors
text = '<div><img src="a.jpg"><p>Hello</div>'
self.assertEqual(self.xxs_cls(text=text).select("//div").extract(),
[u'<div><img src="a.jpg"><p>Hello</p></img></div>'])
self.assertEqual(self.hxs_cls(text=text).select("//div").extract(),
hs = self.sscls(text=text, type='html')
self.assertEqual(hs.xpath("//div").extract(),
[u'<div><img src="a.jpg"><p>Hello</p></div>'])
@libxml2debug
def test_selector_nested(self):
xs = self.sscls(text=text, type='xml')
self.assertEqual(xs.xpath("//div").extract(),
[u'<div><img src="a.jpg"><p>Hello</p></img></div>'])
def test_flavor_detection(self):
text = '<div><img src="a.jpg"><p>Hello</div>'
sel = self.sscls(XmlResponse('http://example.com', body=text))
self.assertEqual(sel.type, 'xml')
self.assertEqual(sel.xpath("//div").extract(),
[u'<div><img src="a.jpg"><p>Hello</p></img></div>'])
sel = self.sscls(HtmlResponse('http://example.com', body=text))
self.assertEqual(sel.type, 'html')
self.assertEqual(sel.xpath("//div").extract(),
[u'<div><img src="a.jpg"><p>Hello</p></div>'])
def test_nested_selectors(self):
"""Nested selector tests"""
body = """<body>
<div class='one'>
@ -97,26 +95,30 @@ class XPathSelectorTestCase(unittest.TestCase):
</body>"""
response = HtmlResponse(url="http://example.com", body=body)
x = self.hxs_cls(response)
divtwo = x.select('//div[@class="two"]')
self.assertEqual(map(unicode.strip, divtwo.select("//li").extract()),
x = self.sscls(response)
divtwo = x.xpath('//div[@class="two"]')
self.assertEqual(divtwo.xpath("//li").extract(),
["<li>one</li>", "<li>two</li>", "<li>four</li>", "<li>five</li>", "<li>six</li>"])
self.assertEqual(map(unicode.strip, divtwo.select("./ul/li").extract()),
self.assertEqual(divtwo.xpath("./ul/li").extract(),
["<li>four</li>", "<li>five</li>", "<li>six</li>"])
self.assertEqual(map(unicode.strip, divtwo.select(".//li").extract()),
self.assertEqual(divtwo.xpath(".//li").extract(),
["<li>four</li>", "<li>five</li>", "<li>six</li>"])
self.assertEqual(divtwo.select("./li").extract(),
[])
self.assertEqual(divtwo.xpath("./li").extract(), [])
def test_mixed_nested_selectors(self):
body = '''<body>
<div id=1>not<span>me</span></div>
<div class="dos"><p>text</p><a href='#'>foo</a></div>
</body>'''
sel = self.sscls(text=body)
self.assertEqual(sel.xpath('//div[@id="1"]').css('span::text').extract(), [u'me'])
self.assertEqual(sel.css('#1').xpath('./span/text()').extract(), [u'me'])
@libxml2debug
def test_dont_strip(self):
hxs = self.hxs_cls(text='<div>fff: <a href="#">zzz</a></div>')
self.assertEqual(hxs.select("//text()").extract(),
[u'fff: ', u'zzz'])
sel = self.sscls(text='<div>fff: <a href="#">zzz</a></div>')
self.assertEqual(sel.xpath("//text()").extract(), [u'fff: ', u'zzz'])
@libxml2debug
def test_selector_namespaces_simple(self):
def test_namespaces_simple(self):
body = """
<test xmlns:somens="http://scrapy.org">
<somens:a id="foo">take this</a>
@ -125,14 +127,13 @@ class XPathSelectorTestCase(unittest.TestCase):
"""
response = XmlResponse(url="http://example.com", body=body)
x = self.xxs_cls(response)
x = self.sscls(response)
x.register_namespace("somens", "http://scrapy.org")
self.assertEqual(x.select("//somens:a/text()").extract(),
self.assertEqual(x.xpath("//somens:a/text()").extract(),
[u'take this'])
@libxml2debug
def test_selector_namespaces_multiple(self):
def test_namespaces_multiple(self):
body = """<?xml version="1.0" encoding="UTF-8"?>
<BrowseNode xmlns="http://webservices.amazon.com/AWSECommerceService/2005-10-05"
xmlns:b="http://somens.com"
@ -143,20 +144,18 @@ class XPathSelectorTestCase(unittest.TestCase):
</BrowseNode>
"""
response = XmlResponse(url="http://example.com", body=body)
x = self.xxs_cls(response)
x = self.sscls(response)
x.register_namespace("xmlns", "http://webservices.amazon.com/AWSECommerceService/2005-10-05")
x.register_namespace("p", "http://www.scrapy.org/product")
x.register_namespace("b", "http://somens.com")
self.assertEqual(len(x.select("//xmlns:TestTag")), 1)
self.assertEqual(x.select("//b:Operation/text()").extract()[0], 'hello')
self.assertEqual(x.select("//xmlns:TestTag/@b:att").extract()[0], 'value')
self.assertEqual(x.select("//p:SecondTestTag/xmlns:price/text()").extract()[0], '90')
self.assertEqual(x.select("//p:SecondTestTag").select("./xmlns:price/text()")[0].extract(), '90')
self.assertEqual(x.select("//p:SecondTestTag/xmlns:material/text()").extract()[0], 'iron')
self.assertEqual(len(x.xpath("//xmlns:TestTag")), 1)
self.assertEqual(x.xpath("//b:Operation/text()").extract()[0], 'hello')
self.assertEqual(x.xpath("//xmlns:TestTag/@b:att").extract()[0], 'value')
self.assertEqual(x.xpath("//p:SecondTestTag/xmlns:price/text()").extract()[0], '90')
self.assertEqual(x.xpath("//p:SecondTestTag").xpath("./xmlns:price/text()")[0].extract(), '90')
self.assertEqual(x.xpath("//p:SecondTestTag/xmlns:material/text()").extract()[0], 'iron')
@libxml2debug
def test_selector_re(self):
def test_re(self):
body = """<div>Name: Mary
<ul>
<li>Name: John</li>
@ -165,47 +164,35 @@ class XPathSelectorTestCase(unittest.TestCase):
<li>Age: 20</li>
</ul>
Age: 20
</div>
"""
</div>"""
response = HtmlResponse(url="http://example.com", body=body)
x = self.hxs_cls(response)
x = self.sscls(response)
name_re = re.compile("Name: (\w+)")
self.assertEqual(x.select("//ul/li").re(name_re),
self.assertEqual(x.xpath("//ul/li").re(name_re),
["John", "Paul"])
self.assertEqual(x.select("//ul/li").re("Age: (\d+)"),
self.assertEqual(x.xpath("//ul/li").re("Age: (\d+)"),
["10", "20"])
@libxml2debug
def test_selector_re_intl(self):
def test_re_intl(self):
body = """<div>Evento: cumplea\xc3\xb1os</div>"""
response = HtmlResponse(url="http://example.com", body=body, encoding='utf-8')
x = self.hxs_cls(response)
self.assertEqual(x.select("//div").re("Evento: (\w+)"), [u'cumplea\xf1os'])
x = self.sscls(response)
self.assertEqual(x.xpath("//div").re("Evento: (\w+)"), [u'cumplea\xf1os'])
@libxml2debug
def test_selector_over_text(self):
hxs = self.hxs_cls(text='<root>lala</root>')
self.assertEqual(hxs.extract(),
u'<html><body><root>lala</root></body></html>')
hs = self.sscls(text='<root>lala</root>')
self.assertEqual(hs.extract(), u'<html><body><root>lala</root></body></html>')
xs = self.sscls(text='<root>lala</root>', type='xml')
self.assertEqual(xs.extract(), u'<root>lala</root>')
self.assertEqual(xs.xpath('.').extract(), [u'<root>lala</root>'])
xxs = self.xxs_cls(text='<root>lala</root>')
self.assertEqual(xxs.extract(),
u'<root>lala</root>')
xxs = self.xxs_cls(text='<root>lala</root>')
self.assertEqual(xxs.select('.').extract(),
[u'<root>lala</root>'])
@libxml2debug
def test_selector_invalid_xpath(self):
def test_invalid_xpath(self):
response = XmlResponse(url="http://example.com", body="<html></html>")
x = self.hxs_cls(response)
x = self.sscls(response)
xpath = "//test[@foo='bar]"
try:
x.select(xpath)
x.xpath(xpath)
except ValueError, e:
assert xpath in str(e), "Exception message does not contain invalid xpath"
except Exception:
@ -213,7 +200,6 @@ class XPathSelectorTestCase(unittest.TestCase):
else:
raise AssertionError("A invalid XPath does not raise an exception")
@libxml2debug
def test_http_header_encoding_precedence(self):
# u'\xa3' = pound symbol in unicode
# u'\xc2\xa3' = pound symbol in utf-8
@ -229,71 +215,121 @@ class XPathSelectorTestCase(unittest.TestCase):
headers = {'Content-Type': ['text/html; charset=utf-8']}
response = HtmlResponse(url="http://example.com", headers=headers, body=html_utf8)
x = self.hxs_cls(response)
self.assertEquals(x.select("//span[@id='blank']/text()").extract(),
x = self.sscls(response)
self.assertEquals(x.xpath("//span[@id='blank']/text()").extract(),
[u'\xa3'])
@libxml2debug
def test_empty_bodies(self):
# shouldn't raise errors
r1 = TextResponse('http://www.example.com', body='')
self.hxs_cls(r1).select('//text()').extract()
self.xxs_cls(r1).select('//text()').extract()
self.sscls(r1).xpath('//text()').extract()
@libxml2debug
def test_null_bytes(self):
# shouldn't raise errors
r1 = TextResponse('http://www.example.com', \
body='<root>pre\x00post</root>', \
encoding='utf-8')
self.hxs_cls(r1).select('//text()').extract()
self.xxs_cls(r1).select('//text()').extract()
self.sscls(r1).xpath('//text()').extract()
@libxml2debug
def test_badly_encoded_body(self):
# \xe9 alone isn't valid utf8 sequence
r1 = TextResponse('http://www.example.com', \
body='<html><p>an Jos\xe9 de</p><html>', \
encoding='utf-8')
self.hxs_cls(r1).select('//text()').extract()
self.xxs_cls(r1).select('//text()').extract()
self.sscls(r1).xpath('//text()').extract()
@libxml2debug
def test_select_on_unevaluable_nodes(self):
r = self.hxs_cls(text=u'<span class="big">some text</span>')
r = self.sscls(text=u'<span class="big">some text</span>')
# Text node
x1 = r.select('//text()')
x1 = r.xpath('//text()')
self.assertEquals(x1.extract(), [u'some text'])
self.assertEquals(x1.select('.//b').extract(), [])
self.assertEquals(x1.xpath('.//b').extract(), [])
# Tag attribute
x1 = r.select('//span/@class')
x1 = r.xpath('//span/@class')
self.assertEquals(x1.extract(), [u'big'])
self.assertEquals(x1.select('.//text()').extract(), [])
self.assertEquals(x1.xpath('.//text()').extract(), [])
@libxml2debug
def test_select_on_text_nodes(self):
r = self.hxs_cls(text=u'<div><b>Options:</b>opt1</div><div><b>Other</b>opt2</div>')
x1 = r.select("//div/descendant::text()[preceding-sibling::b[contains(text(), 'Options')]]")
r = self.sscls(text=u'<div><b>Options:</b>opt1</div><div><b>Other</b>opt2</div>')
x1 = r.xpath("//div/descendant::text()[preceding-sibling::b[contains(text(), 'Options')]]")
self.assertEquals(x1.extract(), [u'opt1'])
x1 = r.select("//div/descendant::text()/preceding-sibling::b[contains(text(), 'Options')]")
x1 = r.xpath("//div/descendant::text()/preceding-sibling::b[contains(text(), 'Options')]")
self.assertEquals(x1.extract(), [u'<b>Options:</b>'])
@libxml2debug
def test_nested_select_on_text_nodes(self):
# FIXME: does not work with lxml backend [upstream]
r = self.hxs_cls(text=u'<div><b>Options:</b>opt1</div><div><b>Other</b>opt2</div>')
x1 = r.select("//div/descendant::text()")
x2 = x1.select("./preceding-sibling::b[contains(text(), 'Options')]")
r = self.sscls(text=u'<div><b>Options:</b>opt1</div><div><b>Other</b>opt2</div>')
x1 = r.xpath("//div/descendant::text()")
x2 = x1.xpath("./preceding-sibling::b[contains(text(), 'Options')]")
self.assertEquals(x2.extract(), [u'<b>Options:</b>'])
test_nested_select_on_text_nodes.skip = True
test_nested_select_on_text_nodes.skip = "Text nodes lost parent node reference in lxml"
@libxml2debug
def test_weakref_slots(self):
"""Check that classes are using slots and are weak-referenceable"""
for cls in [self.xs_cls, self.hxs_cls, self.xxs_cls]:
x = cls()
weakref.ref(x)
assert not hasattr(x, '__dict__'), "%s does not use __slots__" % \
x.__class__.__name__
x = self.sscls()
weakref.ref(x)
assert not hasattr(x, '__dict__'), "%s does not use __slots__" % \
x.__class__.__name__
def test_remove_namespaces(self):
xml = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US" xmlns:media="http://search.yahoo.com/mrss/">
<link type="text/html">
<link type="application/atom+xml">
</feed>
"""
sel = self.sscls(XmlResponse("http://example.com/feed.atom", body=xml))
self.assertEqual(len(sel.xpath("//link")), 0)
sel.remove_namespaces()
self.assertEqual(len(sel.xpath("//link")), 2)
def test_remove_attributes_namespaces(self):
xml = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:atom="http://www.w3.org/2005/Atom" xml:lang="en-US" xmlns:media="http://search.yahoo.com/mrss/">
<link atom:type="text/html">
<link atom:type="application/atom+xml">
</feed>
"""
sel = self.sscls(XmlResponse("http://example.com/feed.atom", body=xml))
self.assertEqual(len(sel.xpath("//link/@type")), 0)
sel.remove_namespaces()
self.assertEqual(len(sel.xpath("//link/@type")), 2)
class DeprecatedXpathSelectorTest(unittest.TestCase):
text = '<div><img src="a.jpg"><p>Hello</div>'
def test_warnings(self):
for cls in XPathSelector, HtmlXPathSelector, XmlXPathSelector:
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter('always')
hs = cls(text=self.text)
assert len(w) == 1, w
assert issubclass(w[0].category, ScrapyDeprecationWarning)
assert 'deprecated' in str(w[-1].message)
hs.select("//div").extract()
assert issubclass(w[1].category, ScrapyDeprecationWarning)
assert 'deprecated' in str(w[-1].message)
def test_xpathselector(self):
with warnings.catch_warnings(record=True):
hs = XPathSelector(text=self.text)
self.assertEqual(hs.select("//div").extract(),
[u'<div><img src="a.jpg"><p>Hello</p></div>'])
self.assertRaises(RuntimeError, hs.css, 'div')
def test_htmlxpathselector(self):
with warnings.catch_warnings(record=True):
hs = HtmlXPathSelector(text=self.text)
self.assertEqual(hs.select("//div").extract(),
[u'<div><img src="a.jpg"><p>Hello</p></div>'])
self.assertRaises(RuntimeError, hs.css, 'div')
def test_xmlxpathselector(self):
with warnings.catch_warnings(record=True):
xs = XmlXPathSelector(text=self.text)
self.assertEqual(xs.select("//div").extract(),
[u'<div><img src="a.jpg"><p>Hello</p></img></div>'])
self.assertRaises(RuntimeError, xs.css, 'div')

View File

@ -0,0 +1,153 @@
"""
Selector tests for cssselect backend
"""
from twisted.trial import unittest
from scrapy.http import HtmlResponse
from scrapy.selector.csstranslator import ScrapyHTMLTranslator
from scrapy.selector import Selector
from cssselect.parser import SelectorSyntaxError
from cssselect.xpath import ExpressionError
HTMLBODY = '''
<html>
<body>
<div>
<a id="name-anchor" name="foo"></a>
<a id="tag-anchor" rel="tag" href="http://localhost/foo">link</a>
<a id="nofollow-anchor" rel="nofollow" href="https://example.org"> link</a>
<p id="paragraph">
lorem ipsum text
<b id="p-b">hi</b> <em id="p-em">there</em>
<b id="p-b2">guy</b>
<input type="checkbox" id="checkbox-unchecked" />
<input type="checkbox" id="checkbox-disabled" disabled="" />
<input type="text" id="text-checked" checked="checked" />
<input type="hidden" />
<input type="hidden" disabled="disabled" />
<input type="checkbox" id="checkbox-checked" checked="checked" />
<input type="checkbox" id="checkbox-disabled-checked"
disabled="disabled" checked="checked" />
<fieldset id="fieldset" disabled="disabled">
<input type="checkbox" id="checkbox-fieldset-disabled" />
<input type="hidden" />
</fieldset>
</p>
<map name="dummymap">
<area shape="circle" coords="200,250,25" href="foo.html" id="area-href" />
<area shape="default" id="area-nohref" />
</map>
</div>
<div class="cool-footer" id="foobar-div" foobar="ab bc cde">
<span id="foobar-span">foo ter</span>
</div>
</body></html>
'''
class TranslatorMixinTest(unittest.TestCase):
tr_cls = ScrapyHTMLTranslator
def setUp(self):
self.tr = self.tr_cls()
self.c2x = self.tr.css_to_xpath
def test_attr_function(self):
cases = [
('::attr(name)', u'descendant-or-self::*/@name'),
('a::attr(href)', u'descendant-or-self::a/@href'),
('a ::attr(img)', u'descendant-or-self::a/descendant-or-self::*/@img'),
('a > ::attr(class)', u'descendant-or-self::a/*/@class'),
]
for css, xpath in cases:
self.assertEqual(self.c2x(css), xpath, css)
def test_attr_function_exception(self):
cases = [
('::attr(12)', ExpressionError),
('::attr(34test)', ExpressionError),
('::attr(@href)', SelectorSyntaxError),
]
for css, exc in cases:
self.assertRaises(exc, self.c2x, css)
def test_text_pseudo_element(self):
cases = [
('::text', u'descendant-or-self::text()'),
('p::text', u'descendant-or-self::p/text()'),
('p ::text', u'descendant-or-self::p/descendant-or-self::text()'),
('#id::text', u"descendant-or-self::*[@id = 'id']/text()"),
('p#id::text', u"descendant-or-self::p[@id = 'id']/text()"),
('p#id ::text', u"descendant-or-self::p[@id = 'id']/descendant-or-self::text()"),
('p#id > ::text', u"descendant-or-self::p[@id = 'id']/*/text()"),
('p#id ~ ::text', u"descendant-or-self::p[@id = 'id']/following-sibling::*/text()"),
('a[href]::text', u'descendant-or-self::a[@href]/text()'),
('a[href] ::text', u'descendant-or-self::a[@href]/descendant-or-self::text()'),
('p::text, a::text', u"descendant-or-self::p/text() | descendant-or-self::a/text()"),
]
for css, xpath in cases:
self.assertEqual(self.c2x(css), xpath, css)
def test_pseudo_function_exception(self):
cases = [
('::attribute(12)', ExpressionError),
('::text()', ExpressionError),
('::attr(@href)', SelectorSyntaxError),
]
for css, exc in cases:
self.assertRaises(exc, self.c2x, css)
def test_unknown_pseudo_element(self):
cases = [
('::text-node', ExpressionError),
]
for css, exc in cases:
self.assertRaises(exc, self.c2x, css)
def test_unknown_pseudo_class(self):
cases = [
(':text', ExpressionError),
(':attribute(name)', ExpressionError),
]
for css, exc in cases:
self.assertRaises(exc, self.c2x, css)
class CSSSelectorTest(unittest.TestCase):
sscls = Selector
def setUp(self):
self.htmlresponse = HtmlResponse('http://example.com', body=HTMLBODY)
self.sel = self.sscls(self.htmlresponse)
def x(self, *a, **kw):
return [v.strip() for v in self.sel.css(*a, **kw).extract() if v.strip()]
def test_selector_simple(self):
for x in self.sel.css('input'):
self.assertTrue(isinstance(x, self.sel.__class__), x)
self.assertEqual(self.sel.css('input').extract(),
[x.extract() for x in self.sel.css('input')])
def test_text_pseudo_element(self):
self.assertEqual(self.x('#p-b2'), [u'<b id="p-b2">guy</b>'])
self.assertEqual(self.x('#p-b2::text'), [u'guy'])
self.assertEqual(self.x('#p-b2 ::text'), [u'guy'])
self.assertEqual(self.x('#paragraph::text'), [u'lorem ipsum text'])
self.assertEqual(self.x('#paragraph ::text'), [u'lorem ipsum text', u'hi', u'there', u'guy'])
self.assertEqual(self.x('p::text'), [u'lorem ipsum text'])
self.assertEqual(self.x('p ::text'), [u'lorem ipsum text', u'hi', u'there', u'guy'])
def test_attribute_function(self):
self.assertEqual(self.x('#p-b2::attr(id)'), [u'p-b2'])
self.assertEqual(self.x('.cool-footer::attr(class)'), [u'cool-footer'])
self.assertEqual(self.x('.cool-footer ::attr(id)'), [u'foobar-div', u'foobar-span'])
self.assertEqual(self.x('map[name="dummymap"] ::attr(shape)'), [u'circle', u'default'])
def test_nested_selector(self):
self.assertEqual(self.sel.css('p').css('b::text').extract(),
[u'hi', u'guy'])
self.assertEqual(self.sel.css('div').css('area:last-child').extract(),
[u'<area shape="default" id="area-nohref">'])

View File

@ -1,98 +0,0 @@
"""
Selectors tests, specific for libxml2 backend
"""
from twisted.trial import unittest
from scrapy import optional_features
from scrapy.http import TextResponse, HtmlResponse, XmlResponse
from scrapy.selector.libxml2sel import XmlXPathSelector, HtmlXPathSelector, \
XPathSelector
from scrapy.selector.libxml2document import Libxml2Document
from scrapy.utils.test import libxml2debug
from scrapy.tests import test_selector
class Libxml2XPathSelectorTestCase(test_selector.XPathSelectorTestCase):
xs_cls = XPathSelector
hxs_cls = HtmlXPathSelector
xxs_cls = XmlXPathSelector
skip = 'libxml2' not in optional_features
@libxml2debug
def test_null_bytes(self):
hxs = HtmlXPathSelector(text='<root>la\x00la</root>')
self.assertEqual(hxs.extract(),
u'<html><body><root>lala</root></body></html>')
xxs = XmlXPathSelector(text='<root>la\x00la</root>')
self.assertEqual(xxs.extract(),
u'<root>lala</root>')
@libxml2debug
def test_unquote(self):
xmldoc = '\n'.join((
'<root>',
' lala',
' <node>',
' blabla&amp;more<!--comment-->a<b>test</b>oh',
' <![CDATA[lalalal&ppppp<b>PPPP</b>ppp&amp;la]]>',
' </node>',
' pff',
'</root>'))
xxs = XmlXPathSelector(text=xmldoc)
self.assertEqual(xxs.extract_unquoted(), u'')
self.assertEqual(xxs.select('/root').extract_unquoted(), [u''])
self.assertEqual(xxs.select('/root/text()').extract_unquoted(), [
u'\n lala\n ',
u'\n pff\n'])
self.assertEqual(xxs.select('//*').extract_unquoted(), [u'', u'', u''])
self.assertEqual(xxs.select('//text()').extract_unquoted(), [
u'\n lala\n ',
u'\n blabla&more',
u'a',
u'test',
u'oh\n ',
u'lalalal&ppppp<b>PPPP</b>ppp&amp;la',
u'\n ',
u'\n pff\n'])
class Libxml2DocumentTest(unittest.TestCase):
skip = 'libxml2' not in optional_features
@libxml2debug
def test_response_libxml2_caching(self):
r1 = HtmlResponse('http://www.example.com', body='<html><head></head><body></body></html>')
r2 = r1.copy()
doc1 = Libxml2Document(r1)
doc2 = Libxml2Document(r1)
doc3 = Libxml2Document(r2)
# make sure it's cached
assert doc1 is doc2
assert doc1.xmlDoc is doc2.xmlDoc
assert doc1 is not doc3
assert doc1.xmlDoc is not doc3.xmlDoc
# don't leave libxml2 documents in memory to avoid wrong libxml2 leak reports
del doc1, doc2, doc3
@libxml2debug
def test_null_char(self):
# make sure bodies with null char ('\x00') don't raise a TypeError exception
self.body_content = 'test problematic \x00 body'
response = TextResponse('http://example.com/catalog/product/blabla-123',
headers={'Content-Type': 'text/plain; charset=utf-8'}, body=self.body_content)
Libxml2Document(response)
if __name__ == "__main__":
unittest.main()

View File

@ -1,64 +0,0 @@
"""
Selectors tests, specific for lxml backend
"""
import unittest
from scrapy.tests import test_selector
from scrapy.http import TextResponse, HtmlResponse, XmlResponse
from scrapy.selector.lxmldocument import LxmlDocument
from scrapy.selector.lxmlsel import XmlXPathSelector, HtmlXPathSelector, XPathSelector
class LxmlXPathSelectorTestCase(test_selector.XPathSelectorTestCase):
xs_cls = XPathSelector
hxs_cls = HtmlXPathSelector
xxs_cls = XmlXPathSelector
def test_remove_namespaces(self):
xml = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US" xmlns:media="http://search.yahoo.com/mrss/">
<link type="text/html">
<link type="application/atom+xml">
</feed>
"""
xxs = XmlXPathSelector(XmlResponse("http://example.com/feed.atom", body=xml))
self.assertEqual(len(xxs.select("//link")), 0)
xxs.remove_namespaces()
self.assertEqual(len(xxs.select("//link")), 2)
def test_remove_attributes_namespaces(self):
xml = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:atom="http://www.w3.org/2005/Atom" xml:lang="en-US" xmlns:media="http://search.yahoo.com/mrss/">
<link atom:type="text/html">
<link atom:type="application/atom+xml">
</feed>
"""
xxs = XmlXPathSelector(XmlResponse("http://example.com/feed.atom", body=xml))
self.assertEqual(len(xxs.select("//link/@type")), 0)
xxs.remove_namespaces()
self.assertEqual(len(xxs.select("//link/@type")), 2)
class Libxml2DocumentTest(unittest.TestCase):
def test_caching(self):
r1 = HtmlResponse('http://www.example.com', body='<html><head></head><body></body></html>')
r2 = r1.copy()
doc1 = LxmlDocument(r1)
doc2 = LxmlDocument(r1)
doc3 = LxmlDocument(r2)
# make sure it's cached
assert doc1 is doc2
assert doc1 is not doc3
# don't leave documents in memory to avoid wrong libxml2 leak reports
del doc1, doc2, doc3
def test_null_char(self):
# make sure bodies with null char ('\x00') don't raise a TypeError exception
self.body_content = 'test problematic \x00 body'
response = TextResponse('http://example.com/catalog/product/blabla-123',
headers={'Content-Type': 'text/plain; charset=utf-8'}, body=self.body_content)
LxmlDocument(response)

View File

@ -0,0 +1,26 @@
import unittest
from scrapy.selector.lxmldocument import LxmlDocument
from scrapy.http import TextResponse, HtmlResponse
class LxmlDocumentTest(unittest.TestCase):
def test_caching(self):
r1 = HtmlResponse('http://www.example.com', body='<html><head></head><body></body></html>')
r2 = r1.copy()
doc1 = LxmlDocument(r1)
doc2 = LxmlDocument(r1)
doc3 = LxmlDocument(r2)
# make sure it's cached
assert doc1 is doc2
assert doc1 is not doc3
def test_null_char(self):
# make sure bodies with null char ('\x00') don't raise a TypeError exception
body = 'test problematic \x00 body'
response = TextResponse('http://example.com/catalog/product/blabla-123',
headers={'Content-Type': 'text/plain; charset=utf-8'},
body=body)
LxmlDocument(response)

View File

@ -70,10 +70,10 @@ class XMLFeedSpiderTest(BaseSpiderTest):
def parse_node(self, response, selector):
yield {
'loc': selector.select('a:loc/text()').extract(),
'updated': selector.select('b:updated/text()').extract(),
'other': selector.select('other/@value').extract(),
'custom': selector.select('other/@b:custom').extract(),
'loc': selector.xpath('a:loc/text()').extract(),
'updated': selector.xpath('b:updated/text()').extract(),
'other': selector.xpath('other/@value').extract(),
'custom': selector.xpath('other/@b:custom').extract(),
}
for iterator in ('iternodes', 'xml'):

View File

@ -28,7 +28,7 @@ class XmliterTestCase(unittest.TestCase):
response = XmlResponse(url="http://example.com", body=body)
attrs = []
for x in self.xmliter(response, 'product'):
attrs.append((x.select("@id").extract(), x.select("name/text()").extract(), x.select("./type/text()").extract()))
attrs.append((x.xpath("@id").extract(), x.xpath("name/text()").extract(), x.xpath("./type/text()").extract()))
self.assertEqual(attrs,
[(['001'], ['Name 1'], ['Type 1']), (['002'], ['Name 2'], ['Type 2'])])
@ -36,7 +36,7 @@ class XmliterTestCase(unittest.TestCase):
def test_xmliter_text(self):
body = u"""<?xml version="1.0" encoding="UTF-8"?><products><product>one</product><product>two</product></products>"""
self.assertEqual([x.select("text()").extract() for x in self.xmliter(body, 'product')],
self.assertEqual([x.xpath("text()").extract() for x in self.xmliter(body, 'product')],
[[u'one'], [u'two']])
def test_xmliter_namespaces(self):
@ -63,15 +63,15 @@ class XmliterTestCase(unittest.TestCase):
node = my_iter.next()
node.register_namespace('g', 'http://base.google.com/ns/1.0')
self.assertEqual(node.select('title/text()').extract(), ['Item 1'])
self.assertEqual(node.select('description/text()').extract(), ['This is item 1'])
self.assertEqual(node.select('link/text()').extract(), ['http://www.mydummycompany.com/items/1'])
self.assertEqual(node.select('g:image_link/text()').extract(), ['http://www.mydummycompany.com/images/item1.jpg'])
self.assertEqual(node.select('g:id/text()').extract(), ['ITEM_1'])
self.assertEqual(node.select('g:price/text()').extract(), ['400'])
self.assertEqual(node.select('image_link/text()').extract(), [])
self.assertEqual(node.select('id/text()').extract(), [])
self.assertEqual(node.select('price/text()').extract(), [])
self.assertEqual(node.xpath('title/text()').extract(), ['Item 1'])
self.assertEqual(node.xpath('description/text()').extract(), ['This is item 1'])
self.assertEqual(node.xpath('link/text()').extract(), ['http://www.mydummycompany.com/items/1'])
self.assertEqual(node.xpath('g:image_link/text()').extract(), ['http://www.mydummycompany.com/images/item1.jpg'])
self.assertEqual(node.xpath('g:id/text()').extract(), ['ITEM_1'])
self.assertEqual(node.xpath('g:price/text()').extract(), ['400'])
self.assertEqual(node.xpath('image_link/text()').extract(), [])
self.assertEqual(node.xpath('id/text()').extract(), [])
self.assertEqual(node.xpath('price/text()').extract(), [])
def test_xmliter_exception(self):
body = u"""<?xml version="1.0" encoding="UTF-8"?><products><product>one</product><product>two</product></products>"""
@ -123,9 +123,9 @@ class LxmlXmliterTestCase(XmliterTestCase):
namespace_iter = self.xmliter(response, 'image_link', 'http://base.google.com/ns/1.0')
node = namespace_iter.next()
self.assertEqual(node.select('text()').extract(), ['http://www.mydummycompany.com/images/item1.jpg'])
self.assertEqual(node.xpath('text()').extract(), ['http://www.mydummycompany.com/images/item1.jpg'])
node = namespace_iter.next()
self.assertEqual(node.select('text()').extract(), ['http://www.mydummycompany.com/images/item2.jpg'])
self.assertEqual(node.xpath('text()').extract(), ['http://www.mydummycompany.com/images/item2.jpg'])
class UtilsCsvTestCase(unittest.TestCase):

View File

@ -2,14 +2,14 @@ import re, csv
from cStringIO import StringIO
from scrapy.http import TextResponse
from scrapy.selector import XmlXPathSelector
from scrapy.selector import Selector
from scrapy import log
from scrapy.utils.python import re_rsearch, str_to_unicode
from scrapy.utils.response import body_or_str
def xmliter(obj, nodename):
"""Return a iterator of XPathSelector's over all nodes of a XML document,
"""Return a iterator of Selector's over all nodes of a XML document,
given tha name of the node to iterate. Useful for parsing XML feeds.
obj can be:
@ -29,7 +29,7 @@ def xmliter(obj, nodename):
r = re.compile(r"<%s[\s>].*?</%s>" % (nodename, nodename), re.DOTALL)
for match in r.finditer(text):
nodetext = header_start + match.group() + header_end
yield XmlXPathSelector(text=nodetext).select('//' + nodename)[0]
yield Selector(text=nodetext, type='xml').xpath('//' + nodename)[0]
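# Usage sketch (assumed; mirrors the xmliter tests earlier in this diff):
# each yielded item is a Selector positioned on one matching node.
#
#   body = '<products><product>one</product><product>two</product></products>'
#   [node.xpath('text()').extract() for node in xmliter(body, 'product')]
#   # -> [[u'one'], [u'two']]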
def csviter(obj, delimiter=None, headers=None, encoding=None):

View File

@ -6,30 +6,6 @@ import os
from twisted.trial.unittest import SkipTest
def libxml2debug(testfunction):
"""Decorator for debugging libxml2 memory leaks inside a function.
We've found libxml2 memory leaks to be quite erratic; they can happen
depending on the order in which tests are run. So this decorator
enables libxml2 memory leak debugging only when the environment variable
LIBXML2_DEBUGLEAKS is set.
"""
try:
import libxml2
except ImportError:
return testfunction
def newfunc(*args, **kwargs):
libxml2.debugMemory(1)
testfunction(*args, **kwargs)
libxml2.cleanupParser()
leaked_bytes = libxml2.debugMemory(0)
assert leaked_bytes == 0, "libxml2 memory leak detected: %d bytes" % leaked_bytes
if 'LIBXML2_DEBUGLEAKS' in os.environ:
return newfunc
else:
return testfunction
def assert_aws_environ():
"""Asserts the current environment is suitable for running AWS testsi.

View File

@ -91,9 +91,9 @@ This is a working code sample that covers just the basics.
""" Pull the text label out of selected markup
:param entity: Found markup
:type entity: HtmlXPathSelector
:type entity: Selector
"""
label = ' '.join(entity.select('.//text()').extract())
label = ' '.join(entity.xpath('.//text()').extract())
label = label.encode('ascii', 'xmlcharrefreplace') if label else ''
label = label.strip('&#160;') if '&#160;' in label else label
label = label.strip(':') if ':' in label else label
@ -108,7 +108,7 @@ This is a working code sample that covers just the basics.
:return: The list of selectors
:rtype: list
"""
return self.selector.select(self.base_xpath + xpath)
return self.selector.xpath(self.base_xpath + xpath)
def parse_dl(self, xpath=u'//dl'):
""" Look for the specified definition list pattern and store all found
@ -120,7 +120,7 @@ This is a working code sample that covers just the basics.
for term in self._get_entities(xpath + '/dt'):
label = self._get_label(term)
if label and label not in self.ignore:
value = term.select('following-sibling::dd[1]//text()')
value = term.xpath('following-sibling::dd[1]//text()')
if value:
self.add_value(label, value.extract(),
MapCompose(lambda v: v.strip()))

View File

@ -122,6 +122,6 @@ try:
except ImportError:
from distutils.core import setup
else:
setup_args['install_requires'] = ['Twisted>=10.0.0', 'w3lib>=1.2', 'queuelib', 'lxml', 'pyOpenSSL']
setup_args['install_requires'] = ['Twisted>=10.0.0', 'w3lib>=1.2', 'queuelib', 'lxml', 'pyOpenSSL', 'cssselect>0.8']
setup(**setup_args)
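For existing development checkouts this adds one new dependency; it can typically be installed with pip, matching the version pin above:

pip install "cssselect>0.8"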