mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 19:03:54 +00:00

Merge remote-tracking branch 'origin/master' into 1550-shell_file-cont

This commit is contained in:
Paul Tremberth 2016-01-28 14:02:48 +01:00
commit c6f374f2eb
30 changed files with 443 additions and 145 deletions

CODE_OF_CONDUCT.md (new file)

@ -0,0 +1,50 @@
# Contributor Code of Conduct
As contributors and maintainers of this project, and in the interest of
fostering an open and welcoming community, we pledge to respect all people who
contribute through reporting issues, posting feature requests, updating
documentation, submitting pull requests or patches, and other activities.
We are committed to making participation in this project a harassment-free
experience for everyone, regardless of level of experience, gender, gender
identity and expression, sexual orientation, disability, personal appearance,
body size, race, ethnicity, age, religion, or nationality.
Examples of unacceptable behavior by participants include:
* The use of sexualized language or imagery
* Personal attacks
* Trolling or insulting/derogatory comments
* Public or private harassment
* Publishing others' private information, such as physical or electronic
addresses, without explicit permission
* Other unethical or unprofessional conduct
Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.
By adopting this Code of Conduct, project maintainers commit themselves to
fairly and consistently applying these principles to every aspect of managing
this project. Project maintainers who do not follow or enforce the Code of
Conduct may be permanently removed from the project team.
This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community.
Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting a project maintainer at opensource@scrapinghub.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. Maintainers are
obligated to maintain confidentiality with regard to the reporter of an
incident.
This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 1.3.0, available at
[http://contributor-covenant.org/version/1/3/0/][version]
[homepage]: http://contributor-covenant.org
[version]: http://contributor-covenant.org/version/1/3/0/


@ -73,6 +73,12 @@ See http://scrapy.org/community/
Contributing
============
Please note that this project is released with a Contributor Code of Conduct
(see https://github.com/scrapy/scrapy/blob/master/CODE_OF_CONDUCT.md).
By participating in this project you agree to abide by its terms.
Please report unacceptable behavior to opensource@scrapinghub.com.
See http://doc.scrapy.org/en/master/contributing.html
Companies using Scrapy


@ -45,7 +45,7 @@ Did Scrapy "steal" X from Django?
Probably, but we don't like that word. We think Django_ is a great open source
project and an example to follow, so we've used it as an inspiration for
Scrapy.
We believe that, if something is already done well, there's no need to reinvent
it. This concept, besides being one of the foundations for open source and free
@ -85,6 +85,8 @@ How can I simulate a user login in my spider?
See :ref:`topics-request-response-ref-request-userlogin`.
.. _faq-bfo-dfo:
Does Scrapy crawl in breadth-first or depth-first order?
--------------------------------------------------------


@ -445,10 +445,10 @@ Response objects
.. attribute:: Response.body
A str containing the body of this Response. Keep in mind that Response.body
is always a str. If you want the unicode version use
:meth:`TextResponse.body_as_unicode` (only available in
:class:`TextResponse` and subclasses).
The body of this Response. Keep in mind that Response.body
is always a bytes object. If you want the unicode version use
:attr:`TextResponse.text` (only available in :class:`TextResponse`
and subclasses).
This attribute is read-only. To change the body of a Response use
:meth:`replace`.
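
For illustration, a minimal sketch of the bytes/unicode distinction described above, built by hand with a :class:`TextResponse` (the URL and body here are made up)::

    from scrapy.http import TextResponse

    response = TextResponse(url='http://www.example.com/',
                            body=b'\xc2\xa3 42',   # raw bytes (UTF-8 encoded pound sign)
                            encoding='utf-8')

    isinstance(response.body, bytes)   # True -- body is always bytes
    response.text                      # u'\xa3 42' -- decoded with response.encoding
    # Response.body is read-only; replace() returns a new response instead.
    new_response = response.replace(body=b'other payload')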
@ -542,6 +542,21 @@ TextResponse objects
:class:`TextResponse` objects support the following attributes in addition
to the standard :class:`Response` ones:
.. attribute:: TextResponse.text
Response body, as unicode.
The same as ``response.body.decode(response.encoding)``, but the
result is cached after the first call, so you can access
``response.text`` multiple times without extra overhead.
.. note::
``unicode(response.body)`` is not a correct way to convert response
body to unicode: you would be using the system default encoding
(typically `ascii`) instead of the response encoding.
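
A small sketch of that equivalence, and of why decoding with the wrong codec fails; the cp1252-encoded body below is hypothetical::

    from scrapy.http import TextResponse

    resp = TextResponse('http://www.example.com/', body=b'caf\xe9', encoding='cp1252')

    # response.text decodes with the response encoding and caches the result.
    assert resp.text == resp.body.decode(resp.encoding) == u'caf\xe9'
    # Decoding the same bytes with the system default codec (e.g. ascii)
    # would raise or mangle the non-ASCII byte instead.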
.. attribute:: TextResponse.encoding
A string with the encoding of this response. The encoding is resolved by
@ -568,20 +583,6 @@ TextResponse objects
:class:`TextResponse` objects support the following methods in addition to
the standard :class:`Response` ones:
.. method:: TextResponse.body_as_unicode()
Returns the body of the response as unicode. This is equivalent to::
response.body.decode(response.encoding)
But **not** equivalent to::
unicode(response.body)
Since, in the latter case, you would be using the system default encoding
(typically `ascii`) to convert the body to unicode, instead of the response
encoding.
.. method:: TextResponse.xpath(query)
A shortcut to ``TextResponse.selector.xpath(query)``::
@ -594,6 +595,11 @@ TextResponse objects
response.css('p')
.. method:: TextResponse.body_as_unicode()
The same as :attr:`text`, but available as a method. This method is
kept for backwards compatibility; please prefer ``response.text``.
HtmlResponse objects
--------------------


@ -276,6 +276,8 @@ DEPTH_LIMIT
Default: ``0``
Scope: ``scrapy.spidermiddlewares.depth.DepthMiddleware``
The maximum depth that will be allowed to crawl for any site. If zero, no limit
will be imposed.
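
For example, to stop following links more than three hops away from the start URLs (the value is arbitrary), a project could set::

    # settings.py -- illustrative value only
    DEPTH_LIMIT = 3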
@ -286,9 +288,24 @@ DEPTH_PRIORITY
Default: ``0``
An integer that is used to adjust the request priority based on its depth.
Scope: ``scrapy.spidermiddlewares.depth.DepthMiddleware``
If zero, no priority adjustment is made from depth.
An integer that is used to adjust the request priority based on its depth:
- if zero (default), no priority adjustment is made from depth
- **a positive value will decrease the priority, i.e., higher depth
requests will be processed later**; this is commonly used when doing
breadth-first crawls (BFO)
- a negative value will increase priority, i.e., higher depth requests
will be processed sooner (DFO)
See also: :ref:`faq-bfo-dfo` about tuning Scrapy for BFO or DFO.
.. note::
This setting adjusts priority **in the opposite way** compared to
other priority settings :setting:`REDIRECT_PRIORITY_ADJUST`
and :setting:`RETRY_PRIORITY_ADJUST`.
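
As a sketch of the breadth-first setup referred to above, a project might combine a positive ``DEPTH_PRIORITY`` with FIFO queues; treat the exact queue class paths as an assumption for this Scrapy version::

    # settings.py -- one possible breadth-first (BFO) configuration
    DEPTH_PRIORITY = 1                                            # deeper requests get lower priority
    SCHEDULER_DISK_QUEUE = 'scrapy.squeues.PickleFifoDiskQueue'   # FIFO disk queue
    SCHEDULER_MEMORY_QUEUE = 'scrapy.squeues.FifoMemoryQueue'     # FIFO memory queue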
.. setting:: DEPTH_STATS
@ -297,6 +314,8 @@ DEPTH_STATS
Default: ``True``
Scope: ``scrapy.spidermiddlewares.depth.DepthMiddleware``
Whether to collect maximum depth stats.
.. setting:: DEPTH_STATS_VERBOSE
@ -306,6 +325,8 @@ DEPTH_STATS_VERBOSE
Default: ``False``
Scope: ``scrapy.spidermiddlewares.depth.DepthMiddleware``
Whether to collect verbose depth stats. If this is enabled, the number of
requests for each depth is collected in the stats.
@ -750,8 +771,8 @@ Default: ``60.0``
Scope: ``scrapy.extensions.memusage``
The :ref:`Memory usage extension <topics-extensions-ref-memusage>`
checks the current memory usage, versus the limits set by
:setting:`MEMUSAGE_LIMIT_MB` and :setting:`MEMUSAGE_WARNING_MB`,
at fixed time intervals.
This sets the length of these intervals, in seconds.
@ -864,8 +885,26 @@ REDIRECT_PRIORITY_ADJUST
Default: ``+2``
Adjust redirect request priority relative to original request.
A negative priority adjust means more priority.
Scope: ``scrapy.downloadermiddlewares.redirect.RedirectMiddleware``
Adjust redirect request priority relative to original request:
- **a positive priority adjust (default) means higher priority.**
- a negative priority adjust means lower priority.
.. setting:: RETRY_PRIORITY_ADJUST
RETRY_PRIORITY_ADJUST
---------------------
Default: ``-1``
Scope: ``scrapy.downloadermiddlewares.retry.RetryMiddleware``
Adjust retry request priority relative to original request:
- a positive priority adjust means higher priority.
- **a negative priority adjust (default) means lower priority.**
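
Spelled out as settings, these are just the defaults made explicit, for illustration::

    # settings.py -- default adjustments written out
    REDIRECT_PRIORITY_ADJUST = 2    # redirected requests are scheduled with higher priority
    RETRY_PRIORITY_ADJUST = -1      # retried requests are scheduled with lower priority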
.. setting:: ROBOTSTXT_OBEY
@ -877,7 +916,13 @@ Default: ``False``
Scope: ``scrapy.downloadermiddlewares.robotstxt``
If enabled, Scrapy will respect robots.txt policies. For more information see
:ref:`topics-dlmw-robots`
:ref:`topics-dlmw-robots`.
.. note::
While the default value is ``False`` for historical reasons,
this option is enabled by default in the settings.py file generated
by the ``scrapy startproject`` command.
.. setting:: SCHEDULER
@ -1036,7 +1081,7 @@ TEMPLATES_DIR
Default: ``templates`` dir inside scrapy module
The directory where to look for templates when creating new projects with
:command:`startproject` command and new spiders with :command:`genspider`
command.
The project name must not conflict with the name of custom files or directories


@ -273,6 +273,9 @@ OffsiteMiddleware
This middleware filters out every request whose host names aren't in the
spider's :attr:`~scrapy.spiders.Spider.allowed_domains` attribute.
All subdomains of any domain in the list are also allowed.
E.g. the rule ``www.example.org`` will also allow ``bob.www.example.org``
but not ``www2.example.com`` nor ``example.com``.
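
A sketch of how that rule plays out in a spider; all domain names here are hypothetical::

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = 'example'
        allowed_domains = ['example.org']          # example.org and any of its subdomains
        start_urls = ['http://www.example.org/']

        def parse(self, response):
            # kept: the host is a subdomain of an allowed domain
            yield scrapy.Request('http://bob.www.example.org/page')
            # filtered by OffsiteMiddleware: not example.org or a subdomain of it
            yield scrapy.Request('http://notexample.org/page')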
When your spider returns a request for a domain not belonging to those
covered by the spider, this middleware will log a debug message similar to


@ -76,7 +76,7 @@ scrapy.Spider
An optional list of strings containing domains that this spider is
allowed to crawl. Requests for URLs not belonging to the domain names
specified in this list won't be followed if
specified in this list (or their subdomains) won't be followed if
:class:`~scrapy.spidermiddlewares.offsite.OffsiteMiddleware` is enabled.
.. attribute:: start_urls


@ -63,7 +63,7 @@ class AjaxCrawlMiddleware(object):
Return True if a page without hash fragment could be "AJAX crawlable"
according to https://developers.google.com/webmasters/ajax-crawling/docs/getting-started.
"""
body = response.body_as_unicode()[:self.lookup_bytes]
body = response.text[:self.lookup_bytes]
return _has_ajaxcrawlable_meta(body)


@ -83,8 +83,8 @@ class RobotsTxtMiddleware(object):
def _parse_robots(self, response, netloc):
rp = robotparser.RobotFileParser(response.url)
body = ''
if hasattr(response, 'body_as_unicode'):
body = response.body_as_unicode()
if hasattr(response, 'text'):
body = response.text
else: # last effort try
try:
body = response.body.decode('utf-8')


@ -3,6 +3,7 @@ Item Exporters are used to export/serialize items into different formats.
"""
import csv
import io
import sys
import pprint
import marshal
@ -11,7 +12,11 @@ from six.moves import cPickle as pickle
from xml.sax.saxutils import XMLGenerator
from scrapy.utils.serialize import ScrapyJSONEncoder
from scrapy.utils.python import to_bytes, to_unicode, to_native_str, is_listlike
from scrapy.item import BaseItem
from scrapy.exceptions import ScrapyDeprecationWarning
import warnings
__all__ = ['BaseItemExporter', 'PprintItemExporter', 'PickleItemExporter',
'CsvItemExporter', 'XmlItemExporter', 'JsonLinesItemExporter',
@ -38,7 +43,7 @@ class BaseItemExporter(object):
raise NotImplementedError
def serialize_field(self, field, name, value):
serializer = field.get('serializer', self._to_str_if_unicode)
serializer = field.get('serializer', lambda x: x)
return serializer(value)
def start_exporting(self):
@ -47,9 +52,6 @@ class BaseItemExporter(object):
def finish_exporting(self):
pass
def _to_str_if_unicode(self, value):
return value.encode(self.encoding) if isinstance(value, unicode) else value
def _get_serialized_fields(self, item, default_value=None, include_empty=None):
"""Return the fields to export as an iterable of tuples
(name, serialized_value)
@ -86,10 +88,10 @@ class JsonLinesItemExporter(BaseItemExporter):
def export_item(self, item):
itemdict = dict(self._get_serialized_fields(item))
self.file.write(self.encoder.encode(itemdict) + '\n')
self.file.write(to_bytes(self.encoder.encode(itemdict) + '\n'))
class JsonItemExporter(JsonLinesItemExporter):
class JsonItemExporter(BaseItemExporter):
def __init__(self, file, **kwargs):
self._configure(kwargs, dont_fail=True)
@ -98,18 +100,18 @@ class JsonItemExporter(JsonLinesItemExporter):
self.first_item = True
def start_exporting(self):
self.file.write("[")
self.file.write(b"[")
def finish_exporting(self):
self.file.write("]")
self.file.write(b"]")
def export_item(self, item):
if self.first_item:
self.first_item = False
else:
self.file.write(',\n')
self.file.write(b',\n')
itemdict = dict(self._get_serialized_fields(item))
self.file.write(self.encoder.encode(itemdict))
self.file.write(to_bytes(self.encoder.encode(itemdict)))
class XmlItemExporter(BaseItemExporter):
@ -139,7 +141,7 @@ class XmlItemExporter(BaseItemExporter):
if hasattr(serialized_value, 'items'):
for subname, value in serialized_value.items():
self._export_xml_field(subname, value)
elif hasattr(serialized_value, '__iter__'):
elif is_listlike(serialized_value):
for value in serialized_value:
self._export_xml_field('value', value)
else:
@ -153,10 +155,10 @@ class XmlItemExporter(BaseItemExporter):
# and Python 3.x will require unicode, so ">= 2.7.4" should be fine.
if sys.version_info[:3] >= (2, 7, 4):
def _xg_characters(self, serialized_value):
if not isinstance(serialized_value, unicode):
if not isinstance(serialized_value, six.text_type):
serialized_value = serialized_value.decode(self.encoding)
return self.xg.characters(serialized_value)
else:
else: # pragma: no cover
def _xg_characters(self, serialized_value):
return self.xg.characters(serialized_value)
@ -166,17 +168,22 @@ class CsvItemExporter(BaseItemExporter):
def __init__(self, file, include_headers_line=True, join_multivalued=',', **kwargs):
self._configure(kwargs, dont_fail=True)
self.include_headers_line = include_headers_line
file = file if six.PY2 else io.TextIOWrapper(file, line_buffering=True)
self.csv_writer = csv.writer(file, **kwargs)
self._headers_not_written = True
self._join_multivalued = join_multivalued
def _to_str_if_unicode(self, value):
def serialize_field(self, field, name, value):
serializer = field.get('serializer', self._join_if_needed)
return serializer(value)
def _join_if_needed(self, value):
if isinstance(value, (list, tuple)):
try:
value = self._join_multivalued.join(value)
return self._join_multivalued.join(value)
except TypeError: # list in value may not contain strings
pass
return super(CsvItemExporter, self)._to_str_if_unicode(value)
return value
def export_item(self, item):
if self._headers_not_written:
@ -185,9 +192,16 @@ class CsvItemExporter(BaseItemExporter):
fields = self._get_serialized_fields(item, default_value='',
include_empty=True)
values = [x[1] for x in fields]
values = list(self._build_row(x for _, x in fields))
self.csv_writer.writerow(values)
def _build_row(self, values):
for s in values:
try:
yield to_native_str(s)
except TypeError:
yield s
def _write_headers_and_set_fields_to_export(self, item):
if self.include_headers_line:
if not self.fields_to_export:
@ -197,7 +211,8 @@ class CsvItemExporter(BaseItemExporter):
else:
# use fields declared in Item
self.fields_to_export = list(item.fields.keys())
self.csv_writer.writerow(self.fields_to_export)
row = list(self._build_row(self.fields_to_export))
self.csv_writer.writerow(row)
class PickleItemExporter(BaseItemExporter):
@ -230,7 +245,7 @@ class PprintItemExporter(BaseItemExporter):
def export_item(self, item):
itemdict = dict(self._get_serialized_fields(item))
self.file.write(pprint.pformat(itemdict) + '\n')
self.file.write(to_bytes(pprint.pformat(itemdict) + '\n'))
class PythonItemExporter(BaseItemExporter):
@ -239,6 +254,13 @@ class PythonItemExporter(BaseItemExporter):
json, msgpack, binc, etc) can be used on top of it. Its main goal is to
seamlessly support what BaseItemExporter does plus nested items.
"""
def _configure(self, options, dont_fail=False):
self.binary = options.pop('binary', True)
super(PythonItemExporter, self)._configure(options, dont_fail)
if self.binary:
warnings.warn(
"PythonItemExporter will drop support for binary export in the future",
ScrapyDeprecationWarning)
def serialize_field(self, field, name, value):
serializer = field.get('serializer', self._serialize_value)
@ -249,13 +271,20 @@ class PythonItemExporter(BaseItemExporter):
return self.export_item(value)
if isinstance(value, dict):
return dict(self._serialize_dict(value))
if hasattr(value, '__iter__'):
if is_listlike(value):
return [self._serialize_value(v) for v in value]
return self._to_str_if_unicode(value)
encode_func = to_bytes if self.binary else to_unicode
if isinstance(value, (six.text_type, bytes)):
return encode_func(value, encoding=self.encoding)
return value
def _serialize_dict(self, value):
for key, val in six.iteritems(value):
key = to_bytes(key) if self.binary else key
yield key, self._serialize_value(val)
def export_item(self, item):
return dict(self._get_serialized_fields(item))
result = dict(self._get_serialized_fields(item))
if self.binary:
result = dict(self._serialize_dict(result))
return result
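
Since the exporters above now write bytes, the output file has to be opened in binary mode; a minimal usage sketch (the output path and item are made up):

    from scrapy.exporters import JsonItemExporter

    with open('items.json', 'wb') as f:     # binary mode is required now
        exporter = JsonItemExporter(f)
        exporter.start_exporting()
        exporter.export_item({'name': u'John\xa3', 'age': u'22'})
        exporter.finish_exporting()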


@ -9,6 +9,7 @@ from collections import defaultdict
from twisted.internet import reactor
from scrapy import signals
from scrapy.exceptions import NotConfigured
class CloseSpider(object):
@ -23,6 +24,9 @@ class CloseSpider(object):
'errorcount': crawler.settings.getint('CLOSESPIDER_ERRORCOUNT'),
}
if not any(self.close_on.values()):
raise NotConfigured
self.counter = defaultdict(int)
if self.close_on.get('errorcount'):


@ -2,6 +2,7 @@ import os
from six.moves import cPickle as pickle
from scrapy import signals
from scrapy.exceptions import NotConfigured
from scrapy.utils.job import job_dir
class SpiderState(object):
@ -12,7 +13,11 @@ class SpiderState(object):
@classmethod
def from_crawler(cls, crawler):
obj = cls(job_dir(crawler.settings))
jobdir = job_dir(crawler.settings)
if not jobdir:
raise NotConfigured
obj = cls(jobdir)
crawler.signals.connect(obj.spider_closed, signal=signals.spider_closed)
crawler.signals.connect(obj.spider_opened, signal=signals.spider_opened)
return obj


@ -64,8 +64,8 @@ def _urlencode(seq, enc):
def _get_form(response, formname, formid, formnumber, formxpath):
"""Find the form element """
text = response.body_as_unicode()
root = create_root_node(text, lxml.html.HTMLParser, base_url=get_base_url(response))
root = create_root_node(response.text, lxml.html.HTMLParser,
base_url=get_base_url(response))
forms = root.xpath('//form')
if not forms:
raise ValueError("No <form> element found in %s" % response)


@ -59,7 +59,12 @@ class TextResponse(Response):
def body_as_unicode(self):
"""Return body as unicode"""
# check for self.encoding before _cached_ubody just in
return self.text
@property
def text(self):
""" Body as unicode """
# access self.encoding before _cached_ubody to make sure
# _body_inferred_encoding is called
benc = self.encoding
if self._cached_ubody is None:


@ -28,6 +28,7 @@ class MiddlewareManager(object):
def from_settings(cls, settings, crawler=None):
mwlist = cls._get_mwlist_from_settings(settings)
middlewares = []
enabled = []
for clspath in mwlist:
try:
mwcls = load_object(clspath)
@ -38,15 +39,17 @@ class MiddlewareManager(object):
else:
mw = mwcls()
middlewares.append(mw)
enabled.append(clspath)
except NotConfigured as e:
if e.args:
clsname = clspath.split('.')[-1]
logger.warning("Disabled %(clsname)s: %(eargs)s",
{'clsname': clsname, 'eargs': e.args[0]},
extra={'crawler': crawler})
logger.info("Enabled %(componentname)ss:\n%(enabledlist)s",
{'componentname': cls.component_name,
'enabledlist': pprint.pformat(mwlist)},
'enabledlist': pprint.pformat(enabled)},
extra={'crawler': crawler})
return cls(*middlewares)


@ -60,7 +60,7 @@ class Selector(_ParselSelector, object_ref):
response = _response_from_text(text, st)
if response is not None:
text = response.body_as_unicode()
text = response.text
kwargs.setdefault('base_url', response.url)
self.response = response


@ -18,6 +18,9 @@ NEWSPIDER_MODULE = '$project_name.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = '$project_name (+http://www.yourdomain.com)'
# Obey robots.txt rules
ROBOTSTXT_OBEY = True
# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32


@ -7,11 +7,22 @@ This module must not depend on any module outside the Standard Library.
import copy
import six
import warnings
from collections import OrderedDict
from scrapy.exceptions import ScrapyDeprecationWarning
class MultiValueDictKeyError(KeyError):
pass
def __init__(self, *args, **kwargs):
warnings.warn(
"scrapy.utils.datatypes.MultiValueDictKeyError is deprecated "
"and will be removed in future releases.",
category=ScrapyDeprecationWarning,
stacklevel=2
)
super(MultiValueDictKeyError, self).__init__(*args, **kwargs)
class MultiValueDict(dict):
"""
@ -31,6 +42,10 @@ class MultiValueDict(dict):
single name-value pairs.
"""
def __init__(self, key_to_list_mapping=()):
warnings.warn("scrapy.utils.datatypes.MultiValueDict is deprecated "
"and will be removed in future releases.",
category=ScrapyDeprecationWarning,
stacklevel=2)
dict.__init__(self, key_to_list_mapping)
def __repr__(self):
@ -137,10 +152,18 @@ class MultiValueDict(dict):
for key, value in six.iteritems(kwargs):
self.setlistdefault(key, []).append(value)
class SiteNode(object):
"""Class to represent a site node (page, image or any other file)"""
def __init__(self, url):
warnings.warn(
"scrapy.utils.datatypes.SiteNode is deprecated "
"and will be removed in future releases.",
category=ScrapyDeprecationWarning,
stacklevel=2
)
self.url = url
self.itemnames = []
self.children = []


@ -137,7 +137,7 @@ def _body_or_str(obj, unicode=True):
if not unicode:
return obj.body
elif isinstance(obj, TextResponse):
return obj.body_as_unicode()
return obj.text
else:
return obj.body.decode('utf-8')
elif isinstance(obj, six.text_type):


@ -25,7 +25,7 @@ _baseurl_cache = weakref.WeakKeyDictionary()
def get_base_url(response):
"""Return the base url of the given response, joined with the response url"""
if response not in _baseurl_cache:
text = response.body_as_unicode()[0:4096]
text = response.text[0:4096]
_baseurl_cache[response] = html.get_base_url(text, response.url,
response.encoding)
return _baseurl_cache[response]
@ -37,7 +37,7 @@ _metaref_cache = weakref.WeakKeyDictionary()
def get_meta_refresh(response):
"""Parse the http-equiv refrsh parameter from the given response"""
if response not in _metaref_cache:
text = response.body_as_unicode()[0:4096]
text = response.text[0:4096]
text = _noscript_re.sub(u'', text)
text = _script_re.sub(u'', text)
_metaref_cache[response] = html.get_meta_refresh(text, response.url,


@ -1,10 +1,5 @@
tests/test_exporters.py
tests/test_linkextractors_deprecated.py
tests/test_mail.py
tests/test_pipeline_files.py
tests/test_pipeline_images.py
tests/test_proxy_connect.py
tests/test_spidermiddleware_httperror.py
scrapy/xlib/tx/iweb.py
scrapy/xlib/tx/interfaces.py
@ -14,12 +9,9 @@ scrapy/xlib/tx/_newclient.py
scrapy/xlib/tx/__init__.py
scrapy/core/downloader/handlers/s3.py
scrapy/core/downloader/handlers/ftp.py
scrapy/pipelines/images.py
scrapy/pipelines/files.py
scrapy/linkextractors/sgml.py
scrapy/linkextractors/regex.py
scrapy/linkextractors/htmlparser.py
scrapy/downloadermiddlewares/cookies.py
scrapy/extensions/statsmailer.py
scrapy/extensions/memusage.py
scrapy/mail.py


@ -4,6 +4,7 @@ pytest-cov
testfixtures
jmespath
leveldb
boto
# optional for shell wrapper tests
bpython
ipython


@ -437,6 +437,8 @@ class S3AnonTestCase(unittest.TestCase):
import boto
except ImportError:
skip = 'missing boto library'
if six.PY3:
skip = 'S3 not supported on Py3'
def setUp(self):
self.s3reqh = S3DownloadHandler(Settings(),
@ -459,6 +461,8 @@ class S3TestCase(unittest.TestCase):
import boto
except ImportError:
skip = 'missing boto library'
if six.PY3:
skip = 'S3 not supported on Py3'
# test use same example keys than amazon developer guide
# http://s3.amazonaws.com/awsdocs/S3/20060301/s3-dg-20060301.pdf


@ -55,12 +55,11 @@ class TestSpider(Spider):
def parse_item(self, response):
item = self.item_cls()
body = response.body_as_unicode()
m = self.name_re.search(body)
m = self.name_re.search(response.text)
if m:
item['name'] = m.group(1)
item['url'] = response.url
m = self.price_re.search(body)
m = self.price_re.search(response.text)
if m:
item['price'] = m.group(1)
return item


@ -1,17 +1,21 @@
from __future__ import absolute_import
import re
import json
import marshal
import tempfile
import unittest
from io import BytesIO
from six.moves import cPickle as pickle
import lxml.etree
import six
from scrapy.item import Item, Field
from scrapy.utils.python import to_unicode
from scrapy.exporters import (
BaseItemExporter, PprintItemExporter, PickleItemExporter, CsvItemExporter,
XmlItemExporter, JsonLinesItemExporter, JsonItemExporter, PythonItemExporter
XmlItemExporter, JsonLinesItemExporter, JsonItemExporter,
PythonItemExporter, MarshalItemExporter
)
@ -23,7 +27,7 @@ class TestItem(Item):
class BaseItemExporterTest(unittest.TestCase):
def setUp(self):
self.i = TestItem(name=u'John\xa3', age='22')
self.i = TestItem(name=u'John\xa3', age=u'22')
self.output = BytesIO()
self.ie = self._get_exporter()
@ -56,19 +60,19 @@ class BaseItemExporterTest(unittest.TestCase):
def test_serialize_field(self):
res = self.ie.serialize_field(self.i.fields['name'], 'name', self.i['name'])
self.assertEqual(res, 'John\xc2\xa3')
self.assertEqual(res, u'John\xa3')
res = self.ie.serialize_field(self.i.fields['age'], 'age', self.i['age'])
self.assertEqual(res, '22')
self.assertEqual(res, u'22')
def test_fields_to_export(self):
ie = self._get_exporter(fields_to_export=['name'])
self.assertEqual(list(ie._get_serialized_fields(self.i)), [('name', 'John\xc2\xa3')])
self.assertEqual(list(ie._get_serialized_fields(self.i)), [('name', u'John\xa3')])
ie = self._get_exporter(fields_to_export=['name'], encoding='latin-1')
name = list(ie._get_serialized_fields(self.i))[0][1]
assert isinstance(name, str)
self.assertEqual(name, 'John\xa3')
_, name = list(ie._get_serialized_fields(self.i))[0]
assert isinstance(name, six.text_type)
self.assertEqual(name, u'John\xa3')
def test_field_custom_serializer(self):
def custom_serializer(value):
@ -78,16 +82,20 @@ class BaseItemExporterTest(unittest.TestCase):
name = Field()
age = Field(serializer=custom_serializer)
i = CustomFieldItem(name=u'John\xa3', age='22')
i = CustomFieldItem(name=u'John\xa3', age=u'22')
ie = self._get_exporter()
self.assertEqual(ie.serialize_field(i.fields['name'], 'name', i['name']), 'John\xc2\xa3')
self.assertEqual(ie.serialize_field(i.fields['name'], 'name', i['name']), u'John\xa3')
self.assertEqual(ie.serialize_field(i.fields['age'], 'age', i['age']), '24')
class PythonItemExporterTest(BaseItemExporterTest):
def _get_exporter(self, **kwargs):
return PythonItemExporter(**kwargs)
return PythonItemExporter(binary=False, **kwargs)
def test_invalid_option(self):
with self.assertRaisesRegexp(TypeError, "Unexpected options: invalid_option"):
PythonItemExporter(invalid_option='something')
def test_nested_item(self):
i1 = TestItem(name=u'Joseph', age='22')
@ -120,6 +128,25 @@ class PythonItemExporterTest(BaseItemExporterTest):
self.assertEqual(type(exported['age'][0]), dict)
self.assertEqual(type(exported['age'][0]['age'][0]), dict)
def test_export_binary(self):
exporter = PythonItemExporter(binary=True)
value = TestItem(name=u'John\xa3', age=u'22')
expected = {b'name': b'John\xc2\xa3', b'age': b'22'}
self.assertEqual(expected, exporter.export_item(value))
def test_other_python_types_item(self):
from datetime import datetime
now = datetime.now()
item = {
'boolean': False,
'number': 22,
'time': now,
'float': 3.14,
}
ie = self._get_exporter()
exported = ie.export_item(item)
self.assertEqual(exported, item)
class PprintItemExporterTest(BaseItemExporterTest):
@ -152,18 +179,30 @@ class PickleItemExporterTest(BaseItemExporterTest):
self.assertEqual(pickle.load(f), i2)
class CsvItemExporterTest(BaseItemExporterTest):
class MarshalItemExporterTest(BaseItemExporterTest):
def _get_exporter(self, **kwargs):
self.output = tempfile.TemporaryFile()
return MarshalItemExporter(self.output, **kwargs)
def _check_output(self):
self.output.seek(0)
self._assert_expected_item(marshal.load(self.output))
class CsvItemExporterTest(BaseItemExporterTest):
def _get_exporter(self, **kwargs):
return CsvItemExporter(self.output, **kwargs)
def assertCsvEqual(self, first, second, msg=None):
first = to_unicode(first)
second = to_unicode(second)
csvsplit = lambda csv: [sorted(re.split(r'(,|\s+)', line))
for line in csv.splitlines(True)]
return self.assertEqual(csvsplit(first), csvsplit(second), msg)
def _check_output(self):
self.assertCsvEqual(self.output.getvalue(), 'age,name\r\n22,John\xc2\xa3\r\n')
self.assertCsvEqual(to_unicode(self.output.getvalue()), u'age,name\r\n22,John\xa3\r\n')
def assertExportResult(self, item, expected, **kwargs):
fp = BytesIO()
@ -177,13 +216,13 @@ class CsvItemExporterTest(BaseItemExporterTest):
self.assertExportResult(
item=self.i,
fields_to_export=self.i.fields.keys(),
expected='age,name\r\n22,John\xc2\xa3\r\n',
expected=b'age,name\r\n22,John\xc2\xa3\r\n',
)
def test_header_export_all_dict(self):
self.assertExportResult(
item=dict(self.i),
expected='age,name\r\n22,John\xc2\xa3\r\n',
expected=b'age,name\r\n22,John\xc2\xa3\r\n',
)
def test_header_export_single_field(self):
@ -191,7 +230,7 @@ class CsvItemExporterTest(BaseItemExporterTest):
self.assertExportResult(
item=item,
fields_to_export=['age'],
expected='age\r\n22\r\n',
expected=b'age\r\n22\r\n',
)
def test_header_export_two_items(self):
@ -202,14 +241,15 @@ class CsvItemExporterTest(BaseItemExporterTest):
ie.export_item(item)
ie.export_item(item)
ie.finish_exporting()
self.assertCsvEqual(output.getvalue(), 'age,name\r\n22,John\xc2\xa3\r\n22,John\xc2\xa3\r\n')
self.assertCsvEqual(output.getvalue(),
b'age,name\r\n22,John\xc2\xa3\r\n22,John\xc2\xa3\r\n')
def test_header_no_header_line(self):
for item in [self.i, dict(self.i)]:
self.assertExportResult(
item=item,
include_headers_line=False,
expected='22,John\xc2\xa3\r\n',
expected=b'22,John\xc2\xa3\r\n',
)
def test_join_multivalue(self):
@ -224,6 +264,28 @@ class CsvItemExporterTest(BaseItemExporterTest):
expected='"Mary,Paul",John\r\n',
)
def test_join_multivalue_not_strings(self):
self.assertExportResult(
item=dict(name='John', friends=[4, 8]),
include_headers_line=False,
expected='"[4, 8]",John\r\n',
)
def test_other_python_types_item(self):
from datetime import datetime
now = datetime(2015, 1, 1, 1, 1, 1)
item = {
'boolean': False,
'number': 22,
'time': now,
'float': 3.14,
}
self.assertExportResult(
item=item,
include_headers_line=False,
expected='22,False,3.14,2015-01-01 01:01:01\r\n'
)
class XmlItemExporterTest(BaseItemExporterTest):
@ -252,13 +314,13 @@ class XmlItemExporterTest(BaseItemExporterTest):
self.assertXmlEquivalent(fp.getvalue(), expected_value)
def _check_output(self):
expected_value = '<?xml version="1.0" encoding="utf-8"?>\n<items><item><age>22</age><name>John\xc2\xa3</name></item></items>'
expected_value = b'<?xml version="1.0" encoding="utf-8"?>\n<items><item><age>22</age><name>John\xc2\xa3</name></item></items>'
self.assertXmlEquivalent(self.output.getvalue(), expected_value)
def test_multivalued_fields(self):
self.assertExportResult(
TestItem(name=[u'John\xa3', u'Doe']),
'<?xml version="1.0" encoding="utf-8"?>\n<items><item><name><value>John\xc2\xa3</value><value>Doe</value></name></item></items>'
b'<?xml version="1.0" encoding="utf-8"?>\n<items><item><name><value>John\xc2\xa3</value><value>Doe</value></name></item></items>'
)
def test_nested_item(self):
@ -267,19 +329,19 @@ class XmlItemExporterTest(BaseItemExporterTest):
i3 = TestItem(name=u'buz', age=i2)
self.assertExportResult(i3,
'<?xml version="1.0" encoding="utf-8"?>\n'
'<items>'
'<item>'
'<age>'
'<age>'
'<age>22</age>'
'<name>foo\xc2\xa3hoo</name>'
'</age>'
'<name>bar</name>'
'</age>'
'<name>buz</name>'
'</item>'
'</items>'
b'<?xml version="1.0" encoding="utf-8"?>\n'
b'<items>'
b'<item>'
b'<age>'
b'<age>'
b'<age>22</age>'
b'<name>foo\xc2\xa3hoo</name>'
b'</age>'
b'<name>bar</name>'
b'</age>'
b'<name>buz</name>'
b'</item>'
b'</items>'
)
def test_nested_list_item(self):
@ -288,16 +350,16 @@ class XmlItemExporterTest(BaseItemExporterTest):
i3 = TestItem(name=u'buz', age=[i1, i2])
self.assertExportResult(i3,
'<?xml version="1.0" encoding="utf-8"?>\n'
'<items>'
'<item>'
'<age>'
'<value><name>foo</name></value>'
'<value><name>bar</name><v2><egg><value>spam</value></egg></v2></value>'
'</age>'
'<name>buz</name>'
'</item>'
'</items>'
b'<?xml version="1.0" encoding="utf-8"?>\n'
b'<items>'
b'<item>'
b'<age>'
b'<value><name>foo</name></value>'
b'<value><name>bar</name><v2><egg><value>spam</value></egg></v2></value>'
b'</age>'
b'<name>buz</name>'
b'</item>'
b'</items>'
)
@ -309,7 +371,7 @@ class JsonLinesItemExporterTest(BaseItemExporterTest):
return JsonLinesItemExporter(self.output, **kwargs)
def _check_output(self):
exported = json.loads(self.output.getvalue().strip())
exported = json.loads(to_unicode(self.output.getvalue().strip()))
self.assertEqual(exported, dict(self.i))
def test_nested_item(self):
@ -319,7 +381,7 @@ class JsonLinesItemExporterTest(BaseItemExporterTest):
self.ie.start_exporting()
self.ie.export_item(i3)
self.ie.finish_exporting()
exported = json.loads(self.output.getvalue())
exported = json.loads(to_unicode(self.output.getvalue()))
self.assertEqual(exported, self._expected_nested)
def test_extra_keywords(self):
@ -337,7 +399,7 @@ class JsonItemExporterTest(JsonLinesItemExporterTest):
return JsonItemExporter(self.output, **kwargs)
def _check_output(self):
exported = json.loads(self.output.getvalue().strip())
exported = json.loads(to_unicode(self.output.getvalue().strip()))
self.assertEqual(exported, [dict(self.i)])
def assertTwoItemsExported(self, item):
@ -345,7 +407,7 @@ class JsonItemExporterTest(JsonLinesItemExporterTest):
self.ie.export_item(item)
self.ie.export_item(item)
self.ie.finish_exporting()
exported = json.loads(self.output.getvalue())
exported = json.loads(to_unicode(self.output.getvalue()))
self.assertEqual(exported, [dict(item), dict(item)])
def test_two_items(self):
@ -361,7 +423,7 @@ class JsonItemExporterTest(JsonLinesItemExporterTest):
self.ie.start_exporting()
self.ie.export_item(i3)
self.ie.finish_exporting()
exported = json.loads(self.output.getvalue())
exported = json.loads(to_unicode(self.output.getvalue()))
expected = {'name': u'Jesus', 'age': {'name': 'Maria', 'age': dict(i1)}}
self.assertEqual(exported, [expected])
@ -372,7 +434,7 @@ class JsonItemExporterTest(JsonLinesItemExporterTest):
self.ie.start_exporting()
self.ie.export_item(i3)
self.ie.finish_exporting()
exported = json.loads(self.output.getvalue())
exported = json.loads(to_unicode(self.output.getvalue()))
expected = {'name': u'Jesus', 'age': {'name': 'Maria', 'age': i1}}
self.assertEqual(exported, [expected])


@ -5,7 +5,6 @@ import json
from io import BytesIO
import tempfile
import shutil
import six
from six.moves.urllib.parse import urlparse
from zope.interface.verify import verifyObject
@ -22,6 +21,7 @@ from scrapy.extensions.feedexport import (
S3FeedStorage, StdoutFeedStorage
)
from scrapy.utils.test import assert_aws_environ
from scrapy.utils.python import to_native_str
class FileFeedStorageTest(unittest.TestCase):
@ -120,8 +120,6 @@ class StdoutFeedStorageTest(unittest.TestCase):
class FeedExportTest(unittest.TestCase):
skip = not six.PY2
class MyItem(scrapy.Item):
foo = scrapy.Field()
egg = scrapy.Field()
@ -170,7 +168,7 @@ class FeedExportTest(unittest.TestCase):
settings.update({'FEED_FORMAT': 'csv'})
data = yield self.exported_data(items, settings)
reader = csv.DictReader(data.splitlines())
reader = csv.DictReader(to_native_str(data).splitlines())
got_rows = list(reader)
if ordered:
self.assertEqual(reader.fieldnames, header)
@ -184,14 +182,57 @@ class FeedExportTest(unittest.TestCase):
settings = settings or {}
settings.update({'FEED_FORMAT': 'jl'})
data = yield self.exported_data(items, settings)
parsed = [json.loads(line) for line in data.splitlines()]
parsed = [json.loads(to_native_str(line)) for line in data.splitlines()]
rows = [{k: v for k, v in row.items() if v} for row in rows]
self.assertEqual(rows, parsed)
@defer.inlineCallbacks
def assertExportedXml(self, items, rows, settings=None):
settings = settings or {}
settings.update({'FEED_FORMAT': 'xml'})
data = yield self.exported_data(items, settings)
rows = [{k: v for k, v in row.items() if v} for row in rows]
import lxml.etree
root = lxml.etree.fromstring(data)
got_rows = [{e.tag: e.text for e in it} for it in root.findall('item')]
self.assertEqual(rows, got_rows)
def _load_until_eof(self, data, load_func):
bytes_output = BytesIO(data)
result = []
while True:
try:
result.append(load_func(bytes_output))
except EOFError:
break
return result
@defer.inlineCallbacks
def assertExportedPickle(self, items, rows, settings=None):
settings = settings or {}
settings.update({'FEED_FORMAT': 'pickle'})
data = yield self.exported_data(items, settings)
expected = [{k: v for k, v in row.items() if v} for row in rows]
import pickle
result = self._load_until_eof(data, load_func=pickle.load)
self.assertEqual(expected, result)
@defer.inlineCallbacks
def assertExportedMarshal(self, items, rows, settings=None):
settings = settings or {}
settings.update({'FEED_FORMAT': 'marshal'})
data = yield self.exported_data(items, settings)
expected = [{k: v for k, v in row.items() if v} for row in rows]
import marshal
result = self._load_until_eof(data, load_func=marshal.load)
self.assertEqual(expected, result)
@defer.inlineCallbacks
def assertExported(self, items, header, rows, settings=None, ordered=True):
yield self.assertExportedCsv(items, header, rows, settings, ordered)
yield self.assertExportedJsonLines(items, rows, settings)
yield self.assertExportedXml(items, rows, settings)
yield self.assertExportedPickle(items, rows, settings)
@defer.inlineCallbacks
def test_export_items(self):


@ -107,9 +107,11 @@ class BaseResponseTest(unittest.TestCase):
body_bytes = body
assert isinstance(response.body, bytes)
assert isinstance(response.text, six.text_type)
self._assert_response_encoding(response, encoding)
self.assertEqual(response.body, body_bytes)
self.assertEqual(response.body_as_unicode(), body_unicode)
self.assertEqual(response.text, body_unicode)
def _assert_response_encoding(self, response, encoding):
self.assertEqual(response.encoding, resolve_encoding(encoding))
@ -171,6 +173,10 @@ class TextResponseTest(BaseResponseTest):
self.assertTrue(isinstance(r1.body_as_unicode(), six.text_type))
self.assertEqual(r1.body_as_unicode(), unicode_string)
# check response.text
self.assertTrue(isinstance(r1.text, six.text_type))
self.assertEqual(r1.text, unicode_string)
def test_encoding(self):
r1 = self.response_class("http://www.example.com", headers={"Content-type": ["text/html; charset=utf-8"]}, body=b"\xc2\xa3")
r2 = self.response_class("http://www.example.com", encoding='utf-8', body=u"\xa3")
@ -219,12 +225,12 @@ class TextResponseTest(BaseResponseTest):
headers={"Content-type": ["text/html; charset=utf-8"]},
body=b"\xef\xbb\xbfWORD\xe3\xab")
self.assertEqual(r6.encoding, 'utf-8')
self.assertEqual(r6.body_as_unicode(), u'WORD\ufffd\ufffd')
self.assertEqual(r6.text, u'WORD\ufffd\ufffd')
def test_bom_is_removed_from_body(self):
# Inferring encoding from the body also caches the decoded body as a side effect;
# this test tries to ensure that calling response.encoding and
# response.body_as_unicode() in either order doesn't affect the final
# response.text in either order doesn't affect the final
# values for encoding and decoded body.
url = 'http://example.com'
body = b"\xef\xbb\xbfWORD"
@ -233,9 +239,9 @@ class TextResponseTest(BaseResponseTest):
# Test response without content-type and BOM encoding
response = self.response_class(url, body=body)
self.assertEqual(response.encoding, 'utf-8')
self.assertEqual(response.body_as_unicode(), u'WORD')
self.assertEqual(response.text, u'WORD')
response = self.response_class(url, body=body)
self.assertEqual(response.body_as_unicode(), u'WORD')
self.assertEqual(response.text, u'WORD')
self.assertEqual(response.encoding, 'utf-8')
# Body caching side effect isn't triggered when encoding is declared in
@ -243,9 +249,9 @@ class TextResponseTest(BaseResponseTest):
# body
response = self.response_class(url, headers=headers, body=body)
self.assertEqual(response.encoding, 'utf-8')
self.assertEqual(response.body_as_unicode(), u'WORD')
self.assertEqual(response.text, u'WORD')
response = self.response_class(url, headers=headers, body=body)
self.assertEqual(response.body_as_unicode(), u'WORD')
self.assertEqual(response.text, u'WORD')
self.assertEqual(response.encoding, 'utf-8')
def test_replace_wrong_encoding(self):
@ -253,18 +259,18 @@ class TextResponseTest(BaseResponseTest):
r = self.response_class("http://www.example.com", encoding='utf-8', body=b'PREFIX\xe3\xabSUFFIX')
# XXX: Policy for replacing invalid chars may suffer minor variations
# but it should always contain the unicode replacement char (u'\ufffd')
assert u'\ufffd' in r.body_as_unicode(), repr(r.body_as_unicode())
assert u'PREFIX' in r.body_as_unicode(), repr(r.body_as_unicode())
assert u'SUFFIX' in r.body_as_unicode(), repr(r.body_as_unicode())
assert u'\ufffd' in r.text, repr(r.text)
assert u'PREFIX' in r.text, repr(r.text)
assert u'SUFFIX' in r.text, repr(r.text)
# Do not destroy html tags due to encoding bugs
r = self.response_class("http://example.com", encoding='utf-8', \
body=b'\xf0<span>value</span>')
assert u'<span>value</span>' in r.body_as_unicode(), repr(r.body_as_unicode())
assert u'<span>value</span>' in r.text, repr(r.text)
# FIXME: This test should pass once we stop using BeautifulSoup's UnicodeDammit in TextResponse
#r = self.response_class("http://www.example.com", body='PREFIX\xe3\xabSUFFIX')
#assert u'\ufffd' in r.body_as_unicode(), repr(r.body_as_unicode())
#r = self.response_class("http://www.example.com", body=b'PREFIX\xe3\xabSUFFIX')
#assert u'\ufffd' in r.text, repr(r.text)
def test_selector(self):
body = b"<html><head><title>Some page</title><body></body></html>"


@ -53,8 +53,8 @@ class MailSenderTest(unittest.TestCase):
self.assertEqual(len(payload), 2)
text, attach = payload
self.assertEqual(text.get_payload(decode=True), 'body')
self.assertEqual(attach.get_payload(decode=True), 'content')
self.assertEqual(text.get_payload(decode=True), b'body')
self.assertEqual(attach.get_payload(decode=True), b'content')
def _catch_mail_sent(self, **kwargs):
self.catched_msg = dict(**kwargs)


@ -16,7 +16,7 @@ class TestOffsiteMiddleware(TestCase):
self.mw.spider_opened(self.spider)
def _get_spiderargs(self):
return dict(name='foo', allowed_domains=['scrapytest.org', 'scrapy.org'])
return dict(name='foo', allowed_domains=['scrapytest.org', 'scrapy.org', 'scrapy.test.org'])
def test_process_spider_output(self):
res = Response('http://scrapytest.org')
@ -24,13 +24,16 @@ class TestOffsiteMiddleware(TestCase):
onsite_reqs = [Request('http://scrapytest.org/1'),
Request('http://scrapy.org/1'),
Request('http://sub.scrapy.org/1'),
Request('http://offsite.tld/letmepass', dont_filter=True)]
Request('http://offsite.tld/letmepass', dont_filter=True),
Request('http://scrapy.test.org/')]
offsite_reqs = [Request('http://scrapy2.org'),
Request('http://offsite.tld/'),
Request('http://offsite.tld/scrapytest.org'),
Request('http://offsite.tld/rogue.scrapytest.org'),
Request('http://rogue.scrapytest.org.haha.com'),
Request('http://roguescrapytest.org')]
Request('http://roguescrapytest.org'),
Request('http://test.org/'),
Request('http://notscrapy.test.org/')]
reqs = onsite_reqs + offsite_reqs
out = list(self.mw.process_spider_output(res, reqs, self.spider))


@ -4,6 +4,8 @@ from twisted.trial import unittest
from scrapy.extensions.spiderstate import SpiderState
from scrapy.spiders import Spider
from scrapy.exceptions import NotConfigured
from scrapy.utils.test import get_crawler
class SpiderStateTest(unittest.TestCase):
@ -34,3 +36,7 @@ class SpiderStateTest(unittest.TestCase):
ss.spider_opened(spider)
self.assertEqual(spider.state, {})
ss.spider_closed(spider)
def test_not_configured(self):
crawler = get_crawler(Spider)
self.assertRaises(NotConfigured, SpiderState.from_crawler, crawler)