
reapplying black, fixing conflicts and ignoring bandit checks on test directory

This commit is contained in:
Emmanuel Rondan 2023-01-20 10:54:46 -03:00
parent 23e8b553b4
commit 8ee4817471
49 changed files with 326 additions and 139 deletions

View File

@ -17,3 +17,4 @@ skips:
- B503
- B603
- B605
exclude_dirs: ['tests']

View File

@ -14,9 +14,7 @@ jobs:
- python-version: "3.11"
env:
TOXENV: flake8
# Pylint requires installing reppy, which does not support Python 3.9
# https://github.com/seomoz/reppy/issues/122
- python-version: 3.8
- python-version: "3.11"
env:
TOXENV: pylint
- python-version: 3.7

View File

@ -24,8 +24,8 @@ jobs:
- name: Publish to PyPI
if: steps.check-release-tag.outputs.release_tag == 'true'
run: |
pip install --upgrade setuptools wheel twine
python setup.py sdist bdist_wheel
pip install --upgrade build twine
python -m build
export TWINE_USERNAME=__token__
export TWINE_PASSWORD=${{ secrets.PYPI_TOKEN }}
twine upload dist/*

View File

@ -38,10 +38,7 @@ jobs:
env:
TOXENV: pypy3-pinned
# extras
# extra-deps includes reppy, which does not support Python 3.9
# https://github.com/seomoz/reppy/issues/122
- python-version: 3.8
- python-version: "3.11"
env:
TOXENV: extra-deps

View File

@ -4700,7 +4700,7 @@ Scrapy 0.22.1 (released 2014-02-08)
- BaseSgmlLinkExtractor: Added unit test of a link with an inner tag (:commit:`c1cb418`)
- BaseSgmlLinkExtractor: Fixed unknown_endtag() so that it only set current_link=None when the end tag match the opening tag (:commit:`7e4d627`)
- Fix tests for Travis-CI build (:commit:`76c7e20`)
- replace unencodable codepoints with html entities. fixes #562 and #285 (:commit:`5f87b17`)
- replace unencodeable codepoints with html entities. fixes #562 and #285 (:commit:`5f87b17`)
- RegexLinkExtractor: encode URL unicode value when creating Links (:commit:`d0ee545`)
- Updated the tutorial crawl output with latest output. (:commit:`8da65de`)
- Updated shell docs with the crawler reference and fixed the actual shell output. (:commit:`875b9ab`)
@ -4725,7 +4725,7 @@ Enhancements
- [**Backward incompatible**] Switched HTTPCacheMiddleware backend to filesystem (:issue:`541`)
To restore old backend set ``HTTPCACHE_STORAGE`` to ``scrapy.contrib.httpcache.DbmCacheStorage``
- Proxy \https:// urls using CONNECT method (:issue:`392`, :issue:`397`)
- Add a middleware to crawl ajax crawleable pages as defined by google (:issue:`343`)
- Add a middleware to crawl ajax crawlable pages as defined by google (:issue:`343`)
- Rename scrapy.spider.BaseSpider to scrapy.spider.Spider (:issue:`510`, :issue:`519`)
- Selectors register EXSLT namespaces by default (:issue:`472`)
- Unify item loaders similar to selectors renaming (:issue:`461`)
@ -4905,7 +4905,7 @@ Scrapy 0.18.0 (released 2013-08-09)
-----------------------------------
- Lot of improvements to testsuite run using Tox, including a way to test on pypi
- Handle GET parameters for AJAX crawleable urls (:commit:`3fe2a32`)
- Handle GET parameters for AJAX crawlable urls (:commit:`3fe2a32`)
- Use lxml recover option to parse sitemaps (:issue:`347`)
- Bugfix cookie merging by hostname and not by netloc (:issue:`352`)
- Support disabling ``HttpCompressionMiddleware`` using a flag setting (:issue:`359`)
@ -4939,8 +4939,8 @@ Scrapy 0.18.0 (released 2013-08-09)
- Added ``--pdb`` option to ``scrapy`` command line tool
- Added :meth:`XPathSelector.remove_namespaces <scrapy.selector.Selector.remove_namespaces>` which allows to remove all namespaces from XML documents for convenience (to work with namespace-less XPaths). Documented in :ref:`topics-selectors`.
- Several improvements to spider contracts
- New default middleware named MetaRefreshMiddldeware that handles meta-refresh html tag redirections,
- MetaRefreshMiddldeware and RedirectMiddleware have different priorities to address #62
- New default middleware named MetaRefreshMiddleware that handles meta-refresh html tag redirections,
- MetaRefreshMiddleware and RedirectMiddleware have different priorities to address #62
- added from_crawler method to spiders
- added system tests with mock server
- more improvements to macOS compatibility (thanks Alex Cepoi)
@ -5082,7 +5082,7 @@ Scrapy changes:
- promoted :ref:`topics-djangoitem` to main contrib
- LogFormatter method now return dicts(instead of strings) to support lazy formatting (:issue:`164`, :commit:`dcef7b0`)
- downloader handlers (:setting:`DOWNLOAD_HANDLERS` setting) now receive settings as the first argument of the ``__init__`` method
- replaced memory usage acounting with (more portable) `resource`_ module, removed ``scrapy.utils.memory`` module
- replaced memory usage accounting with (more portable) `resource`_ module, removed ``scrapy.utils.memory`` module
- removed signal: ``scrapy.mail.mail_sent``
- removed ``TRACK_REFS`` setting, now :ref:`trackrefs <topics-leaks-trackrefs>` is always enabled
- DBM is now the default storage backend for HTTP cache middleware
@ -5148,7 +5148,7 @@ Scrapy 0.14
New features and settings
~~~~~~~~~~~~~~~~~~~~~~~~~
- Support for `AJAX crawleable urls`_
- Support for `AJAX crawlable urls`_
- New persistent scheduler that stores requests on disk, allowing to suspend and resume crawls (:rev:`2737`)
- added ``-o`` option to ``scrapy crawl``, a shortcut for dumping scraped items into a file (or standard output using ``-``)
- Added support for passing custom settings to Scrapyd ``schedule.json`` api (:rev:`2779`, :rev:`2783`)
@ -5408,7 +5408,7 @@ Backward-incompatible changes
- Renamed setting: ``REQUESTS_PER_DOMAIN`` to ``CONCURRENT_REQUESTS_PER_SPIDER`` (:rev:`1830`, :rev:`1844`)
- Renamed setting: ``CONCURRENT_DOMAINS`` to ``CONCURRENT_SPIDERS`` (:rev:`1830`)
- Refactored HTTP Cache middleware
- HTTP Cache middleware has been heavilty refactored, retaining the same functionality except for the domain sectorization which was removed. (:rev:`1843` )
- HTTP Cache middleware has been heavily refactored, retaining the same functionality except for the domain sectorization which was removed. (:rev:`1843` )
- Renamed exception: ``DontCloseDomain`` to ``DontCloseSpider`` (:rev:`1859` | #120)
- Renamed extension: ``DelayedCloseDomain`` to ``SpiderCloseDelay`` (:rev:`1861` | #121)
- Removed obsolete ``scrapy.utils.markup.remove_escape_chars`` function - use ``scrapy.utils.markup.replace_escape_chars`` instead (:rev:`1865`)
@ -5419,7 +5419,7 @@ Scrapy 0.7
First release of Scrapy.
.. _AJAX crawleable urls: https://developers.google.com/search/docs/ajax-crawling/docs/getting-started?csw=1
.. _AJAX crawlable urls: https://developers.google.com/search/docs/ajax-crawling/docs/getting-started?csw=1
.. _botocore: https://github.com/boto/botocore
.. _chunked transfer encoding: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
.. _ClientForm: http://wwwsearch.sourceforge.net/old/ClientForm/

View File

@ -636,19 +636,30 @@ DOWNLOAD_DELAY
Default: ``0``
The amount of time (in secs) that the downloader should wait before downloading
consecutive pages from the same website. This can be used to throttle the
crawling speed to avoid hitting servers too hard. Decimal numbers are
supported. Example::
Minimum seconds to wait between 2 consecutive requests to the same domain.
DOWNLOAD_DELAY = 0.25 # 250 ms of delay
Use :setting:`DOWNLOAD_DELAY` to throttle your crawling speed, to avoid hitting
servers too hard.
Decimal numbers are supported. For example, to send a maximum of 4 requests
every 10 seconds::
DOWNLOAD_DELAY = 2.5
This setting is also affected by the :setting:`RANDOMIZE_DOWNLOAD_DELAY`
setting (which is enabled by default). By default, Scrapy doesn't wait a fixed
amount of time between requests, but uses a random interval between 0.5 * :setting:`DOWNLOAD_DELAY` and 1.5 * :setting:`DOWNLOAD_DELAY`.
setting, which is enabled by default.
When :setting:`CONCURRENT_REQUESTS_PER_IP` is non-zero, delays are enforced
per ip address instead of per domain.
per IP address instead of per domain.
Note that :setting:`DOWNLOAD_DELAY` can lower the effective per-domain
concurrency below :setting:`CONCURRENT_REQUESTS_PER_DOMAIN`. If the response
time of a domain is lower than :setting:`DOWNLOAD_DELAY`, the effective
concurrency for that domain is 1. When testing throttling configurations, it
usually makes sense to lower :setting:`CONCURRENT_REQUESTS_PER_DOMAIN` first,
and only increase :setting:`DOWNLOAD_DELAY` once
:setting:`CONCURRENT_REQUESTS_PER_DOMAIN` is 1 but a higher throttling is
desired.
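
As a quick illustration of the throttling interplay described above, a minimal settings sketch (the values are illustrative, not taken from this diff):

    # settings.py -- illustrative throttling configuration
    DOWNLOAD_DELAY = 2.5                 # at most ~4 requests every 10 seconds per domain
    RANDOMIZE_DOWNLOAD_DELAY = True      # default; actual wait is 0.5x to 1.5x DOWNLOAD_DELAY
    CONCURRENT_REQUESTS_PER_DOMAIN = 1   # lower this first when tuning throttling
    # CONCURRENT_REQUESTS_PER_IP = 1     # if non-zero, the delay is enforced per IP, not per domain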
.. _spider-download_delay-attribute:
@ -656,6 +667,11 @@ per ip address instead of per domain.
This delay can be set per spider using :attr:`download_delay` spider attribute.
It is also possible to change this setting per domain, although it requires
non-trivial code. See the implementation of the :ref:`AutoThrottle
<topics-autothrottle>` extension for an example.
.. setting:: DOWNLOAD_HANDLERS
DOWNLOAD_HANDLERS

View File

@ -99,7 +99,7 @@ scrapy.Spider
.. attribute:: crawler
This attribute is set by the :meth:`from_crawler` class method after
initializating the class, and links to the
initializing the class, and links to the
:class:`~scrapy.crawler.Crawler` object to which this spider instance is
bound.

View File

@ -1,9 +1,10 @@
"""
A spider that generate light requests to meassure QPS throughput
A spider that generate light requests to measure QPS throughput
usage:
scrapy runspider qpsclient.py --loglevel=INFO --set RANDOMIZE_DOWNLOAD_DELAY=0 --set CONCURRENT_REQUESTS=50 -a qps=10 -a latency=0.3
scrapy runspider qpsclient.py --loglevel=INFO --set RANDOMIZE_DOWNLOAD_DELAY=0
--set CONCURRENT_REQUESTS=50 -a qps=10 -a latency=0.3
"""

View File

@ -24,7 +24,7 @@ class ScrapyArgumentParser(argparse.ArgumentParser):
def _iter_command_classes(module_name):
# TODO: add `name` attribute to commands and and merge this function with
# TODO: add `name` attribute to commands and merge this function with
# scrapy.utils.spider.iter_spider_classes
for module in walk_modules(module_name):
for obj in vars(module).values():

View File

@ -83,7 +83,9 @@ class ScrapyClientContextFactory(BrowserLikePolicyForHTTPS):
# kept for old-style HTTP/1.0 downloader context twisted calls,
# e.g. connectSSL()
def getContext(self, hostname=None, port=None):
return self.getCertificateOptions().getContext()
ctx = self.getCertificateOptions().getContext()
ctx.set_options(0x4) # OP_LEGACY_SERVER_CONNECT
return ctx
def creatorForNetloc(self, hostname, port):
return ScrapyClientTLSOptions(
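
A side note on the getContext() change above: 0x4 is the raw value of OpenSSL's SSL_OP_LEGACY_SERVER_CONNECT option. A hedged sketch of resolving it by name when the installed pyOpenSSL exposes the constant (the getattr fallback is an assumption, not part of the commit):

    from OpenSSL import SSL

    # Fall back to the raw OpenSSL value (0x4) if this pyOpenSSL build
    # does not expose the named constant.
    OP_LEGACY_SERVER_CONNECT = getattr(SSL, "OP_LEGACY_SERVER_CONNECT", 0x4)

    ctx = SSL.Context(SSL.SSLv23_METHOD)
    ctx.set_options(OP_LEGACY_SERVER_CONNECT)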

View File

@ -23,8 +23,8 @@ METHOD_TLSv12 = "TLSv1.2"
openssl_methods = {
METHOD_TLS: SSL.SSLv23_METHOD, # protocol negotiation (recommended)
METHOD_TLSv10: SSL.TLSv1_METHOD, # TLS 1.0 only
METHOD_TLSv11: getattr(SSL, "TLSv1_1_METHOD", 5), # TLS 1.1 only
METHOD_TLSv12: getattr(SSL, "TLSv1_2_METHOD", 6), # TLS 1.2 only
METHOD_TLSv11: SSL.TLSv1_1_METHOD, # TLS 1.1 only
METHOD_TLSv12: SSL.TLSv1_2_METHOD, # TLS 1.2 only
}

View File

@ -101,7 +101,7 @@ class ScrapyHTTPPageGetter(HTTPClient):
# This class used to inherit from Twisteds
# twisted.web.client.HTTPClientFactory. When that class was deprecated in
# Twisted (https://github.com/twisted/twisted/pull/643), we merged its
# non-overriden code into this class.
# non-overridden code into this class.
class ScrapyHTTPClientFactory(ClientFactory):
protocol = ScrapyHTTPPageGetter

View File

@ -348,7 +348,7 @@ class Stream:
def receive_headers(self, headers: List[HeaderTuple]) -> None:
for name, value in headers:
self._response["headers"][name] = value
self._response["headers"].appendlist(name, value)
# Check if we exceed the allowed max data size which can be received
expected_size = int(self._response["headers"].get(b"Content-Length", -1))
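
For context, switching from item assignment to appendlist() above keeps repeated headers such as multiple Set-Cookie values instead of overwriting them. A minimal sketch with Scrapy's Headers class:

    from scrapy.http import Headers

    headers = Headers()
    headers.appendlist(b"Set-Cookie", b"a=b")
    headers.appendlist(b"Set-Cookie", b"c=d")
    # Both values are preserved rather than the second replacing the first.
    assert headers.getlist(b"Set-Cookie") == [b"a=b", b"c=d"]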

View File

@ -384,11 +384,11 @@ class FeedExporter:
return defer.DeferredList(deferred_list) if deferred_list else None
def _close_slot(self, slot, spider):
slot.finish_exporting()
if not slot.itemcount and not slot.store_empty:
# We need to call slot.storage.store nonetheless to get the file
# properly closed.
return defer.maybeDeferred(slot.storage.store, slot.file)
slot.finish_exporting()
logmsg = f"{slot.format} feed ({slot.itemcount} items) in: {slot.uri}"
d = defer.maybeDeferred(slot.storage.store, slot.file)

View File

@ -196,7 +196,7 @@ class RFC2616Policy:
if response.status in (300, 301, 308):
return self.MAXAGE
# Insufficient information to compute fresshness lifetime
# Insufficient information to compute freshness lifetime
return 0
def _compute_current_age(self, response, request, now):

View File

@ -200,7 +200,7 @@ def _select_value(ele: SelectElement, n: str, v: str):
o = ele.value_options
return (n, o[0]) if o else (None, None)
if v is not None and multiple:
# This is a workround to bug in lxml fixed 2.3.1
# This is a workaround to bug in lxml fixed 2.3.1
# fix https://github.com/lxml/lxml/commit/57f49eed82068a20da3db8f1b18ae00c1bab8b12#L1L1139
selected_options = ele.xpath(".//option[@selected]")
values = [(o.get("value") or o.text or "").strip() for o in selected_options]

View File

@ -226,7 +226,8 @@ class LxmlLinkExtractor:
Only links that match the settings passed to the ``__init__`` method of
the link extractor are returned.
Duplicate links are omitted.
Duplicate links are omitted if the ``unique`` attribute is set to ``True``,
otherwise they are returned.
"""
base_url = get_base_url(response)
if self.restrict_xpaths:
@ -239,4 +240,6 @@ class LxmlLinkExtractor:
for doc in docs:
links = self._extract_links(doc, response.url, response.encoding, base_url)
all_links.extend(self._process_links(links))
return unique_list(all_links)
if self.link_extractor.unique:
return unique_list(all_links)
return all_links
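
A hedged usage sketch of the behavior documented above; LinkExtractor and its unique parameter are standard Scrapy API, while the response variable is assumed to already exist in scope:

    from scrapy.linkextractors import LinkExtractor

    # unique defaults to True (duplicates omitted); with unique=False the
    # repeated "sample 3" links in the test fixture are all returned.
    extractor = LinkExtractor(unique=False)
    links = extractor.extract_links(response)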

View File

@ -151,8 +151,8 @@ class ImagesPipeline(FilesPipeline):
)
if self._deprecated_convert_image:
warnings.warn(
f"{self.__class__.__name__}.convert_image() method overriden in a deprecated way, "
"overriden method does not accept response_body argument.",
f"{self.__class__.__name__}.convert_image() method overridden in a deprecated way, "
"overridden method does not accept response_body argument.",
category=ScrapyDeprecationWarning,
)
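
A hedged sketch of the non-deprecated override signature implied by the warning above (the subclass name and body are illustrative):

    from scrapy.pipelines.images import ImagesPipeline

    class MyImagesPipeline(ImagesPipeline):
        # Accepting response_body avoids the deprecation warning emitted for
        # old-style overrides that lack this argument.
        def convert_image(self, image, size=None, response_body=None):
            return super().convert_image(image, size=size, response_body=response_body)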

View File

@ -177,7 +177,11 @@ class Shell:
def inspect_response(response, spider):
"""Open a shell to inspect the given response"""
# Shell.start removes the SIGINT handler, so save it and re-add it after
# the shell has closed
sigint_handler = signal.getsignal(signal.SIGINT)
Shell(spider.crawler).start(response=response, spider=spider)
signal.signal(signal.SIGINT, sigint_handler)
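
The change above saves the SIGINT handler before starting the shell and restores it afterwards. The generic pattern, as a minimal sketch (run_interactive_shell is a hypothetical placeholder):

    import signal

    previous = signal.getsignal(signal.SIGINT)   # remember the current Ctrl-C handler
    try:
        run_interactive_shell()                  # hypothetical stand-in for Shell(...).start(...)
    finally:
        signal.signal(signal.SIGINT, previous)   # restore it even if the shell raises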
def _request_deferred(request):

View File

@ -1,5 +1,4 @@
from functools import wraps
from collections import OrderedDict
def _embed_ipython_shell(namespace={}, banner=""):
@ -70,14 +69,12 @@ def _embed_standard_shell(namespace={}, banner=""):
return wrapper
DEFAULT_PYTHON_SHELLS = OrderedDict(
[
("ptpython", _embed_ptpython_shell),
("ipython", _embed_ipython_shell),
("bpython", _embed_bpython_shell),
("python", _embed_standard_shell),
]
)
DEFAULT_PYTHON_SHELLS = {
"ptpython": _embed_ptpython_shell,
"ipython": _embed_ipython_shell,
"bpython": _embed_bpython_shell,
"python": _embed_standard_shell,
}
def get_shell_embed_func(shells=None, known_shells=None):

View File

@ -26,10 +26,7 @@ from twisted.python import failure
from twisted.python.failure import Failure
from scrapy.exceptions import IgnoreRequest
from scrapy.utils.reactor import (
is_asyncio_reactor_installed,
get_asyncio_event_loop_policy,
)
from scrapy.utils.reactor import is_asyncio_reactor_installed, _get_asyncio_event_loop
def defer_fail(_failure: Failure) -> Deferred:
@ -290,7 +287,7 @@ def deferred_from_coro(o) -> Any:
# that use asyncio, e.g. "await asyncio.sleep(1)"
return ensureDeferred(o)
# wrapping the coroutine into a Future and then into a Deferred, this requires AsyncioSelectorReactor
event_loop = get_asyncio_event_loop_policy().get_event_loop()
event_loop = _get_asyncio_event_loop()
return Deferred.fromFuture(asyncio.ensure_future(o, loop=event_loop))
return o
@ -343,8 +340,7 @@ def deferred_to_future(d: Deferred) -> Future:
d = treq.get('https://example.com/additional')
additional_response = await deferred_to_future(d)
"""
policy = get_asyncio_event_loop_policy()
return d.asFuture(policy.get_event_loop())
return d.asFuture(_get_asyncio_event_loop())
def maybe_deferred_to_future(d: Deferred) -> Union[Deferred, Future]:
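
A usage sketch based on the docstring excerpt above (treq appears there only as an example; an installed asyncio reactor is assumed):

    import treq

    from scrapy.utils.defer import deferred_to_future

    async def parse_additional():
        # Wrap a Twisted Deferred so it can be awaited from a coroutine.
        d = treq.get("https://example.com/additional")
        additional_response = await deferred_to_future(d)
        return additional_response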

View File

@ -1,6 +1,7 @@
import asyncio
import sys
from contextlib import suppress
from warnings import catch_warnings, filterwarnings
from twisted.internet import asyncioreactor, error
@ -83,6 +84,10 @@ def install_reactor(reactor_path, event_loop_path=None):
installer()
def _get_asyncio_event_loop():
return set_asyncio_event_loop(None)
def set_asyncio_event_loop(event_loop_path):
"""Sets and returns the event loop with specified import path."""
policy = get_asyncio_event_loop_policy()
@ -92,11 +97,26 @@ def set_asyncio_event_loop(event_loop_path):
asyncio.set_event_loop(event_loop)
else:
try:
event_loop = policy.get_event_loop()
with catch_warnings():
# In Python 3.10.9, 3.11.1, 3.12 and 3.13, a DeprecationWarning
# is emitted about the lack of a current event loop, because in
# Python 3.14 and later `get_event_loop` will raise a
# RuntimeError in that event. Because our code is already
# prepared for that future behavior, we ignore the deprecation
# warning.
filterwarnings(
"ignore",
message="There is no current event loop",
category=DeprecationWarning,
)
event_loop = policy.get_event_loop()
except RuntimeError:
# `get_event_loop` is expected to fail when called from a new thread
# with no asyncio event loop yet installed. Such is the case when
# called from `scrapy shell`
# `get_event_loop` raises RuntimeError when called with no asyncio
# event loop yet installed in the following scenarios:
# - From a thread other than the main thread. For example, when
# using ``scrapy shell``.
# - Previsibly on Python 3.14 and later.
# https://github.com/python/cpython/issues/100160#issuecomment-1345581902
event_loop = policy.new_event_loop()
asyncio.set_event_loop(event_loop)
return event_loop
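
For reference, a minimal usage sketch of the public helper whose internals are touched above (the reactor path is the same one used by the test suite later in this diff):

    from scrapy.utils.reactor import install_reactor

    # Installs AsyncioSelectorReactor and prepares an asyncio event loop for it;
    # an optional event-loop class import path may be passed as a second argument.
    install_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")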

View File

@ -40,7 +40,7 @@ def get_meta_refresh(
response: "scrapy.http.response.text.TextResponse",
ignore_tags: Optional[Iterable[str]] = ("script", "noscript"),
) -> Union[Tuple[None, None], Tuple[float, str]]:
"""Parse the http-equiv refrsh parameter from the given response"""
"""Parse the http-equiv refresh parameter from the given response"""
if response not in _metaref_cache:
text = response.text[0:4096]
_metaref_cache[response] = html.get_meta_refresh(

View File

@ -1,14 +1,9 @@
import OpenSSL
import OpenSSL.SSL
import OpenSSL._util as pyOpenSSLutil
from scrapy.utils.python import to_unicode
# The OpenSSL symbol is present since 1.1.1 but it's not currently supported in any version of pyOpenSSL.
# Using the binding directly, as this code does, requires cryptography 2.4.
SSL_OP_NO_TLSv1_3 = getattr(pyOpenSSLutil.lib, "SSL_OP_NO_TLSv1_3", 0)
def ffi_buf_to_string(buf):
return to_unicode(pyOpenSSLutil.ffi.string(buf))
@ -24,11 +19,6 @@ def x509name_to_string(x509name):
def get_temp_key_info(ssl_object):
if not hasattr(
pyOpenSSLutil.lib, "SSL_get_server_tmp_key"
): # requires OpenSSL 1.0.2
return None
# adapted from OpenSSL apps/s_cb.c::ssl_print_tmp_key()
temp_key_p = pyOpenSSLutil.ffi.new("EVP_PKEY **")
if not pyOpenSSLutil.lib.SSL_get_server_tmp_key(ssl_object, temp_key_p):

View File

@ -48,7 +48,7 @@ def parse_url(url, encoding=None):
def escape_ajax(url):
"""
Return the crawleable url according to:
Return the crawlable url according to:
https://developers.google.com/webmasters/ajax-crawling/docs/getting-started
>>> escape_ajax("www.example.com/ajax.html#!key=value")

View File

@ -148,7 +148,7 @@ Another example could be for building URL canonicalizers:
::
#!python
class CanonializeUrl(LegSpider):
class CanonicalizeUrl(LegSpider):
def process_request(self, request):
curl = canonicalize_url(request.url, rules=self.spider.canonicalization_rules)

View File

@ -321,7 +321,7 @@ Another example could be for building URL canonicalizers:
::
#!python
class CanonializeUrl(object):
class CanonicalizeUrl(object):
def process_request(self, request, response, spider):
curl = canonicalize_url(request.url,
@ -594,18 +594,18 @@ A middleware to Scrape data using Parsley as described in UsingParsley
class ParsleyExtractor(object):
def __init__(self, parslet_json_code):
parslet = json.loads(parselet_json_code)
def __init__(self, parsley_json_code):
parsley = json.loads(parselet_json_code)
class ParsleyItem(Item):
def __init__(self, *a, **kw):
for name in parslet.keys():
for name in parsley.keys():
self.fields[name] = Field()
super(ParsleyItem, self).__init__(*a, **kw)
self.item_class = ParsleyItem
self.parsley = PyParsley(parslet, output='python')
self.parsley = PyParsley(parsley, output='python')
def process_response(self, response, request, spider):
return self.item_class(self.parsly.parse(string=response.body))
return self.item_class(self.parsley.parse(string=response.body))

View File

@ -79,7 +79,7 @@ If it raises an exception, Scrapy will print it and exit.
Examples::
def addon_configure(settings):
settings.overrides['DOWNLADER_MIDDLEWARES'].update({
settings.overrides['DOWNLOADER_MIDDLEWARES'].update({
'scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware': 900,
})

View File

@ -19,7 +19,7 @@ def has_environment_marker_platform_impl_support():
install_requires = [
"Twisted>=18.9.0",
"cryptography>=3.3",
"cryptography>=3.4.6",
"cssselect>=0.9.1",
"itemloaders>=1.0.1",
"parsel>=1.5.0",

View File

@ -19,7 +19,6 @@ from twisted.web.static import File
from twisted.web.util import redirectTo
from scrapy.utils.python import to_bytes, to_unicode
from scrapy.utils.ssl import SSL_OP_NO_TLSv1_3
from scrapy.utils.test import get_testenv
@ -358,7 +357,7 @@ def ssl_context_factory(
if cipher_string:
ctx = factory.getContext()
# disabling TLS1.3 because it unconditionally enables some strong ciphers
ctx.set_options(SSL.OP_CIPHER_SERVER_PREFERENCE | SSL_OP_NO_TLSv1_3)
ctx.set_options(SSL.OP_CIPHER_SERVER_PREFERENCE | SSL.OP_NO_TLSv1_3)
ctx.set_cipher_list(to_bytes(cipher_string))
return factory

View File

@ -11,6 +11,6 @@ class ZeroDivisionErrorPipeline:
return item
class ProcessWithZeroDivisionErrorPipiline:
class ProcessWithZeroDivisionErrorPipeline:
def process_item(self, item, spider):
1 / 0

View File

@ -13,6 +13,7 @@
</div>
<a href='http://example.com/sample3.html' title='sample 3'>sample 3 text</a>
<a href='sample3.html'>sample 3 repetition</a>
<a href='sample3.html'>sample 3 repetition</a>
<a href='sample3.html#foo'>sample 3 repetition with fragment</a>
<a href='http://www.google.com/something'></a>
<a href='http://example.com/innertag.html'><strong>inner</strong> tag</a>

View File

@ -336,7 +336,7 @@ class StartprojectTemplatesTest(ProjectTest):
self.assertEqual(actual_permissions, expected_permissions)
def test_startproject_permissions_unchanged_in_destination(self):
"""Check that pre-existing folders and files in the destination folder
"""Check that preexisting folders and files in the destination folder
do not see their permissions modified."""
scrapy_path = scrapy.__path__[0]
project_template = Path(scrapy_path, "templates", "project")

View File

@ -154,7 +154,7 @@ class CrawlTestCase(TestCase):
raise unittest.SkipTest("Non-existing hosts are resolvable")
crawler = get_crawler(SimpleSpider)
with LogCapture() as log:
# try to fetch the homepage of a non-existent domain
# try to fetch the homepage of a nonexistent domain
yield crawler.crawl(
"http://dns.resolution.invalid./", mockserver=self.mockserver
)
@ -183,7 +183,7 @@ class CrawlTestCase(TestCase):
self.assertIs(record.exc_info[0], ZeroDivisionError)
@defer.inlineCallbacks
def test_start_requests_lazyness(self):
def test_start_requests_laziness(self):
settings = {"CONCURRENT_REQUESTS": 1}
crawler = get_crawler(BrokenStartRequestsSpider, settings)
yield crawler.crawl(mockserver=self.mockserver)

View File

@ -209,6 +209,12 @@ class LargeChunkedFileResource(resource.Resource):
return server.NOT_DONE_YET
class DuplicateHeaderResource(resource.Resource):
def render(self, request):
request.responseHeaders.setRawHeaders(b"Set-Cookie", [b"a=b", b"c=d"])
return b""
class HttpTestCase(unittest.TestCase):
scheme = "http"
download_handler_cls: Type = HTTPDownloadHandler
@ -234,6 +240,7 @@ class HttpTestCase(unittest.TestCase):
r.putChild(b"contentlength", ContentLengthHeaderResource())
r.putChild(b"nocontenttype", EmptyContentTypeHeaderResource())
r.putChild(b"largechunkedfile", LargeChunkedFileResource())
r.putChild(b"duplicate-header", DuplicateHeaderResource())
r.putChild(b"echo", Echo())
self.site = server.Site(r, timeout=None)
self.wrapper = WrappingFactory(self.site)
@ -407,6 +414,16 @@ class HttpTestCase(unittest.TestCase):
HtmlResponse,
)
def test_get_duplicate_header(self):
def _test(response):
self.assertEqual(
response.headers.getlist(b"Set-Cookie"),
[b"a=b", b"c=d"],
)
request = Request(self.getURL("duplicate-header"))
return self.download_request(request, Spider("foo")).addCallback(_test)
class Http10TestCase(HttpTestCase):
"""HTTP 1.0 test case"""
@ -1095,9 +1112,9 @@ class BaseFTPTestCase(unittest.TestCase):
return self._add_test_callbacks(d, _test)
def test_ftp_download_notexist(self):
def test_ftp_download_nonexistent(self):
request = Request(
url=f"ftp://127.0.0.1:{self.portNum}/notexist.txt", meta=self.req_meta
url=f"ftp://127.0.0.1:{self.portNum}/nonexistent.txt", meta=self.req_meta
)
d = self.download_handler.download_request(request, None)

View File

@ -19,7 +19,7 @@ class UserAgentMiddlewareTest(TestCase):
self.assertEqual(req.headers["User-Agent"], b"default_useragent")
def test_remove_agent(self):
# settings UESR_AGENT to None should remove the user agent
# settings USER_AGENT to None should remove the user agent
spider, mw = self.get_spider_and_mw("default_useragent")
spider.user_agent = None
mw.spider_opened(spider)

View File

@ -109,7 +109,7 @@ class DataClassItemsSpider(TestSpider):
class ItemZeroDivisionErrorSpider(TestSpider):
custom_settings = {
"ITEM_PIPELINES": {
"tests.pipelines.ProcessWithZeroDivisionErrorPipiline": 300,
"tests.pipelines.ProcessWithZeroDivisionErrorPipeline": 300,
}
}

View File

@ -33,8 +33,9 @@ from zope.interface.verify import verifyObject
import scrapy
from scrapy.exceptions import NotConfigured, ScrapyDeprecationWarning
from scrapy.exporters import CsvItemExporter
from scrapy.exporters import CsvItemExporter, JsonItemExporter
from scrapy.extensions.feedexport import (
_FeedSlot,
BlockingFeedStorage,
FeedExporter,
FileFeedStorage,
@ -664,6 +665,50 @@ class FeedExportTestBase(ABC, unittest.TestCase):
return result
class InstrumentedFeedSlot(_FeedSlot):
"""Instrumented _FeedSlot subclass for keeping track of calls to
start_exporting and finish_exporting."""
def start_exporting(self):
self.update_listener("start")
super().start_exporting()
def finish_exporting(self):
self.update_listener("finish")
super().finish_exporting()
@classmethod
def subscribe__listener(cls, listener):
cls.update_listener = listener.update
class IsExportingListener:
"""When subscribed to InstrumentedFeedSlot, keeps track of when
a call to start_exporting has been made without a closing call to
finish_exporting and when a call to finish_exporting has been made
before a call to start_exporting."""
def __init__(self):
self.start_without_finish = False
self.finish_without_start = False
def update(self, method):
if method == "start":
self.start_without_finish = True
elif method == "finish":
if self.start_without_finish:
self.start_without_finish = False
else:
self.finish_before_start = True
class ExceptionJsonItemExporter(JsonItemExporter):
"""JsonItemExporter that throws an exception every time export_item is called."""
def export_item(self, _):
raise Exception("foo")
class FeedExportTest(FeedExportTestBase):
__test__ = True
@ -909,6 +954,84 @@ class FeedExportTest(FeedExportTestBase):
data = yield self.exported_no_data(settings)
self.assertEqual(b"", data[fmt])
@defer.inlineCallbacks
def test_start_finish_exporting_items(self):
items = [
self.MyItem({"foo": "bar1", "egg": "spam1"}),
]
settings = {
"FEEDS": {
self._random_temp_filename(): {"format": "json"},
},
"FEED_EXPORT_INDENT": None,
}
listener = IsExportingListener()
InstrumentedFeedSlot.subscribe__listener(listener)
with mock.patch("scrapy.extensions.feedexport._FeedSlot", InstrumentedFeedSlot):
_ = yield self.exported_data(items, settings)
self.assertFalse(listener.start_without_finish)
self.assertFalse(listener.finish_without_start)
@defer.inlineCallbacks
def test_start_finish_exporting_no_items(self):
items = []
settings = {
"FEEDS": {
self._random_temp_filename(): {"format": "json"},
},
"FEED_EXPORT_INDENT": None,
}
listener = IsExportingListener()
InstrumentedFeedSlot.subscribe__listener(listener)
with mock.patch("scrapy.extensions.feedexport._FeedSlot", InstrumentedFeedSlot):
_ = yield self.exported_data(items, settings)
self.assertFalse(listener.start_without_finish)
self.assertFalse(listener.finish_without_start)
@defer.inlineCallbacks
def test_start_finish_exporting_items_exception(self):
items = [
self.MyItem({"foo": "bar1", "egg": "spam1"}),
]
settings = {
"FEEDS": {
self._random_temp_filename(): {"format": "json"},
},
"FEED_EXPORTERS": {"json": ExceptionJsonItemExporter},
"FEED_EXPORT_INDENT": None,
}
listener = IsExportingListener()
InstrumentedFeedSlot.subscribe__listener(listener)
with mock.patch("scrapy.extensions.feedexport._FeedSlot", InstrumentedFeedSlot):
_ = yield self.exported_data(items, settings)
self.assertFalse(listener.start_without_finish)
self.assertFalse(listener.finish_without_start)
@defer.inlineCallbacks
def test_start_finish_exporting_no_items_exception(self):
items = []
settings = {
"FEEDS": {
self._random_temp_filename(): {"format": "json"},
},
"FEED_EXPORTERS": {"json": ExceptionJsonItemExporter},
"FEED_EXPORT_INDENT": None,
}
listener = IsExportingListener()
InstrumentedFeedSlot.subscribe__listener(listener)
with mock.patch("scrapy.extensions.feedexport._FeedSlot", InstrumentedFeedSlot):
_ = yield self.exported_data(items, settings)
self.assertFalse(listener.start_without_finish)
self.assertFalse(listener.finish_without_start)
@defer.inlineCallbacks
def test_export_no_items_store_empty(self):
formats = (

View File

@ -399,7 +399,7 @@ class RequestTest(unittest.TestCase):
)
self.assertEqual(r.method, "DELETE")
# If `ignore_unknon_options` is set to `False` it raises an error with
# If `ignore_unknown_options` is set to `False` it raises an error with
# the unknown options: --foo and -z
self.assertRaises(
ValueError,
@ -997,7 +997,7 @@ class FormRequestTest(RequestTest):
fs = _qs(r1)
self.assertEqual(fs, {b"four": [b"4"], b"three": [b"3"]})
def test_from_response_formname_notexist(self):
def test_from_response_formname_nonexistent(self):
response = _buildresponse(
"""<form name="form1" action="post.php" method="POST">
<input type="hidden" name="one" value="1">
@ -1044,7 +1044,7 @@ class FormRequestTest(RequestTest):
fs = _qs(r1)
self.assertEqual(fs, {b"four": [b"4"], b"three": [b"3"]})
def test_from_response_formname_notexists_fallback_formid(self):
def test_from_response_formname_nonexistent_fallback_formid(self):
response = _buildresponse(
"""<form action="post.php" method="POST">
<input type="hidden" name="one" value="1">
@ -1062,7 +1062,7 @@ class FormRequestTest(RequestTest):
fs = _qs(r1)
self.assertEqual(fs, {b"four": [b"4"], b"three": [b"3"]})
def test_from_response_formid_notexist(self):
def test_from_response_formid_nonexistent(self):
response = _buildresponse(
"""<form id="form1" action="post.php" method="POST">
<input type="hidden" name="one" value="1">

View File

@ -518,7 +518,7 @@ class TextResponseTest(BaseResponseTest):
def test_bom_is_removed_from_body(self):
# Inferring encoding from body also cache decoded body as sideeffect,
# this test tries to ensure that calling response.encoding and
# response.text in indistint order doesn't affect final
# response.text in indistinct order doesn't affect final
# values for encoding and decoded body.
url = "http://example.com"
body = b"\xef\xbb\xbfWORD"
@ -645,6 +645,7 @@ class TextResponseTest(BaseResponseTest):
"http://example.com/sample2.html",
"http://example.com/sample3.html",
"http://example.com/sample3.html",
"http://example.com/sample3.html",
"http://example.com/sample3.html#foo",
"http://www.google.com/something",
"http://example.com/innertag.html",

View File

@ -74,6 +74,10 @@ class Base:
url="http://example.com/sample3.html",
text="sample 3 repetition",
),
Link(
url="http://example.com/sample3.html",
text="sample 3 repetition",
),
Link(
url="http://example.com/sample3.html#foo",
text="sample 3 repetition with fragment",
@ -93,6 +97,10 @@ class Base:
url="http://example.com/sample3.html",
text="sample 3 repetition",
),
Link(
url="http://example.com/sample3.html",
text="sample 3 repetition",
),
Link(
url="http://example.com/sample3.html",
text="sample 3 repetition with fragment",

View File

@ -225,8 +225,8 @@ class ImagesPipelineTestCase(unittest.TestCase):
self.assertEqual(buf.getvalue(), thumb_buf.getvalue())
expected_warning_msg = (
".convert_image() method overriden in a deprecated way, "
"overriden method does not accept response_body argument."
".convert_image() method overridden in a deprecated way, "
"overridden method does not accept response_body argument."
)
self.assertEqual(
len(
@ -244,7 +244,7 @@ class ImagesPipelineTestCase(unittest.TestCase):
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter("always")
SIZE = (100, 100)
# straigh forward case: RGB and JPEG
# straight forward case: RGB and JPEG
COLOUR = (0, 127, 255)
im, _ = _create_image("JPEG", "RGB", SIZE, COLOUR)
converted, _ = self.pipeline.convert_image(im)
@ -271,7 +271,7 @@ class ImagesPipelineTestCase(unittest.TestCase):
self.assertEqual(converted.mode, "RGB")
self.assertEqual(converted.getcolors(), [(10000, (205, 230, 255))])
# ensure that we recieved deprecation warnings
# ensure that we received deprecation warnings
expected_warning_msg = ".convert_image() method called in a deprecated way"
self.assertTrue(
len(
@ -287,7 +287,7 @@ class ImagesPipelineTestCase(unittest.TestCase):
def test_convert_image_new(self):
# tests for new API
SIZE = (100, 100)
# straigh forward case: RGB and JPEG
# straight forward case: RGB and JPEG
COLOUR = (0, 127, 255)
im, buf = _create_image("JPEG", "RGB", SIZE, COLOUR)
converted, converted_buf = self.pipeline.convert_image(im, response_body=buf)

View File

@ -11,12 +11,12 @@ from tests.mockserver import MockServer
from tests.spiders import SingleRequestSpider
OVERRIDEN_URL = "https://example.org"
OVERRIDDEN_URL = "https://example.org"
class ProcessResponseMiddleware:
def process_response(self, request, response, spider):
return response.replace(request=Request(OVERRIDEN_URL))
return response.replace(request=Request(OVERRIDDEN_URL))
class RaiseExceptionRequestMiddleware:
@ -30,7 +30,7 @@ class CatchExceptionOverrideRequestMiddleware:
return Response(
url="http://localhost/",
body=b"Caught " + exception.__class__.__name__.encode("utf-8"),
request=Request(OVERRIDEN_URL),
request=Request(OVERRIDDEN_URL),
)
@ -52,7 +52,7 @@ class AlternativeCallbacksSpider(SingleRequestSpider):
class AlternativeCallbacksMiddleware:
def process_response(self, request, response, spider):
new_request = request.replace(
url=OVERRIDEN_URL,
url=OVERRIDDEN_URL,
callback=spider.alt_callback,
cb_kwargs={"foo": "bar"},
)
@ -132,16 +132,16 @@ class CrawlTestCase(TestCase):
yield crawler.crawl(seed=url, mockserver=self.mockserver)
response = crawler.spider.meta["responses"][0]
self.assertEqual(response.request.url, OVERRIDEN_URL)
self.assertEqual(response.request.url, OVERRIDDEN_URL)
self.assertEqual(signal_params["response"].url, url)
self.assertEqual(signal_params["request"].url, OVERRIDEN_URL)
self.assertEqual(signal_params["request"].url, OVERRIDDEN_URL)
log.check_present(
(
"scrapy.core.engine",
"DEBUG",
f"Crawled (200) <GET {OVERRIDEN_URL}> (referer: None)",
f"Crawled (200) <GET {OVERRIDDEN_URL}> (referer: None)",
),
)
@ -166,7 +166,7 @@ class CrawlTestCase(TestCase):
yield crawler.crawl(seed=url, mockserver=self.mockserver)
response = crawler.spider.meta["responses"][0]
self.assertEqual(response.body, b"Caught ZeroDivisionError")
self.assertEqual(response.request.url, OVERRIDEN_URL)
self.assertEqual(response.request.url, OVERRIDDEN_URL)
@defer.inlineCallbacks
def test_downloader_middleware_do_not_override_in_process_exception(self):

View File

@ -227,7 +227,7 @@ class MixinSameOrigin:
),
("http://example.com:81/page.html", "http://example.com/not-page.html", None),
("http://example.com/page.html", "http://example.com:81/not-page.html", None),
# Different protocols: do NOT send refferer
# Different protocols: do NOT send referrer
("https://example.com/page.html", "http://example.com/not-page.html", None),
("https://example.com/page.html", "http://not.example.com/", None),
("ftps://example.com/urls.zip", "https://example.com/not-page.html", None),
@ -750,19 +750,19 @@ class TestRequestMetaUnsafeUrl(MixinUnsafeUrl, TestRefererMiddleware):
req_meta = {"referrer_policy": POLICY_UNSAFE_URL}
class TestRequestMetaPredecence001(MixinUnsafeUrl, TestRefererMiddleware):
class TestRequestMetaPrecedence001(MixinUnsafeUrl, TestRefererMiddleware):
settings = {"REFERRER_POLICY": "scrapy.spidermiddlewares.referer.SameOriginPolicy"}
req_meta = {"referrer_policy": POLICY_UNSAFE_URL}
class TestRequestMetaPredecence002(MixinNoReferrer, TestRefererMiddleware):
class TestRequestMetaPrecedence002(MixinNoReferrer, TestRefererMiddleware):
settings = {
"REFERRER_POLICY": "scrapy.spidermiddlewares.referer.NoReferrerWhenDowngradePolicy"
}
req_meta = {"referrer_policy": POLICY_NO_REFERRER}
class TestRequestMetaPredecence003(MixinUnsafeUrl, TestRefererMiddleware):
class TestRequestMetaPrecedence003(MixinUnsafeUrl, TestRefererMiddleware):
settings = {
"REFERRER_POLICY": "scrapy.spidermiddlewares.referer.OriginWhenCrossOriginPolicy"
}
@ -888,19 +888,19 @@ class TestSettingsPolicyByName(TestCase):
RefererMiddleware(settings)
class TestPolicyHeaderPredecence001(MixinUnsafeUrl, TestRefererMiddleware):
class TestPolicyHeaderPrecedence001(MixinUnsafeUrl, TestRefererMiddleware):
settings = {"REFERRER_POLICY": "scrapy.spidermiddlewares.referer.SameOriginPolicy"}
resp_headers = {"Referrer-Policy": POLICY_UNSAFE_URL.upper()}
class TestPolicyHeaderPredecence002(MixinNoReferrer, TestRefererMiddleware):
class TestPolicyHeaderPrecedence002(MixinNoReferrer, TestRefererMiddleware):
settings = {
"REFERRER_POLICY": "scrapy.spidermiddlewares.referer.NoReferrerWhenDowngradePolicy"
}
resp_headers = {"Referrer-Policy": POLICY_NO_REFERRER.swapcase()}
class TestPolicyHeaderPredecence003(
class TestPolicyHeaderPrecedence003(
MixinNoReferrerWhenDowngrade, TestRefererMiddleware
):
settings = {
@ -909,7 +909,7 @@ class TestPolicyHeaderPredecence003(
resp_headers = {"Referrer-Policy": POLICY_NO_REFERRER_WHEN_DOWNGRADE.title()}
class TestPolicyHeaderPredecence004(
class TestPolicyHeaderPrecedence004(
MixinNoReferrerWhenDowngrade, TestRefererMiddleware
):
"""

View File

@ -15,6 +15,11 @@ class AsyncioTest(TestCase):
)
def test_install_asyncio_reactor(self):
from twisted.internet import reactor as original_reactor
with warnings.catch_warnings(record=True) as w:
install_reactor("twisted.internet.asyncioreactor.AsyncioSelectorReactor")
self.assertEqual(len(w), 0)
from twisted.internet import reactor
assert original_reactor == reactor

View File

@ -74,7 +74,7 @@ class WarnWhenSubclassedTest(unittest.TestCase):
self.assertIn("foo.NewClass", str(w[1].message))
self.assertIn("bar.OldClass", str(w[1].message))
def test_subclassing_warns_only_on_direct_childs(self):
def test_subclassing_warns_only_on_direct_children(self):
Deprecated = create_deprecated_class(
"Deprecated", NewName, warn_once=False, warn_category=MyWarning
)

View File

@ -7,17 +7,27 @@ from scrapy.utils.display import pformat, pprint
class TestDisplay(TestCase):
object = {"a": 1}
colorized_string = (
"{\x1b[33m'\x1b[39;49;00m\x1b[33ma\x1b[39;49;00m\x1b[33m'"
"\x1b[39;49;00m: \x1b[34m1\x1b[39;49;00m}\n"
)
colorized_strings = {
(
(
"{\x1b[33m'\x1b[39;49;00m\x1b[33ma\x1b[39;49;00m\x1b[33m'"
"\x1b[39;49;00m: \x1b[34m1\x1b[39;49;00m}"
)
+ suffix
)
for suffix in (
# https://github.com/pygments/pygments/issues/2313
"\n", # pygments ≤ 2.13
"\x1b[37m\x1b[39;49;00m\n", # pygments ≥ 2.14
)
}
plain_string = "{'a': 1}"
@mock.patch("sys.platform", "linux")
@mock.patch("sys.stdout.isatty")
def test_pformat(self, isatty):
isatty.return_value = True
self.assertEqual(pformat(self.object), self.colorized_string)
self.assertIn(pformat(self.object), self.colorized_strings)
@mock.patch("sys.stdout.isatty")
def test_pformat_dont_colorize(self, isatty):
@ -33,7 +43,7 @@ class TestDisplay(TestCase):
def test_pformat_old_windows(self, isatty, version):
isatty.return_value = True
version.return_value = "10.0.14392"
self.assertEqual(pformat(self.object), self.colorized_string)
self.assertIn(pformat(self.object), self.colorized_strings)
@mock.patch("sys.platform", "win32")
@mock.patch("scrapy.utils.display._enable_windows_terminal_processing")
@ -55,7 +65,7 @@ class TestDisplay(TestCase):
isatty.return_value = True
version.return_value = "10.0.14393"
terminal_processing.return_value = True
self.assertEqual(pformat(self.object), self.colorized_string)
self.assertIn(pformat(self.object), self.colorized_strings)
@mock.patch("sys.platform", "linux")
@mock.patch("sys.stdout.isatty")

View File

@ -159,7 +159,7 @@ class UtilsPythonTestCase(unittest.TestCase):
b = Obj()
# no attributes given return False
self.assertFalse(equal_attributes(a, b, []))
# not existent attributes
# nonexistent attributes
self.assertFalse(equal_attributes(a, b, ["x", "y"]))
a.x = 1

tox.ini
View File

@ -32,7 +32,7 @@ download = true
commands =
pytest --cov=scrapy --cov-report=xml --cov-report= {posargs:--durations=10 docs scrapy tests}
install_command =
pip install -U -ctests/upper-constraints.txt {opts} {packages}
python -I -m pip install -ctests/upper-constraints.txt {opts} {packages}
[testenv:typing]
basepython = python3
@ -63,8 +63,7 @@ commands =
flake8 {posargs:docs scrapy tests}
[testenv:pylint]
# reppy does not support Python 3.9+
basepython = python3.8
basepython = python3
deps =
{[testenv:extra-deps]deps}
pylint==2.15.6
@ -75,13 +74,14 @@ commands =
basepython = python3
deps =
twine==4.0.1
build==0.9.0
commands =
python setup.py sdist
python -m build --sdist
twine check dist/*
[pinned]
deps =
cryptography==3.3
cryptography==3.4.6
cssselect==0.9.1
h2==3.0
itemadapter==0.1.0
@ -106,7 +106,7 @@ deps =
setenv =
_SCRAPY_PINNED=true
install_command =
pip install -U {opts} {packages}
python -I -m pip install {opts} {packages}
[testenv:pinned]
deps =
@ -126,8 +126,7 @@ setenv =
{[pinned]setenv}
[testenv:extra-deps]
# reppy does not support Python 3.9+
basepython = python3.8
basepython = python3
deps =
{[testenv]deps}
boto
@ -135,7 +134,6 @@ deps =
# Twisted[http2] currently forces old mitmproxy because of h2 version
# restrictions in their deps, so we need to pin old markupsafe here too.
markupsafe < 2.1.0
reppy
robotexclusionrulesparser
Pillow>=4.0.0
Twisted[http2]>=17.9.0