Release notes
=============

Scrapy 1.5.0 (2017-XX-XX)
-------------------------

This release brings small new features and improvements across the codebase.
Some highlights:

* Google Cloud Storage is supported in FilesPipeline and ImagesPipeline.
* Crawling with proxy servers becomes more efficient, as connections
  to proxies can be reused now.
* Warning, exception and logging messages are improved to make debugging
  easier.
* The ``scrapy parse`` command now allows setting custom request meta via
  the ``--meta`` argument.
* Compatibility with Python 3.6, PyPy and PyPy3 is improved;
  PyPy and PyPy3 are now officially supported, with tests running on CI.
* Better default handling of HTTP 308, 522 and 524 status codes.
* Documentation is improved, as usual.

Backwards Incompatible Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Scrapy 1.5 drops support for Python 3.3.
* The default Scrapy User-Agent now uses an https link to scrapy.org
  (:issue:`2983`). **This is technically backwards-incompatible**; override
  :setting:`USER_AGENT` if you relied on the old value.
* Logging of settings overridden by ``custom_settings`` is fixed;
  **this is technically backwards-incompatible** because the logger
  changes from ``[scrapy.utils.log]`` to ``[scrapy.crawler]``. If you're
  parsing Scrapy logs, please update your log parsers (:issue:`1343`).
* LinkExtractor now ignores the ``m4v`` extension by default; this is a
  change in behavior.
* 522 and 524 status codes are added to ``RETRY_HTTP_CODES`` (:issue:`2851`).

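Projects that depend on the pre-1.5 defaults can pin them in their own ``settings.py``. The sketch below assumes the pre-1.5 ``USER_AGENT`` format ``Scrapy/<version> (+http://scrapy.org)`` and a pre-1.5 ``RETRY_HTTP_CODES`` list that simply lacks 522 and 524; the version string is a placeholder to adjust, not a value taken from these notes.

```python
# settings.py: sketch that restores pre-1.5 defaults for a single project.
# Assumption: set this placeholder to the Scrapy version you actually run
# (scrapy.__version__ gives it at runtime). Lowercase module names are not
# loaded as Scrapy settings, so this helper variable stays private.
scrapy_version = '1.5.0'

# Pre-1.5 User-Agent format, pointing at the http:// (not https://) URL.
USER_AGENT = 'Scrapy/%s (+http://scrapy.org)' % scrapy_version

# Pre-1.5 retry list: the 1.5 defaults minus 522 and 524.
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]
```

Only override these if something downstream (log analysis, server-side allow lists, retry budgets) actually depends on the old values.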
New features
~~~~~~~~~~~~

- Support ``<link>`` tags in ``Response.follow`` (:issue:`2785`)
- Support for ``ptpython`` REPL (:issue:`2654`)
- Google Cloud Storage support for FilesPipeline and ImagesPipeline
  (:issue:`2923`).
- New ``--meta`` option of the ``scrapy parse`` command allows passing
  additional ``request.meta`` (:issue:`2883`)
- Populate spider variable when using ``shell.inspect_response`` (:issue:`2812`)
- Handle HTTP 308 Permanent Redirect (:issue:`2844`)
- Add 522 and 524 to ``RETRY_HTTP_CODES`` (:issue:`2851`)
- Log version information at startup (:issue:`2857`)
- ``scrapy.mail.MailSender`` now works in Python 3 (it requires Twisted 17.9.0)
- Connections to proxy servers are reused (:issue:`2743`)
- Add template for a downloader middleware (:issue:`2755`)
- Explicit message for NotImplementedError when the parse callback is not
  defined (:issue:`2831`)
- CrawlerProcess got an option to disable installation of the root log
  handler (:issue:`2921`)
- LinkExtractor now ignores the ``m4v`` extension by default
- Better log messages for responses over :setting:`DOWNLOAD_WARNSIZE` and
  :setting:`DOWNLOAD_MAXSIZE` limits (:issue:`2927`)
- Show warning when a URL is put in ``Spider.allowed_domains`` instead of
  a domain (:issue:`2250`).

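For context on the 308 entry: RFC 7538 defines 308 Permanent Redirect as the permanent counterpart of 307, so the follow-up request keeps the original method and body, whereas 302 and 303 are conventionally followed with a GET. The sketch below encodes only those general HTTP rules with the standard library; ``redirect_method`` is a hypothetical name and this is not Scrapy's ``RedirectMiddleware`` code.

```python
from typing import Optional

# Statuses after which the method and body must be kept (307: RFC 7231, 308: RFC 7538).
METHOD_PRESERVING = {307, 308}
# Statuses that clients conventionally follow with a GET request.
SWITCH_TO_GET = {302, 303}


def redirect_method(status: int, method: str) -> Optional[str]:
    """Return the method for the follow-up request, or None if this sketch
    does not treat ``status`` as a redirect it handles."""
    if status in METHOD_PRESERVING:
        # A POST stays a POST (body included) at the new location.
        return method
    if status in SWITCH_TO_GET:
        # HEAD stays HEAD; any other method is re-issued as GET.
        return method if method == 'HEAD' else 'GET'
    return None
```

For example, ``redirect_method(308, 'POST')`` keeps ``'POST'``, while ``redirect_method(303, 'POST')`` switches to ``'GET'``.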
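The ``allowed_domains`` warning targets entries such as ``'https://www.example.org/path'``, where a bare domain name is expected. The sketch below shows one way to spot such entries and reduce them to their host name with ``urllib.parse``; ``normalize_allowed_domains`` is a hypothetical helper, not part of Scrapy, and it only illustrates the URL-versus-domain distinction rather than how Scrapy itself reacts.

```python
import warnings
from typing import List
from urllib.parse import urlparse


def normalize_allowed_domains(entries: List[str]) -> List[str]:
    """Hypothetical helper: keep bare domains, reduce URL entries to their host."""
    domains = []
    for entry in entries:
        if '://' in entry:
            # Looks like a URL (scheme present): warn and keep only the host part.
            warnings.warn('allowed_domains expects domains, not URLs: %r' % entry)
            host = urlparse(entry).hostname
            if host:
                domains.append(host)
        else:
            domains.append(entry)
    return domains
```

In a spider, the safe form is simply ``allowed_domains = ['example.com']``, with no scheme or path.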
Bug fixes
~~~~~~~~~

- Fix logging of settings overridden by ``custom_settings``;
  **this is technically backwards-incompatible** because the logger
  changes from ``[scrapy.utils.log]`` to ``[scrapy.crawler]``, so please
  update your log parsers if needed (:issue:`1343`)
- The default Scrapy User-Agent now uses an https link to scrapy.org
  (:issue:`2983`). **This is technically backwards-incompatible**; override
  :setting:`USER_AGENT` if you relied on the old value.
- Fix PyPy and PyPy3 test failures; support them officially
  (:issue:`2793`, :issue:`2935`, :issue:`2990`, :issue:`3050`, :issue:`2213`,
  :issue:`3048`)
- Fix DNS resolver when ``DNSCACHE_ENABLED=False`` (:issue:`2811`)
- Add ``cryptography`` for Debian Jessie tox test env (:issue:`2848`)
- Add verification to check if the Request callback is callable (:issue:`2766`)
- Port ``extras/qpsclient.py`` to Python 3 (:issue:`2849`)
- Use ``getfullargspec`` behind the scenes for Python 3 to stop
  DeprecationWarning (:issue:`2862`)
- Update deprecated test aliases (:issue:`2876`)
- Fix ``SitemapSpider`` support for alternate links (:issue:`2853`)

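A log parser that looks for the overridden-settings line by logger name needs to accept ``[scrapy.crawler]`` as well as the old ``[scrapy.utils.log]``. The sketch below assumes the default ``LOG_FORMAT`` (``%(asctime)s [%(name)s] %(levelname)s: %(message)s``) and a message that starts with ``Overridden settings:``; both are assumptions to check against your own logs.

```python
import re
from typing import Optional

# Accepts the logger name used before 1.5 and the one used from 1.5 on.
OVERRIDDEN_SETTINGS_RE = re.compile(
    r'\[(?P<logger>scrapy\.utils\.log|scrapy\.crawler)\] INFO: '
    r'Overridden settings: (?P<settings>.*)$'
)


def parse_overridden_settings(line: str) -> Optional[str]:
    """Return the raw settings text of an 'Overridden settings' line, else None."""
    match = OVERRIDDEN_SETTINGS_RE.search(line)
    return match.group('settings') if match else None
```

Matching both names lets one parser handle logs produced by Scrapy versions on either side of the change.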
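``inspect.getargspec`` is deprecated on Python 3 in favor of ``inspect.getfullargspec``, which does not exist on Python 2. A common compatibility pattern, shown here as a sketch rather than Scrapy's exact code, picks the function once at import time; ``arg_names`` is a hypothetical helper name.

```python
import inspect

try:
    # Python 3: getfullargspec avoids the DeprecationWarning.
    _getargspec = inspect.getfullargspec
except AttributeError:
    # Python 2 only provides getargspec.
    _getargspec = inspect.getargspec


def arg_names(func):
    """Return the names of the positional arguments of ``func``."""
    return list(_getargspec(func).args)
```

Both functions return an object with an ``args`` attribute, so callers that only need argument names work the same on either interpreter.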
Docs
~~~~

- Added missing bullet point for the ``AUTOTHROTTLE_TARGET_CONCURRENCY``
  setting (:issue:`2756`)
- Update Contributing docs, document new support channels
  (:issue:`2762`, :issue:`3038`)
- Include references to Scrapy subreddit in the docs
- Fix broken links; use https:// for external links
  (:issue:`2978`, :issue:`2982`, :issue:`2958`)
- Document CloseSpider extension better (:issue:`2759`)
- Use ``pymongo.collection.Collection.insert_one()`` in MongoDB example
  (:issue:`2781`)
- Spelling mistakes and typos
  (:issue:`2828`, :issue:`2837`, :issue:`2884`, :issue:`2924`)
- Clarify ``CSVFeedSpider.headers`` documentation (:issue:`2826`)
- Document ``DontCloseSpider`` exception and clarify ``spider_idle``
  (:issue:`2791`)
- Update "Releases" section in README (:issue:`2764`)
- Fix rst syntax in ``DOWNLOAD_FAIL_ON_DATALOSS`` docs (:issue:`2763`)
- Small fix in description of ``startproject`` arguments (:issue:`2866`)
- Clarify data types in ``Response.body`` docs (:issue:`2922`)
- Add a note about ``request.meta['depth']`` to DepthMiddleware docs (:issue:`2374`)
- Add a note about ``request.meta['dont_merge_cookies']`` to CookiesMiddleware
  docs (:issue:`2999`)
- Up-to-date example of project structure (:issue:`2964`, :issue:`2976`)
- A better example of ItemExporters usage (:issue:`2989`)
- Document ``from_crawler`` methods for spider and downloader middlewares
  (:issue:`3019`)

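Middlewares commonly receive the running crawler through a ``from_crawler`` class method and read their configuration from ``crawler.settings``. The sketch below is a hypothetical downloader middleware: the class name and the ``STATUS_LOG_ENABLED`` setting are made up for illustration, and it only assumes the ``from_crawler`` / ``process_response(request, response, spider)`` hooks plus ``Settings.getbool``.

```python
import logging

logger = logging.getLogger(__name__)


class StatusLoggingMiddleware(object):
    """Hypothetical downloader middleware that logs non-200 responses."""

    def __init__(self, enabled):
        self.enabled = enabled

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this with the running Crawler; configuration comes
        # from its settings (STATUS_LOG_ENABLED is a made-up setting name).
        return cls(enabled=crawler.settings.getbool('STATUS_LOG_ENABLED', True))

    def process_response(self, request, response, spider):
        if self.enabled and response.status != 200:
            logger.info('Got %s for %s', response.status, response.url)
        # Returning the response hands it on to the next middleware.
        return response
```

Such a class would be enabled through :setting:`DOWNLOADER_MIDDLEWARES`, for example ``{'myproject.middlewares.StatusLoggingMiddleware': 543}``, where the module path is again hypothetical.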
Scrapy 1.4.0 (2017-05-18)
-------------------------