DOC update changelog (mirror of https://github.com/scrapy/scrapy.git,
synced 2025-02-23; commit e479f5aa15, parent 706910790b; docs/news.rst)

* changes from recently merged pull requests
* more highlights
* re-organized headers
* Selector API changes

Scrapy 1.6.0 (unreleased)
-------------------------

Highlights:

* better Windows support;
* Python 3.7 compatibility;
* big documentation improvements, including a switch
  from ``.extract()`` + ``.extract_first()`` API to ``.get()`` + ``.getall()``
  API;
* feed exports, FilePipeline and MediaPipeline improvements;
* better extensibility: :signal:`item_error` and
  :signal:`request_reached_downloader` signals; ``from_crawler`` support
  for feed exporters, feed storages and dupefilters;
* ``scrapy.contracts`` fixes and new features;
* telnet console security improvements;
* clean-up of the deprecated code;
* various bug fixes, small new features and usability improvements across
  the codebase.

Selector API changes
~~~~~~~~~~~~~~~~~~~~

While these are not changes in Scrapy itself, but rather in the parsel_
library which Scrapy uses for XPath/CSS selectors, they are
worth mentioning here. Scrapy now depends on parsel >= 1.5, and the
Scrapy documentation is updated to follow recent ``parsel`` API conventions.
The most visible change is that the ``.get()`` and ``.getall()`` selector
methods are now preferred over ``.extract()`` and ``.extract_first()``.
We feel that these new methods result in more concise and readable code.
See :ref:`old-extraction-api` for more details.

.. note::
    There are currently **no plans** to deprecate the ``.extract()``
    and ``.extract_first()`` methods.

Another useful new feature is the introduction of the ``Selector.attrib`` and
``SelectorList.attrib`` properties, which make it easier to get
attributes of HTML elements. See :ref:`selecting-attributes`.
CSS selectors are cached in parsel >= 1.5, which makes them faster
when the same CSS path is used many times. This is very common for
Scrapy spiders: callbacks are usually called several times,
on different pages.

If you're using custom ``Selector`` or ``SelectorList`` subclasses,
a **backwards incompatible** change in parsel may affect your code.
See the `parsel changelog`_ for a detailed description, as well as for the
full list of improvements.

.. _parsel changelog: https://parsel.readthedocs.io/en/latest/history.html

Telnet console
~~~~~~~~~~~~~~

**Backwards incompatible**: Scrapy's telnet console now requires a username
and password. See :ref:`topics-telnetconsole` for more details.
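The credentials are configured through settings; a sketch with placeholder values (the setting names below are the ones introduced for the telnet console, the values are hypothetical):

```python
# settings.py -- placeholder credentials; if TELNETCONSOLE_PASSWORD
# is not set, Scrapy generates one and prints it in the logs
TELNETCONSOLE_USERNAME = "scrapy"   # default username
TELNETCONSOLE_PASSWORD = "s3cret"   # choose your own secret
```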

New extensibility features
~~~~~~~~~~~~~~~~~~~~~~~~~~

* ``from_crawler`` support is added to feed exporters and feed storages. This,
  among other things, allows access to Scrapy settings from custom feed
  storages and exporters (:issue:`1605`, :issue:`3348`).
* ``from_crawler`` support is added to dupefilters (:issue:`2956`); this allows
  access to e.g. settings or the spider from a dupefilter.
* :signal:`item_error` is fired when an error happens in a pipeline
  (:issue:`3256`).
* :signal:`request_reached_downloader` is fired when the Downloader gets
  a new Request; this signal can be useful e.g. for custom Schedulers
  (:issue:`3393`).
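The ``from_crawler`` hook shape that dupefilters (and feed exporters/storages) now support can be sketched without Scrapy itself; ``FakeCrawler`` and ``SettingsAwareDupeFilter`` are illustrative stand-ins, not Scrapy's implementation:

```python
# A minimal sketch of the from_crawler pattern: Scrapy calls the
# classmethod with the crawler, so the component can read settings.

class FakeCrawler:
    """Stand-in for Scrapy's crawler object (illustrative)."""
    def __init__(self, settings):
        self.settings = settings  # real crawlers expose a Settings object

class SettingsAwareDupeFilter:
    def __init__(self, debug):
        self.debug = debug
        self.seen = set()

    @classmethod
    def from_crawler(cls, crawler):
        # read configuration from the crawler's settings
        return cls(debug=crawler.settings.get("DUPEFILTER_DEBUG", False))

    def request_seen(self, fingerprint):
        if fingerprint in self.seen:
            return True
        self.seen.add(fingerprint)
        return False

df = SettingsAwareDupeFilter.from_crawler(FakeCrawler({"DUPEFILTER_DEBUG": True}))
```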

New FilePipeline and MediaPipeline features
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* More options are exposed for S3FilesStore: :setting:`AWS_ENDPOINT_URL`,
  :setting:`AWS_USE_SSL`, :setting:`AWS_VERIFY`, :setting:`AWS_REGION_NAME`.
  For example, this allows the use of alternative or self-hosted
  AWS-compatible providers (:issue:`2609`, :issue:`3548`).
* ACL support for Google Cloud Storage: :setting:`FILES_STORE_GCS_ACL` and
  :setting:`IMAGES_STORE_GCS_ACL` (:issue:`3199`).
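For example, the new S3 settings could be used to point S3FilesStore at a self-hosted S3-compatible store; the endpoint URL and values below are placeholders:

```python
# settings.py -- hypothetical self-hosted S3-compatible endpoint
AWS_ENDPOINT_URL = "http://minio.internal:9000"
AWS_USE_SSL = False        # the endpoint above is plain HTTP
AWS_VERIFY = False         # skip TLS certificate verification
AWS_REGION_NAME = "us-east-1"
```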

``scrapy.contracts`` improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Fixed errback handling in contracts, e.g. for cases where a contract
  is executed for a URL which returns a non-200 response (:issue:`3371`).

Usability and other improvements, cleanups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* All Scrapy tests now pass on Windows; the Scrapy testing suite is executed
  in a Windows environment on CI (:issue:`3315`).
* Python 3.7 support (:issue:`3326`, :issue:`3150`, :issue:`3547`).
* Lazy loading of Downloader Handlers is now optional; this enables better
  initialization error handling in custom Downloader Handlers (:issue:`3394`).
* Testing and CI fixes (:issue:`3526`, :issue:`3538`, :issue:`3308`,
  :issue:`3311`, :issue:`3309`, :issue:`3305`, :issue:`3210`, :issue:`3299`);
* better error message when an exporter is disabled (:issue:`3358`);
* ``scrapy.http.cookies.CookieJar.clear`` accepts "domain", "path" and "name"
  optional arguments (:issue:`3231`);
* more stats for RobotsTxtMiddleware (:issue:`3100`);
* INFO log level is used to show the telnet host/port (:issue:`3115`);
* a message is added to IgnoreRequest in RobotsTxtMiddleware (:issue:`3113`);
* better validation of the ``url`` argument in ``Response.follow``
  (:issue:`3131`);
* a non-zero exit code is returned from Scrapy commands when an error happens
  on spider initialization (:issue:`3226`);
* link extraction improvements: "ftp" is added to the scheme list
  (:issue:`3152`); "flv" is added to common video extensions (:issue:`3165`);
* ``scrapy shell --help`` mentions the syntax required for local files
  (``./file.html``) - :issue:`3496`;
* additional files are included in sdist (:issue:`3495`);
* code style fixes (:issue:`3405`, :issue:`3304`);
* an unneeded ``.strip()`` call is removed (:issue:`3519`);
* ``collections.deque`` is used to store MiddlewareManager methods instead
  of a list (:issue:`3476`).
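Scrapy's ``CookieJar`` wraps the standard library jar, and the new optional arguments mirror the stdlib ``clear(domain, path, name)`` signature; a stdlib-only sketch (cookie names and values are illustrative):

```python
# Demonstrate clear(domain, path, name) on the stdlib CookieJar,
# whose signature Scrapy's wrapper now exposes.
from http.cookiejar import Cookie, CookieJar

def make_cookie(name, value, domain="example.com", path="/"):
    # http.cookiejar.Cookie takes a long positional constructor:
    # version, name, value, port, port_specified, domain,
    # domain_specified, domain_initial_dot, path, path_specified,
    # secure, expires, discard, comment, comment_url, rest
    return Cookie(0, name, value, None, False, domain, True, False,
                  path, True, False, None, False, None, None, {})

jar = CookieJar()
jar.set_cookie(make_cookie("session", "abc"))
jar.set_cookie(make_cookie("theme", "dark"))

# remove a single cookie identified by (domain, path, name)
jar.clear("example.com", "/", "session")
```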

Bug fixes
~~~~~~~~~

* Fixed an issue with extra blank lines in .csv exports under Windows
  (:issue:`3039`);
* proper handling of pickling errors in Python 3 when serializing objects
  for disk queues (:issue:`3082`);
* flags are now preserved when copying Requests (:issue:`3342`);
* ``FormRequest.from_response`` clickdata shouldn't ignore elements with
  ``input[type=image]`` (:issue:`3153`);
* ``FormRequest.from_response`` should preserve duplicate keys
  (:issue:`3247`).

Documentation improvements
~~~~~~~~~~~~~~~~~~~~~~~~~~

* the unused ``DEPTH_STATS`` option is removed from the docs (:issue:`3245`);
* other cleanups (:issue:`3347`, :issue:`3350`, :issue:`3445`).

Deprecation removals
~~~~~~~~~~~~~~~~~~~~

Compatibility shims for pre-1.0 Scrapy module names are removed:

* ``scrapy.statscol``
* ``scrapy.utils.decorator``

See :ref:`module-relocations` for more information, or use the suggestions
from Scrapy 1.5.x deprecation warnings to update your code.

Other deprecation removals:

See more examples for scripts running Scrapy: :ref:`topics-practices`

.. _module-relocations:

Module Relocations
~~~~~~~~~~~~~~~~~~