mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 15:43:48 +00:00

Merge pull request #3609 from Gallaecio/2253

Document FilesPipeline.file_path and ImagesPipeline.file_path
Mikhail Korobov 2019-07-04 22:05:46 +05:00 committed by GitHub
commit 4d4bd0e823


@@ -392,6 +392,36 @@ See here the methods that you can override in your custom Files Pipeline:
.. class:: FilesPipeline

   .. method:: file_path(request, response, info)

      This method is called once per downloaded item. It returns the
      download path of the file originating from the specified
      :class:`response <scrapy.http.Response>`.

      In addition to ``response``, this method receives the original
      :class:`request <scrapy.Request>` and
      :class:`info <scrapy.pipelines.media.MediaPipeline.SpiderInfo>`.

      You can override this method to customize the download path of each
      file.

      For example, if file URLs end like regular paths (e.g.
      ``https://example.com/a/b/c/foo.png``), you can use the following
      approach to download all files into the ``files`` folder with their
      original filenames (e.g. ``files/foo.png``)::

          import os
          from urllib.parse import urlparse

          from scrapy.pipelines.files import FilesPipeline

          class MyFilesPipeline(FilesPipeline):

              def file_path(self, request, response, info):
                  return 'files/' + os.path.basename(urlparse(request.url).path)

      By default the :meth:`file_path` method returns
      ``full/<request URL hash>.<extension>``.
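For reference, that default path is derived from a hash of the request URL. A rough, self-contained sketch of the equivalent logic (assuming a SHA-1 hash of the URL, and ignoring the extra extension sanitization Scrapy performs; this is not the actual Scrapy source) might look like:

```python
import hashlib
import os
from urllib.parse import urlparse

def default_file_path(url):
    # Hash the full request URL, as the default file_path does.
    media_guid = hashlib.sha1(url.encode()).hexdigest()
    # Keep the original file extension, e.g. '.png'.
    media_ext = os.path.splitext(urlparse(url).path)[1]
    return 'full/%s%s' % (media_guid, media_ext)
```

Because the hash covers the whole URL, two files with identical basenames but different URLs never collide under the default scheme.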
.. method:: FilesPipeline.get_media_requests(item, info)

   As seen in the workflow, the pipeline will get the URLs of the images to
@@ -475,6 +505,36 @@ See here the methods that you can override in your custom Images Pipeline:
The :class:`ImagesPipeline` is an extension of the :class:`FilesPipeline`,
customizing the field names and adding custom behavior for images.

.. method:: file_path(request, response, info)

   This method is called once per downloaded item. It returns the
   download path of the file originating from the specified
   :class:`response <scrapy.http.Response>`.

   In addition to ``response``, this method receives the original
   :class:`request <scrapy.Request>` and
   :class:`info <scrapy.pipelines.media.MediaPipeline.SpiderInfo>`.

   You can override this method to customize the download path of each
   file.

   For example, if file URLs end like regular paths (e.g.
   ``https://example.com/a/b/c/foo.png``), you can use the following
   approach to download all files into the ``files`` folder with their
   original filenames (e.g. ``files/foo.png``)::

       import os
       from urllib.parse import urlparse

       from scrapy.pipelines.images import ImagesPipeline

       class MyImagesPipeline(ImagesPipeline):

           def file_path(self, request, response, info):
               return 'files/' + os.path.basename(urlparse(request.url).path)

   By default the :meth:`file_path` method returns
   ``full/<request URL hash>.<extension>``.
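Note that an override which keeps only the URL's basename, as in the example above, maps two different URLs ending in the same filename to the same path, so one download overwrites the other. One way to keep readable names while avoiding such collisions is to prefix a short hash of the URL. A sketch of that idea as a standalone helper (a hypothetical function, not part of Scrapy's API):

```python
import hashlib
import os
from urllib.parse import urlparse

def collision_safe_path(url):
    # Keep the human-readable basename, but prefix a short hash of the
    # full URL so distinct URLs with equal filenames get distinct paths.
    name = os.path.basename(urlparse(url).path)
    prefix = hashlib.sha1(url.encode()).hexdigest()[:8]
    return 'files/%s-%s' % (prefix, name)
```

The same logic could be dropped into a ``file_path`` override, replacing the plain ``os.path.basename`` return value.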
.. method:: ImagesPipeline.get_media_requests(item, info)

   Works the same way as the :meth:`FilesPipeline.get_media_requests` method,