Merge pull request #3609 from Gallaecio/2253
Document FilesPipeline.file_path and ImagesPipeline.file_path
Commit 4d4bd0e823

@@ -392,6 +392,36 @@ See here the methods that you can override in your custom Files Pipeline:

.. class:: FilesPipeline

   .. method:: file_path(request, response=None, info=None)

      This method is called once per downloaded item. It returns the
      download path of the file originating from the specified
      :class:`response <scrapy.http.Response>`.

      In addition to ``response``, this method receives the original
      :class:`request <scrapy.Request>` and
      :class:`info <scrapy.pipelines.media.MediaPipeline.SpiderInfo>`.

      You can override this method to customize the download path of each file.

      For example, if file URLs end like regular paths (e.g.
      ``https://example.com/a/b/c/foo.png``), you can use the following
      approach to download all files into the ``files`` folder with their
      original filenames (e.g. ``files/foo.png``)::

          import os
          from urllib.parse import urlparse

          from scrapy.pipelines.files import FilesPipeline

          class MyFilesPipeline(FilesPipeline):

              def file_path(self, request, response=None, info=None):
                  return 'files/' + os.path.basename(urlparse(request.url).path)

      By default the :meth:`file_path` method returns
      ``full/<request URL hash>.<extension>``.
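
      For reference, a path of that shape could be produced along the
      following lines (an illustrative sketch only, not the exact code
      Scrapy runs; the helper name is made up)::

          import hashlib
          import os

          def default_like_file_path(url):
              # Hash the request URL and keep its extension, mirroring the
              # shape of the default full/<request URL hash>.<extension> path.
              url_hash = hashlib.sha1(url.encode('utf-8')).hexdigest()
              extension = os.path.splitext(url)[1]
              return 'full/%s%s' % (url_hash, extension)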

.. method:: FilesPipeline.get_media_requests(item, info)

   As seen on the workflow, the pipeline will get the URLs of the images to

@@ -475,6 +505,36 @@ See here the methods that you can override in your custom Images Pipeline:

The :class:`ImagesPipeline` is an extension of the :class:`FilesPipeline`,
customizing the field names and adding custom behavior for images.
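
For instance, with the default field names, an item fed to this pipeline
could be declared as follows (the item class name is just an example)::

    import scrapy

    class MyImageItem(scrapy.Item):
        # ImagesPipeline reads download URLs from "image_urls" and stores
        # the results in "images"; FilesPipeline uses "file_urls" and
        # "files" instead.
        image_urls = scrapy.Field()
        images = scrapy.Field()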

.. method:: file_path(request, response=None, info=None)

   This method is called once per downloaded item. It returns the
   download path of the file originating from the specified
   :class:`response <scrapy.http.Response>`.

   In addition to ``response``, this method receives the original
   :class:`request <scrapy.Request>` and
   :class:`info <scrapy.pipelines.media.MediaPipeline.SpiderInfo>`.

   You can override this method to customize the download path of each file.

   For example, if file URLs end like regular paths (e.g.
   ``https://example.com/a/b/c/foo.png``), you can use the following
   approach to download all files into the ``files`` folder with their
   original filenames (e.g. ``files/foo.png``)::

       import os
       from urllib.parse import urlparse

       from scrapy.pipelines.images import ImagesPipeline

       class MyImagesPipeline(ImagesPipeline):

           def file_path(self, request, response=None, info=None):
               return 'files/' + os.path.basename(urlparse(request.url).path)

   By default the :meth:`file_path` method returns
   ``full/<request URL hash>.<extension>``.
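
   If you need a different layout, the same override can implement other
   naming schemes as well; for example, the following sketch (the class name
   is hypothetical) groups downloads into one folder per source domain::

       import os
       from urllib.parse import urlparse

       from scrapy.pipelines.images import ImagesPipeline

       class PerDomainImagesPipeline(ImagesPipeline):

           def file_path(self, request, response=None, info=None):
               # e.g. https://example.com/a/b/c/foo.png -> example.com/foo.png
               parsed = urlparse(request.url)
               return os.path.join(parsed.netloc, os.path.basename(parsed.path))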

.. method:: ImagesPipeline.get_media_requests(item, info)

   Works the same way as :meth:`FilesPipeline.get_media_requests` method,