mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 15:43:48 +00:00

Merge pull request #3609 from Gallaecio/2253

Document FilesPipeline.file_path and ImagesPipeline.file_path
Mikhail Korobov 2019-07-04 22:05:46 +05:00 committed by GitHub
commit 4d4bd0e823


@@ -392,6 +392,36 @@ See here the methods that you can override in your custom Files Pipeline:
.. class:: FilesPipeline

   .. method:: file_path(request, response, info)

      This method is called once per downloaded item. It returns the
      download path of the file originating from the specified
      :class:`response <scrapy.http.Response>`.

      In addition to ``response``, this method receives the original
      :class:`request <scrapy.Request>` and
      :class:`info <scrapy.pipelines.media.MediaPipeline.SpiderInfo>`.

      You can override this method to customize the download path of each
      file.

      For example, if file URLs end like regular paths (e.g.
      ``https://example.com/a/b/c/foo.png``), you can use the following
      approach to download all files into the ``files`` folder with their
      original filenames (e.g. ``files/foo.png``)::

          import os
          from urllib.parse import urlparse

          from scrapy.pipelines.files import FilesPipeline

          class MyFilesPipeline(FilesPipeline):

              def file_path(self, request, response, info):
                  return 'files/' + os.path.basename(urlparse(request.url).path)

      By default the :meth:`file_path` method returns
      ``full/<request URL hash>.<extension>``.
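For reference, that default path is derived from a hash of the request URL. A rough, self-contained sketch of the equivalent logic (assuming a SHA-1 hash of the URL, and ignoring the extra extension sanitization Scrapy performs; this is not the actual Scrapy source) might look like:

```python
import hashlib
import os
from urllib.parse import urlparse

def default_file_path(url):
    # Hash the full request URL, as the default file_path does.
    media_guid = hashlib.sha1(url.encode()).hexdigest()
    # Keep the original file extension, e.g. '.png'.
    media_ext = os.path.splitext(urlparse(url).path)[1]
    return 'full/%s%s' % (media_guid, media_ext)
```

Because the hash covers the whole URL, two files with identical basenames but different URLs never collide under the default scheme.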
.. method:: FilesPipeline.get_media_requests(item, info)

   As seen in the workflow, the pipeline will get the URLs of the images to
@@ -475,6 +505,36 @@ See here the methods that you can override in your custom Images Pipeline:
The :class:`ImagesPipeline` is an extension of the :class:`FilesPipeline`,
customizing the field names and adding custom behavior for images.

.. method:: file_path(request, response, info)

   This method is called once per downloaded item. It returns the
   download path of the file originating from the specified
   :class:`response <scrapy.http.Response>`.

   In addition to ``response``, this method receives the original
   :class:`request <scrapy.Request>` and
   :class:`info <scrapy.pipelines.media.MediaPipeline.SpiderInfo>`.

   You can override this method to customize the download path of each
   file.

   For example, if file URLs end like regular paths (e.g.
   ``https://example.com/a/b/c/foo.png``), you can use the following
   approach to download all files into the ``files`` folder with their
   original filenames (e.g. ``files/foo.png``)::

       import os
       from urllib.parse import urlparse

       from scrapy.pipelines.images import ImagesPipeline

       class MyImagesPipeline(ImagesPipeline):

           def file_path(self, request, response, info):
               return 'files/' + os.path.basename(urlparse(request.url).path)

   By default the :meth:`file_path` method returns
   ``full/<request URL hash>.<extension>``.
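Note that an override which keeps only the URL's basename, as in the example above, maps two different URLs ending in the same filename to the same path, so one download overwrites the other. One way to keep readable names while avoiding such collisions is to prefix a short hash of the URL. A sketch of that idea as a standalone helper (a hypothetical function, not part of Scrapy's API):

```python
import hashlib
import os
from urllib.parse import urlparse

def collision_safe_path(url):
    # Keep the human-readable basename, but prefix a short hash of the
    # full URL so distinct URLs with equal filenames get distinct paths.
    name = os.path.basename(urlparse(url).path)
    prefix = hashlib.sha1(url.encode()).hexdigest()[:8]
    return 'files/%s-%s' % (prefix, name)
```

The same logic could be dropped into a ``file_path`` override, replacing the plain ``os.path.basename`` return value.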
.. method:: ImagesPipeline.get_media_requests(item, info)

   Works the same way as the :meth:`FilesPipeline.get_media_requests` method,