From 713e1eee9b14ef95515b20baacb83daba9c1277a Mon Sep 17 00:00:00 2001
From: Paul Tremberth
Date: Tue, 26 Jan 2016 10:44:38 +0100
Subject: [PATCH] Update docs about local files support for "scrapy shell"

---
 docs/topics/commands.rst |  4 +++-
 docs/topics/shell.rst    | 30 ++++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/docs/topics/commands.rst b/docs/topics/commands.rst
index 16af52eea..9a40a2c29 100644
--- a/docs/topics/commands.rst
+++ b/docs/topics/commands.rst
@@ -373,7 +373,9 @@ shell
 * Requires project: *no*
 
 Starts the Scrapy shell for the given URL (if given) or empty if no URL is
-given. See :ref:`topics-shell` for more info.
+given. Also supports UNIX-style local file paths, either relative with
+``./`` or ``../`` prefixes or absolute file paths.
+See :ref:`topics-shell` for more info.
 
 Usage example::
 
diff --git a/docs/topics/shell.rst b/docs/topics/shell.rst
index 3569cbf37..4af11fbb6 100644
--- a/docs/topics/shell.rst
+++ b/docs/topics/shell.rst
@@ -53,6 +53,36 @@ this::
 
 Where the ``<url>`` is the URL you want to scrape.
 
+:command:`shell` also works for local files. This can be handy if you want
+to play around with a local copy of a web page. :command:`shell` understands
+the following syntaxes for local files::
+
+    # UNIX-style
+    scrapy shell ./path/to/file.html
+    scrapy shell ../other/path/to/file.html
+    scrapy shell /absolute/path/to/file.html
+
+    # File URI
+    scrapy shell file:///absolute/path/to/file.html
+
+.. warning:: :command:`shell` will interpret ``index.html`` as a domain name,
+    not as a relative path to a local file, and will trigger a DNS lookup error::
+
+        $ scrapy shell index.html
+        [ ... scrapy shell starts ... ]
+        2016-01-26 10:29:51 [scrapy] DEBUG: Gave up retrying
+        (failed 3 times): DNS lookup failed:
+        address 'index.html' not found: [Errno -5] No address associated with hostname.
+        [ ... traceback ... ]
+        twisted.internet.error.DNSLookupError: DNS lookup failed:
+        address 'index.html' not found: [Errno -5] No address associated with hostname.
+
+    Use ``./`` prefix instead::
+
+        $ scrapy shell ./index.html
+        [ ... scrapy shell starts ... ]
+
+
 Using the shell
 ===============