Mirror of https://github.com/scrapy/scrapy.git, synced 2025-02-23 15:04:27 +00:00

Commit 5876b9aa30 ("Update documentation links"), parent 9f4fe5dc4a
@@ -120,7 +120,7 @@ Scrapy Contrib
 ==============

 Scrapy contrib shares a similar rationale as Django contrib, which is explained
-in `this post <http://jacobian.org/writing/what-is-django-contrib/>`_. If you
+in `this post <https://jacobian.org/writing/what-is-django-contrib/>`_. If you
 are working on a new functionality, please follow that rationale to decide
 whether it should be a Scrapy contrib. If unsure, you can ask in
 `scrapy-users`_.
@@ -189,7 +189,7 @@ And their unit-tests are in::

 .. _issue tracker: https://github.com/scrapy/scrapy/issues
 .. _scrapy-users: https://groups.google.com/forum/#!forum/scrapy-users
-.. _Twisted unit-testing framework: http://twistedmatrix.com/documents/current/core/development/policy/test-standard.html
+.. _Twisted unit-testing framework: https://twistedmatrix.com/documents/current/core/development/policy/test-standard.html
 .. _AUTHORS: https://github.com/scrapy/scrapy/blob/master/AUTHORS
 .. _tests/: https://github.com/scrapy/scrapy/tree/master/tests
 .. _open issues: https://github.com/scrapy/scrapy/issues

docs/faq.rst (14 lines changed)
@@ -77,8 +77,8 @@ Scrapy crashes with: ImportError: No module named win32api

 You need to install `pywin32`_ because of `this Twisted bug`_.

-.. _pywin32: http://sourceforge.net/projects/pywin32/
-.. _this Twisted bug: http://twistedmatrix.com/trac/ticket/3707
+.. _pywin32: https://sourceforge.net/projects/pywin32/
+.. _this Twisted bug: https://twistedmatrix.com/trac/ticket/3707

 How can I simulate a user login in my spider?
 ---------------------------------------------
@@ -123,7 +123,7 @@ Why does Scrapy download pages in English instead of my native language?
 Try changing the default `Accept-Language`_ request header by overriding the
 :setting:`DEFAULT_REQUEST_HEADERS` setting.

-.. _Accept-Language: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4
+.. _Accept-Language: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4

 Where can I find some example Scrapy projects?
 ----------------------------------------------
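The FAQ entry in the hunk above points at the :setting:`DEFAULT_REQUEST_HEADERS` setting; as a minimal sketch of what that override looks like in a project's ``settings.py`` (the ``Accept-Language`` value here is an illustrative example, not a Scrapy default):

```python
# settings.py -- sketch: override the default Accept-Language request header
# so pages are served in the desired language. The value is an example only.
DEFAULT_REQUEST_HEADERS = {
    "Accept-Language": "pt-BR,pt;q=0.8,en;q=0.5",
}
```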
@@ -282,7 +282,7 @@ I'm scraping a XML document and my XPath selector doesn't return any items

 You may need to remove namespaces. See :ref:`removing-namespaces`.

-.. _user agents: http://en.wikipedia.org/wiki/User_agent
-.. _LIFO: http://en.wikipedia.org/wiki/LIFO
-.. _DFO order: http://en.wikipedia.org/wiki/Depth-first_search
-.. _BFO order: http://en.wikipedia.org/wiki/Breadth-first_search
+.. _user agents: https://en.wikipedia.org/wiki/User_agent
+.. _LIFO: https://en.wikipedia.org/wiki/LIFO
+.. _DFO order: https://en.wikipedia.org/wiki/Depth-first_search
+.. _BFO order: https://en.wikipedia.org/wiki/Breadth-first_search
@@ -74,7 +74,7 @@ Windows
 Be sure you download the architecture (win32 or amd64) that matches your system

 * *(Only required for Python<2.7.9)* Install `pip`_ from
-  https://pip.pypa.io/en/latest/installing.html
+  https://pip.pypa.io/en/latest/installing/

 Now open a Command prompt to check ``pip`` is installed correctly::

@@ -171,9 +171,9 @@ After any of these workarounds you should be able to install Scrapy::
     pip install Scrapy

 .. _Python: https://www.python.org/
-.. _pip: https://pip.pypa.io/en/latest/installing.html
-.. _easy_install: http://pypi.python.org/pypi/setuptools
-.. _Control Panel: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx
+.. _pip: https://pip.pypa.io/en/latest/installing/
+.. _easy_install: https://pypi.python.org/pypi/setuptools
+.. _Control Panel: https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx
 .. _lxml: http://lxml.de/
 .. _OpenSSL: https://pypi.python.org/pypi/pyOpenSSL
 .. _setuptools: https://pypi.python.org/pypi/setuptools
@@ -170,7 +170,7 @@ your code in Scrapy projects and `join the community`_. Thanks for your
 interest!

 .. _join the community: http://scrapy.org/community/
-.. _web scraping: http://en.wikipedia.org/wiki/Web_scraping
-.. _Amazon Associates Web Services: http://aws.amazon.com/associates/
-.. _Amazon S3: http://aws.amazon.com/s3/
+.. _web scraping: https://en.wikipedia.org/wiki/Web_scraping
+.. _Amazon Associates Web Services: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
+.. _Amazon S3: https://aws.amazon.com/s3/
 .. _Sitemaps: http://www.sitemaps.org
@@ -7,7 +7,7 @@ Scrapy Tutorial
 In this tutorial, we'll assume that Scrapy is already installed on your system.
 If that's not the case, see :ref:`intro-install`.

-We are going to use `Open directory project (dmoz) <http://www.dmoz.org/>`_ as
+We are going to use `Open directory project (dmoz) <https://www.dmoz.org/>`_ as
 our example domain to scrape.

 This tutorial will walk you through these tasks:
@@ -191,8 +191,8 @@ based on `XPath`_ or `CSS`_ expressions called :ref:`Scrapy Selectors
 <topics-selectors>`. For more information about selectors and other extraction
 mechanisms see the :ref:`Selectors documentation <topics-selectors>`.

-.. _XPath: http://www.w3.org/TR/xpath
-.. _CSS: http://www.w3.org/TR/selectors
+.. _XPath: https://www.w3.org/TR/xpath
+.. _CSS: https://www.w3.org/TR/selectors

 Here are some examples of XPath expressions and their meanings:

@@ -544,5 +544,5 @@ Then, we recommend you continue by playing with an example project (see
 :ref:`intro-examples`), and then continue with the section
 :ref:`section-basics`.

-.. _JSON: http://en.wikipedia.org/wiki/JSON
+.. _JSON: https://en.wikipedia.org/wiki/JSON
 .. _dirbot: https://github.com/scrapy/dirbot
@@ -403,10 +403,11 @@ Outsourced packages
 |                                     | :ref:`topics-deploy`)               |
 +-------------------------------------+-------------------------------------+
 | scrapy.contrib.djangoitem           | `scrapy-djangoitem <https://github. |
-|                                     | com/scrapy/scrapy-djangoitem>`_     |
+|                                     | com/scrapy-plugins/scrapy-djangoite |
+|                                     | m>`_                                |
 +-------------------------------------+-------------------------------------+
 | scrapy.webservice                   | `scrapy-jsonrpc <https://github.com |
-|                                     | /scrapy/scrapy-jsonrpc>`_           |
+|                                     | /scrapy-plugins/scrapy-jsonrpc>`_   |
 +-------------------------------------+-------------------------------------+

 `scrapy.contrib_exp` and `scrapy.contrib` dissolutions
@@ -1186,7 +1187,7 @@ Scrapy changes:
 - nested items now fully supported in JSON and JSONLines exporters
 - added :reqmeta:`cookiejar` Request meta key to support multiple cookie sessions per spider
 - decoupled encoding detection code to `w3lib.encoding`_, and ported Scrapy code to use that module
-- dropped support for Python 2.5. See http://blog.scrapinghub.com/2012/02/27/scrapy-0-15-dropping-support-for-python-2-5/
+- dropped support for Python 2.5. See https://blog.scrapinghub.com/2012/02/27/scrapy-0-15-dropping-support-for-python-2-5/
 - dropped support for Twisted 2.5
 - added :setting:`REFERER_ENABLED` setting, to control referer middleware
 - changed default user agent to: ``Scrapy/VERSION (+http://scrapy.org)``
@@ -1535,7 +1536,7 @@ First release of Scrapy.


 .. _AJAX crawleable urls: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?csw=1
-.. _chunked transfer encoding: http://en.wikipedia.org/wiki/Chunked_transfer_encoding
+.. _chunked transfer encoding: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
 .. _w3lib: https://github.com/scrapy/w3lib
 .. _scrapely: https://github.com/scrapy/scrapely
 .. _marshal: https://docs.python.org/2/library/marshal.html
@@ -271,4 +271,4 @@ class (which they all inherit from).
         Close the given spider. After this is called, no more specific stats
         can be accessed or collected.

-.. _reactor: http://twistedmatrix.com/documents/current/core/howto/reactor-basics.html
+.. _reactor: https://twistedmatrix.com/documents/current/core/howto/reactor-basics.html
@@ -125,8 +125,8 @@ links:
 * `Twisted - hello, asynchronous programming`_
 * `Twisted Introduction - Krondo`_

-.. _Twisted: http://twistedmatrix.com/trac/
-.. _Introduction to Deferreds in Twisted: http://twistedmatrix.com/documents/current/core/howto/defer-intro.html
+.. _Twisted: https://twistedmatrix.com/trac/
+.. _Introduction to Deferreds in Twisted: https://twistedmatrix.com/documents/current/core/howto/defer-intro.html
 .. _Twisted - hello, asynchronous programming: http://jessenoller.com/2009/02/11/twisted-hello-asynchronous-programming/
-.. _Twisted Introduction - Krondo: http://krondo.com/blog/?page_id=1327/
+.. _Twisted Introduction - Krondo: http://krondo.com/an-introduction-to-asynchronous-programming-and-twisted/

@@ -10,4 +10,4 @@ DjangoItem has been moved into a separate project.

 It is hosted at:

-https://github.com/scrapy/scrapy-djangoitem
+https://github.com/scrapy-plugins/scrapy-djangoitem
@@ -300,7 +300,7 @@ HttpAuthMiddleware

         # .. rest of the spider code omitted ...

-.. _Basic access authentication: http://en.wikipedia.org/wiki/Basic_access_authentication
+.. _Basic access authentication: https://en.wikipedia.org/wiki/Basic_access_authentication


 HttpCacheMiddleware
@@ -390,9 +390,9 @@ what is implemented:

 what is missing:

-* `Pragma: no-cache` support http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
-* `Vary` header support http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
-* Invalidation after updates or deletes http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.10
+* `Pragma: no-cache` support https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
+* `Vary` header support https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
+* Invalidation after updates or deletes https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.10
 * ... probably others ..

 In order to use this policy, set:
@@ -464,7 +464,7 @@ In order to use this storage backend:
 * set :setting:`HTTPCACHE_STORAGE` to ``scrapy.extensions.httpcache.LeveldbCacheStorage``
 * install `LevelDB python bindings`_ like ``pip install leveldb``

-.. _LevelDB: http://code.google.com/p/leveldb/
+.. _LevelDB: https://github.com/google/leveldb
 .. _leveldb python bindings: https://pypi.python.org/pypi/leveldb


@@ -964,6 +964,6 @@ Default: ``"latin-1"``
 The default encoding for proxy authentication on :class:`HttpProxyMiddleware`.


-.. _DBM: http://en.wikipedia.org/wiki/Dbm
+.. _DBM: https://en.wikipedia.org/wiki/Dbm
 .. _anydbm: https://docs.python.org/2/library/anydbm.html
-.. _chunked transfer encoding: http://en.wikipedia.org/wiki/Chunked_transfer_encoding
+.. _chunked transfer encoding: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
@@ -15,7 +15,7 @@ simple API for sending attachments and it's very easy to configure, with a few
 :ref:`settings <topics-email-settings>`.

 .. _smtplib: https://docs.python.org/2/library/smtplib.html
-.. _Twisted non-blocking IO: http://twistedmatrix.com/documents/current/core/howto/defer-intro.html
+.. _Twisted non-blocking IO: https://twistedmatrix.com/documents/current/core/howto/defer-intro.html

 Quick example
 =============
@@ -21,7 +21,7 @@ avoid collision with existing (and future) extensions. For example, a
 hypothetic extension to handle `Google Sitemaps`_ would use settings like
 `GOOGLESITEMAP_ENABLED`, `GOOGLESITEMAP_DEPTH`, and so on.

-.. _Google Sitemaps: http://en.wikipedia.org/wiki/Sitemaps
+.. _Google Sitemaps: https://en.wikipedia.org/wiki/Sitemaps

 Loading & activating extensions
 ===============================
@@ -355,8 +355,8 @@ There are at least two ways to send Scrapy the `SIGQUIT`_ signal:

     kill -QUIT <pid>

-.. _SIGUSR2: http://en.wikipedia.org/wiki/SIGUSR1_and_SIGUSR2
-.. _SIGQUIT: http://en.wikipedia.org/wiki/SIGQUIT
+.. _SIGUSR2: https://en.wikipedia.org/wiki/SIGUSR1_and_SIGUSR2
+.. _SIGQUIT: https://en.wikipedia.org/wiki/SIGQUIT

 Debugger extension
 ~~~~~~~~~~~~~~~~~~
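The hunk above belongs to the docs for the extension that dumps stack traces on SIGQUIT or SIGUSR2. As a rough, generic sketch of that mechanism in plain Python (this is not Scrapy's actual extension code, just an illustration of signal-triggered stack dumps):

```python
import signal
import sys
import traceback

def dump_stacks(signum, frame):
    # Print the current stack trace to stderr, in the spirit of the
    # debugging extension discussed above; illustrative only.
    traceback.print_stack(frame, file=sys.stderr)

# SIGQUIT and SIGUSR2 only exist on Unix, so guard the registration.
if hasattr(signal, "SIGQUIT"):
    signal.signal(signal.SIGQUIT, dump_stacks)
```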
@@ -330,7 +330,7 @@ format in :setting:`FEED_EXPORTERS`. E.g., to disable the built-in CSV exporter
         'csv': None,
     }

-.. _URI: http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
-.. _Amazon S3: http://aws.amazon.com/s3/
+.. _URI: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
+.. _Amazon S3: https://aws.amazon.com/s3/
 .. _boto: https://github.com/boto/boto
 .. _botocore: https://github.com/boto/botocore
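The setting fragment in the hunk above disables the built-in CSV exporter by mapping its format key to ``None``; the complete assignment, as it might appear in a project's ``settings.py``, is a one-liner sketch:

```python
# settings.py -- sketch: mapping a serialization format to None disables
# the corresponding built-in exporter, per the docs passage above.
FEED_EXPORTERS = {
    'csv': None,
}
```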
@@ -164,4 +164,4 @@ elements.
 or tags which Therefer in page HTML
 sources may on Firebug inspects the live DOM

-.. _has been shut down by Google: http://searchenginewatch.com/sew/news/2096661/google-directory-shut
+.. _has been shut down by Google: https://searchenginewatch.com/sew/news/2096661/google-directory-shut
@@ -160,8 +160,8 @@ method and how to clean up the resources properly.
         self.db[self.collection_name].insert(dict(item))
         return item

-.. _MongoDB: http://www.mongodb.org/
-.. _pymongo: http://api.mongodb.org/python/current/
+.. _MongoDB: https://www.mongodb.org/
+.. _pymongo: https://api.mongodb.org/python/current/

 Duplicates filter
 -----------------
@@ -143,7 +143,7 @@ Supported Storage
 File system is currently the only officially supported storage, but there is
 also (undocumented) support for storing files in `Amazon S3`_.

-.. _Amazon S3: http://aws.amazon.com/s3/
+.. _Amazon S3: https://aws.amazon.com/s3/

 File system storage
 -------------------
@@ -223,7 +223,7 @@ Where:

 * ``<image_id>`` is the `SHA1 hash`_ of the image url

-.. _SHA1 hash: http://en.wikipedia.org/wiki/SHA_hash_functions
+.. _SHA1 hash: https://en.wikipedia.org/wiki/SHA_hash_functions

 Example of image files stored using ``small`` and ``big`` thumbnail names::

@@ -390,5 +390,5 @@ above::
         item['image_paths'] = image_paths
         return item

-.. _Twisted Failure: http://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
-.. _MD5 hash: http://en.wikipedia.org/wiki/MD5
+.. _Twisted Failure: https://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
+.. _MD5 hash: https://en.wikipedia.org/wiki/MD5
@@ -251,5 +251,5 @@ If you are still unable to prevent your bot getting banned, consider contacting
 .. _ProxyMesh: http://proxymesh.com/
 .. _Google cache: http://www.googleguide.com/cached_pages.html
 .. _testspiders: https://github.com/scrapinghub/testspiders
-.. _Twisted Reactor Overview: http://twistedmatrix.com/documents/current/core/howto/reactor-basics.html
+.. _Twisted Reactor Overview: https://twistedmatrix.com/documents/current/core/howto/reactor-basics.html
 .. _Crawlera: http://scrapinghub.com/crawlera
@@ -621,4 +621,4 @@ XmlResponse objects
     adds encoding auto-discovering support by looking into the XML declaration
     line. See :attr:`TextResponse.encoding`.

-.. _Twisted Failure: http://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
+.. _Twisted Failure: https://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
@@ -40,8 +40,8 @@ For a complete reference of the selectors API see
 .. _lxml: http://lxml.de/
 .. _ElementTree: https://docs.python.org/2/library/xml.etree.elementtree.html
 .. _cssselect: https://pypi.python.org/pypi/cssselect/
-.. _XPath: http://www.w3.org/TR/xpath
-.. _CSS: http://www.w3.org/TR/selectors
+.. _XPath: https://www.w3.org/TR/xpath
+.. _CSS: https://www.w3.org/TR/selectors


 Using selectors
@@ -281,7 +281,7 @@ Another common case would be to extract all direct ``<p>`` children::
 For more details about relative XPaths see the `Location Paths`_ section in the
 XPath specification.

-.. _Location Paths: http://www.w3.org/TR/xpath#location-paths
+.. _Location Paths: https://www.w3.org/TR/xpath#location-paths

 Using EXSLT extensions
 ----------------------
@@ -439,7 +439,7 @@ you may want to take a look first at this `XPath tutorial`_.


 .. _`XPath tutorial`: http://www.zvon.org/comp/r/tut-XPath_1.html
-.. _`this post from ScrapingHub's blog`: http://blog.scrapinghub.com/2014/07/17/xpath-tips-from-the-web-scraping-trenches/
+.. _`this post from ScrapingHub's blog`: https://blog.scrapinghub.com/2014/07/17/xpath-tips-from-the-web-scraping-trenches/


 Using text nodes in a condition
@@ -481,7 +481,7 @@ But using the ``.`` to mean the node, works::
     >>> sel.xpath("//a[contains(., 'Next Page')]").extract()
     [u'<a href="#">Click here to go to the <strong>Next Page</strong></a>']

-.. _`XPath string function`: http://www.w3.org/TR/xpath/#section-String-Functions
+.. _`XPath string function`: https://www.w3.org/TR/xpath/#section-String-Functions

 Beware of the difference between //node[1] and (//node)[1]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1202,6 +1202,6 @@ case to see how to enable and use them.
 .. settingslist::


-.. _Amazon web services: http://aws.amazon.com/
-.. _breadth-first order: http://en.wikipedia.org/wiki/Breadth-first_search
-.. _depth-first order: http://en.wikipedia.org/wiki/Depth-first_search
+.. _Amazon web services: https://aws.amazon.com/
+.. _breadth-first order: https://en.wikipedia.org/wiki/Breadth-first_search
+.. _depth-first order: https://en.wikipedia.org/wiki/Depth-first_search
@@ -138,7 +138,7 @@ Example of shell session
 ========================

 Here's an example of a typical shell session where we start by scraping the
-http://scrapy.org page, and then proceed to scrape the http://reddit.com
+http://scrapy.org page, and then proceed to scrape the https://reddit.com
 page. Finally, we modify the (Reddit) request method to POST and re-fetch it
 getting an error. We end the session by typing Ctrl-D (in Unix systems) or
 Ctrl-Z in Windows.
@@ -22,7 +22,7 @@ Deferred signal handlers
 Some signals support returning `Twisted deferreds`_ from their handlers, see
 the :ref:`topics-signals-ref` below to know which ones.

-.. _Twisted deferreds: http://twistedmatrix.com/documents/current/core/howto/defer.html
+.. _Twisted deferreds: https://twistedmatrix.com/documents/current/core/howto/defer.html

 .. _topics-signals-ref:

@@ -258,4 +258,4 @@ response_downloaded
     :param spider: the spider for which the response is intended
     :type spider: :class:`~scrapy.spiders.Spider` object

-.. _Failure: http://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
+.. _Failure: https://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
@@ -211,7 +211,7 @@ HttpErrorMiddleware
 According to the `HTTP standard`_, successful responses are those whose
 status codes are in the 200-300 range.

-.. _HTTP standard: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
+.. _HTTP standard: https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

 If you still want to process response codes outside that range, you can
 specify which response codes the spider is able to handle using the
@@ -238,7 +238,7 @@ responses, unless you really know what you're doing.

 For more information see: `HTTP Status Code Definitions`_.

-.. _HTTP Status Code Definitions: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
+.. _HTTP Status Code Definitions: https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

 HttpErrorMiddleware settings
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
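The HttpErrorMiddleware hunks above concern the documented opt-in for processing responses outside the 200-300 range, the ``handle_httpstatus_list`` spider attribute; a minimal sketch (a plain class stands in here for a real ``scrapy.Spider`` subclass, so the snippet has no Scrapy dependency):

```python
# Sketch only: a real spider would subclass scrapy.Spider.
# handle_httpstatus_list is the documented attribute for letting
# listed non-2xx status codes reach the spider's callbacks.
class NotFoundTolerantSpider:
    name = "notfound_tolerant"
    handle_httpstatus_list = [404]
```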
@@ -735,5 +735,5 @@ Combine SitemapSpider with other sources of urls::
 .. _Sitemaps: http://www.sitemaps.org
 .. _Sitemap index files: http://www.sitemaps.org/protocol.html#index
 .. _robots.txt: http://www.robotstxt.org/
-.. _TLD: http://en.wikipedia.org/wiki/Top-level_domain
+.. _TLD: https://en.wikipedia.org/wiki/Top-level_domain
 .. _Scrapyd documentation: http://scrapyd.readthedocs.org/en/latest/
@@ -8,4 +8,4 @@ webservice has been moved into a separate project.

 It is hosted at:

-https://github.com/scrapy/scrapy-jsonrpc
+https://github.com/scrapy-plugins/scrapy-jsonrpc
@@ -36,5 +36,5 @@ new methods or functionality but the existing methods should keep working the
 same way.


-.. _odd-numbered versions for development releases: http://en.wikipedia.org/wiki/Software_versioning#Odd-numbered_versions_for_development_releases
+.. _odd-numbered versions for development releases: https://en.wikipedia.org/wiki/Software_versioning#Odd-numbered_versions_for_development_releases
