mirror of https://github.com/scrapy/scrapy.git synced 2025-02-22 18:03:51 +00:00

Update documentation links

nyov 2016-03-02 01:13:02 +00:00
parent 9f4fe5dc4a
commit 5876b9aa30
26 changed files with 68 additions and 67 deletions

View File

@ -120,7 +120,7 @@ Scrapy Contrib
==============
Scrapy contrib shares a similar rationale as Django contrib, which is explained
-in `this post <http://jacobian.org/writing/what-is-django-contrib/>`_. If you
+in `this post <https://jacobian.org/writing/what-is-django-contrib/>`_. If you
are working on a new functionality, please follow that rationale to decide
whether it should be a Scrapy contrib. If unsure, you can ask in
`scrapy-users`_.
@ -189,7 +189,7 @@ And their unit-tests are in::
.. _issue tracker: https://github.com/scrapy/scrapy/issues
.. _scrapy-users: https://groups.google.com/forum/#!forum/scrapy-users
-.. _Twisted unit-testing framework: http://twistedmatrix.com/documents/current/core/development/policy/test-standard.html
+.. _Twisted unit-testing framework: https://twistedmatrix.com/documents/current/core/development/policy/test-standard.html
.. _AUTHORS: https://github.com/scrapy/scrapy/blob/master/AUTHORS
.. _tests/: https://github.com/scrapy/scrapy/tree/master/tests
.. _open issues: https://github.com/scrapy/scrapy/issues

View File

@ -77,8 +77,8 @@ Scrapy crashes with: ImportError: No module named win32api
You need to install `pywin32`_ because of `this Twisted bug`_.
-.. _pywin32: http://sourceforge.net/projects/pywin32/
-.. _this Twisted bug: http://twistedmatrix.com/trac/ticket/3707
+.. _pywin32: https://sourceforge.net/projects/pywin32/
+.. _this Twisted bug: https://twistedmatrix.com/trac/ticket/3707
How can I simulate a user login in my spider?
---------------------------------------------
@ -123,7 +123,7 @@ Why does Scrapy download pages in English instead of my native language?
Try changing the default `Accept-Language`_ request header by overriding the
:setting:`DEFAULT_REQUEST_HEADERS` setting.
-.. _Accept-Language: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4
+.. _Accept-Language: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4
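The override described above can be sketched as a settings fragment (``DEFAULT_REQUEST_HEADERS`` is the real Scrapy setting; the ``"de"`` language tag is just an example value):

```python
# settings.py: a minimal sketch; "de" is an example language tag, pick your own
DEFAULT_REQUEST_HEADERS = {
    "Accept-Language": "de",
}
```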
Where can I find some example Scrapy projects?
----------------------------------------------
@ -282,7 +282,7 @@ I'm scraping a XML document and my XPath selector doesn't return any items
You may need to remove namespaces. See :ref:`removing-namespaces`.
-.. _user agents: http://en.wikipedia.org/wiki/User_agent
-.. _LIFO: http://en.wikipedia.org/wiki/LIFO
-.. _DFO order: http://en.wikipedia.org/wiki/Depth-first_search
-.. _BFO order: http://en.wikipedia.org/wiki/Breadth-first_search
+.. _user agents: https://en.wikipedia.org/wiki/User_agent
+.. _LIFO: https://en.wikipedia.org/wiki/LIFO
+.. _DFO order: https://en.wikipedia.org/wiki/Depth-first_search
+.. _BFO order: https://en.wikipedia.org/wiki/Breadth-first_search
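The namespace removal mentioned above is done in Scrapy with ``Selector.remove_namespaces()``. Purely to illustrate why namespaced XML defeats plain element lookups, here is a standard-library sketch of the same idea (not Scrapy's implementation):

```python
import xml.etree.ElementTree as ET

# In a namespaced document every tag arrives qualified as "{uri}name"
xml = '<feed xmlns="http://www.w3.org/2005/Atom"><title>demo</title></feed>'
root = ET.fromstring(xml)
assert root.find("title") is None  # plain lookup fails on qualified tags

# Strip the "{uri}" prefix from every tag, which is what namespace
# removal achieves in spirit
for el in root.iter():
    el.tag = el.tag.split("}", 1)[-1]

assert root.find("title").text == "demo"  # plain lookup now works
```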

View File

@ -74,7 +74,7 @@ Windows
Be sure you download the architecture (win32 or amd64) that matches your system
* *(Only required for Python<2.7.9)* Install `pip`_ from
-https://pip.pypa.io/en/latest/installing.html
+https://pip.pypa.io/en/latest/installing/
Now open a Command prompt to check ``pip`` is installed correctly::
@ -171,9 +171,9 @@ After any of these workarounds you should be able to install Scrapy::
pip install Scrapy
.. _Python: https://www.python.org/
-.. _pip: https://pip.pypa.io/en/latest/installing.html
-.. _easy_install: http://pypi.python.org/pypi/setuptools
-.. _Control Panel: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx
+.. _pip: https://pip.pypa.io/en/latest/installing/
+.. _easy_install: https://pypi.python.org/pypi/setuptools
+.. _Control Panel: https://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx
.. _lxml: http://lxml.de/
.. _OpenSSL: https://pypi.python.org/pypi/pyOpenSSL
.. _setuptools: https://pypi.python.org/pypi/setuptools

View File

@ -170,7 +170,7 @@ your code in Scrapy projects and `join the community`_. Thanks for your
interest!
.. _join the community: http://scrapy.org/community/
-.. _web scraping: http://en.wikipedia.org/wiki/Web_scraping
-.. _Amazon Associates Web Services: http://aws.amazon.com/associates/
-.. _Amazon S3: http://aws.amazon.com/s3/
+.. _web scraping: https://en.wikipedia.org/wiki/Web_scraping
+.. _Amazon Associates Web Services: https://affiliate-program.amazon.com/gp/advertising/api/detail/main.html
+.. _Amazon S3: https://aws.amazon.com/s3/
.. _Sitemaps: http://www.sitemaps.org

View File

@ -7,7 +7,7 @@ Scrapy Tutorial
In this tutorial, we'll assume that Scrapy is already installed on your system.
If that's not the case, see :ref:`intro-install`.
-We are going to use `Open directory project (dmoz) <http://www.dmoz.org/>`_ as
+We are going to use `Open directory project (dmoz) <https://www.dmoz.org/>`_ as
our example domain to scrape.
This tutorial will walk you through these tasks:
@ -191,8 +191,8 @@ based on `XPath`_ or `CSS`_ expressions called :ref:`Scrapy Selectors
<topics-selectors>`. For more information about selectors and other extraction
mechanisms see the :ref:`Selectors documentation <topics-selectors>`.
-.. _XPath: http://www.w3.org/TR/xpath
-.. _CSS: http://www.w3.org/TR/selectors
+.. _XPath: https://www.w3.org/TR/xpath
+.. _CSS: https://www.w3.org/TR/selectors
Here are some examples of XPath expressions and their meanings:
@ -544,5 +544,5 @@ Then, we recommend you continue by playing with an example project (see
:ref:`intro-examples`), and then continue with the section
:ref:`section-basics`.
-.. _JSON: http://en.wikipedia.org/wiki/JSON
+.. _JSON: https://en.wikipedia.org/wiki/JSON
.. _dirbot: https://github.com/scrapy/dirbot

View File

@ -403,10 +403,11 @@ Outsourced packages
| | :ref:`topics-deploy`) |
+-------------------------------------+-------------------------------------+
| scrapy.contrib.djangoitem | `scrapy-djangoitem <https://github. |
-| | com/scrapy/scrapy-djangoitem>`_ |
+| | com/scrapy-plugins/scrapy-djangoite |
+| | m>`_ |
+-------------------------------------+-------------------------------------+
| scrapy.webservice | `scrapy-jsonrpc <https://github.com |
-| | /scrapy/scrapy-jsonrpc>`_ |
+| | /scrapy-plugins/scrapy-jsonrpc>`_ |
+-------------------------------------+-------------------------------------+
`scrapy.contrib_exp` and `scrapy.contrib` dissolutions
@ -1186,7 +1187,7 @@ Scrapy changes:
- nested items now fully supported in JSON and JSONLines exporters
- added :reqmeta:`cookiejar` Request meta key to support multiple cookie sessions per spider
- decoupled encoding detection code to `w3lib.encoding`_, and ported Scrapy code to use that module
-- dropped support for Python 2.5. See http://blog.scrapinghub.com/2012/02/27/scrapy-0-15-dropping-support-for-python-2-5/
+- dropped support for Python 2.5. See https://blog.scrapinghub.com/2012/02/27/scrapy-0-15-dropping-support-for-python-2-5/
- dropped support for Twisted 2.5
- added :setting:`REFERER_ENABLED` setting, to control referer middleware
- changed default user agent to: ``Scrapy/VERSION (+http://scrapy.org)``
@ -1535,7 +1536,7 @@ First release of Scrapy.
.. _AJAX crawleable urls: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?csw=1
-.. _chunked transfer encoding: http://en.wikipedia.org/wiki/Chunked_transfer_encoding
+.. _chunked transfer encoding: https://en.wikipedia.org/wiki/Chunked_transfer_encoding
.. _w3lib: https://github.com/scrapy/w3lib
.. _scrapely: https://github.com/scrapy/scrapely
.. _marshal: https://docs.python.org/2/library/marshal.html

View File

@ -271,4 +271,4 @@ class (which they all inherit from).
Close the given spider. After this is called, no more specific stats
can be accessed or collected.
-.. _reactor: http://twistedmatrix.com/documents/current/core/howto/reactor-basics.html
+.. _reactor: https://twistedmatrix.com/documents/current/core/howto/reactor-basics.html

View File

@ -125,8 +125,8 @@ links:
* `Twisted - hello, asynchronous programming`_
* `Twisted Introduction - Krondo`_
-.. _Twisted: http://twistedmatrix.com/trac/
-.. _Introduction to Deferreds in Twisted: http://twistedmatrix.com/documents/current/core/howto/defer-intro.html
+.. _Twisted: https://twistedmatrix.com/trac/
+.. _Introduction to Deferreds in Twisted: https://twistedmatrix.com/documents/current/core/howto/defer-intro.html
.. _Twisted - hello, asynchronous programming: http://jessenoller.com/2009/02/11/twisted-hello-asynchronous-programming/
-.. _Twisted Introduction - Krondo: http://krondo.com/blog/?page_id=1327/
+.. _Twisted Introduction - Krondo: http://krondo.com/an-introduction-to-asynchronous-programming-and-twisted/

View File

@ -10,4 +10,4 @@ DjangoItem has been moved into a separate project.
It is hosted at:
-https://github.com/scrapy/scrapy-djangoitem
+https://github.com/scrapy-plugins/scrapy-djangoitem

View File

@ -300,7 +300,7 @@ HttpAuthMiddleware
# .. rest of the spider code omitted ...
-.. _Basic access authentication: http://en.wikipedia.org/wiki/Basic_access_authentication
+.. _Basic access authentication: https://en.wikipedia.org/wiki/Basic_access_authentication
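The spider attributes HttpAuthMiddleware reads are ``http_user`` and ``http_pass`` (real Scrapy attributes); a bare sketch, where the class and credential values are made up for illustration and a real spider would subclass ``scrapy.Spider``:

```python
# Sketch only: in a real project this class would subclass scrapy.Spider
class IntranetSpider:
    name = "intranet.example.com"
    http_user = "someuser"  # credentials picked up by HttpAuthMiddleware
    http_pass = "somepass"
    # .. rest of the spider code omitted ...
```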
HttpCacheMiddleware
@ -390,9 +390,9 @@ what is implemented:
what is missing:
-* `Pragma: no-cache` support http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
-* `Vary` header support http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
-* Invalidation after updates or deletes http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.10
+* `Pragma: no-cache` support https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
+* `Vary` header support https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
+* Invalidation after updates or deletes https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.10
* ... probably others ..
In order to use this policy, set:
@ -464,7 +464,7 @@ In order to use this storage backend:
* set :setting:`HTTPCACHE_STORAGE` to ``scrapy.extensions.httpcache.LeveldbCacheStorage``
* install `LevelDB python bindings`_ like ``pip install leveldb``
-.. _LevelDB: http://code.google.com/p/leveldb/
+.. _LevelDB: https://github.com/google/leveldb
.. _leveldb python bindings: https://pypi.python.org/pypi/leveldb
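Putting the two setup steps above together, the settings fragment might look like this (a sketch; the storage path is the one named in the text, and caching must also be switched on):

```python
# settings.py: enable HTTP caching with the LevelDB storage backend
# (requires the LevelDB python bindings, e.g. `pip install leveldb`)
HTTPCACHE_ENABLED = True
HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.LeveldbCacheStorage"
```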
@ -964,6 +964,6 @@ Default: ``"latin-1"``
The default encoding for proxy authentication on :class:`HttpProxyMiddleware`.
-.. _DBM: http://en.wikipedia.org/wiki/Dbm
+.. _DBM: https://en.wikipedia.org/wiki/Dbm
.. _anydbm: https://docs.python.org/2/library/anydbm.html
-.. _chunked transfer encoding: http://en.wikipedia.org/wiki/Chunked_transfer_encoding
+.. _chunked transfer encoding: https://en.wikipedia.org/wiki/Chunked_transfer_encoding

View File

@ -15,7 +15,7 @@ simple API for sending attachments and it's very easy to configure, with a few
:ref:`settings <topics-email-settings>`.
.. _smtplib: https://docs.python.org/2/library/smtplib.html
-.. _Twisted non-blocking IO: http://twistedmatrix.com/documents/current/core/howto/defer-intro.html
+.. _Twisted non-blocking IO: https://twistedmatrix.com/documents/current/core/howto/defer-intro.html
Quick example
=============

View File

@ -21,7 +21,7 @@ avoid collision with existing (and future) extensions. For example, a
hypothetical extension to handle `Google Sitemaps`_ would use settings like
`GOOGLESITEMAP_ENABLED`, `GOOGLESITEMAP_DEPTH`, and so on.
-.. _Google Sitemaps: http://en.wikipedia.org/wiki/Sitemaps
+.. _Google Sitemaps: https://en.wikipedia.org/wiki/Sitemaps
Loading & activating extensions
===============================
@ -355,8 +355,8 @@ There are at least two ways to send Scrapy the `SIGQUIT`_ signal:
kill -QUIT <pid>
-.. _SIGUSR2: http://en.wikipedia.org/wiki/SIGUSR1_and_SIGUSR2
-.. _SIGQUIT: http://en.wikipedia.org/wiki/SIGQUIT
+.. _SIGUSR2: https://en.wikipedia.org/wiki/SIGUSR1_and_SIGUSR2
+.. _SIGQUIT: https://en.wikipedia.org/wiki/SIGQUIT
Debugger extension
~~~~~~~~~~~~~~~~~~

View File

@ -330,7 +330,7 @@ format in :setting:`FEED_EXPORTERS`. E.g., to disable the built-in CSV exporter
'csv': None,
}
-.. _URI: http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
-.. _Amazon S3: http://aws.amazon.com/s3/
+.. _URI: https://en.wikipedia.org/wiki/Uniform_Resource_Identifier
+.. _Amazon S3: https://aws.amazon.com/s3/
.. _boto: https://github.com/boto/boto
.. _botocore: https://github.com/boto/botocore
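A complete settings fragment for the CSV-disabling example mentioned in the hunk above might read (a sketch; ``FEED_EXPORTERS`` is the real setting, and mapping a format to ``None`` disables it):

```python
# settings.py: disable the built-in CSV feed exporter
FEED_EXPORTERS = {
    "csv": None,
}
```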

View File

@ -164,4 +164,4 @@ elements.
or tags which differ in the page HTML
sources, since Firebug inspects the live DOM
-.. _has been shut down by Google: http://searchenginewatch.com/sew/news/2096661/google-directory-shut
+.. _has been shut down by Google: https://searchenginewatch.com/sew/news/2096661/google-directory-shut

View File

@ -160,8 +160,8 @@ method and how to clean up the resources properly.
self.db[self.collection_name].insert(dict(item))
return item
-.. _MongoDB: http://www.mongodb.org/
-.. _pymongo: http://api.mongodb.org/python/current/
+.. _MongoDB: https://www.mongodb.org/
+.. _pymongo: https://api.mongodb.org/python/current/
Duplicates filter
-----------------

View File

@ -143,7 +143,7 @@ Supported Storage
File system is currently the only officially supported storage, but there is
also (undocumented) support for storing files in `Amazon S3`_.
-.. _Amazon S3: http://aws.amazon.com/s3/
+.. _Amazon S3: https://aws.amazon.com/s3/
File system storage
-------------------
@ -223,7 +223,7 @@ Where:
* ``<image_id>`` is the `SHA1 hash`_ of the image url
-.. _SHA1 hash: http://en.wikipedia.org/wiki/SHA_hash_functions
+.. _SHA1 hash: https://en.wikipedia.org/wiki/SHA_hash_functions
Example of image files stored using ``small`` and ``big`` thumbnail names::
@ -390,5 +390,5 @@ above::
item['image_paths'] = image_paths
return item
-.. _Twisted Failure: http://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
-.. _MD5 hash: http://en.wikipedia.org/wiki/MD5
+.. _Twisted Failure: https://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
+.. _MD5 hash: https://en.wikipedia.org/wiki/MD5

View File

@ -251,5 +251,5 @@ If you are still unable to prevent your bot getting banned, consider contacting
.. _ProxyMesh: http://proxymesh.com/
.. _Google cache: http://www.googleguide.com/cached_pages.html
.. _testspiders: https://github.com/scrapinghub/testspiders
-.. _Twisted Reactor Overview: http://twistedmatrix.com/documents/current/core/howto/reactor-basics.html
+.. _Twisted Reactor Overview: https://twistedmatrix.com/documents/current/core/howto/reactor-basics.html
.. _Crawlera: http://scrapinghub.com/crawlera

View File

@ -621,4 +621,4 @@ XmlResponse objects
adds encoding auto-discovering support by looking into the XML declaration
line. See :attr:`TextResponse.encoding`.
-.. _Twisted Failure: http://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
+.. _Twisted Failure: https://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html

View File

@ -40,8 +40,8 @@ For a complete reference of the selectors API see
.. _lxml: http://lxml.de/
.. _ElementTree: https://docs.python.org/2/library/xml.etree.elementtree.html
.. _cssselect: https://pypi.python.org/pypi/cssselect/
-.. _XPath: http://www.w3.org/TR/xpath
-.. _CSS: http://www.w3.org/TR/selectors
+.. _XPath: https://www.w3.org/TR/xpath
+.. _CSS: https://www.w3.org/TR/selectors
Using selectors
@ -281,7 +281,7 @@ Another common case would be to extract all direct ``<p>`` children::
For more details about relative XPaths see the `Location Paths`_ section in the
XPath specification.
-.. _Location Paths: http://www.w3.org/TR/xpath#location-paths
+.. _Location Paths: https://www.w3.org/TR/xpath#location-paths
Using EXSLT extensions
----------------------
@ -439,7 +439,7 @@ you may want to take a look first at this `XPath tutorial`_.
.. _`XPath tutorial`: http://www.zvon.org/comp/r/tut-XPath_1.html
-.. _`this post from ScrapingHub's blog`: http://blog.scrapinghub.com/2014/07/17/xpath-tips-from-the-web-scraping-trenches/
+.. _`this post from ScrapingHub's blog`: https://blog.scrapinghub.com/2014/07/17/xpath-tips-from-the-web-scraping-trenches/
Using text nodes in a condition
@ -481,7 +481,7 @@ But using the ``.`` to mean the node, works::
>>> sel.xpath("//a[contains(., 'Next Page')]").extract()
[u'<a href="#">Click here to go to the <strong>Next Page</strong></a>']
-.. _`XPath string function`: http://www.w3.org/TR/xpath/#section-String-Functions
+.. _`XPath string function`: https://www.w3.org/TR/xpath/#section-String-Functions
Beware of the difference between //node[1] and (//node)[1]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@ -1202,6 +1202,6 @@ case to see how to enable and use them.
.. settingslist::
-.. _Amazon web services: http://aws.amazon.com/
-.. _breadth-first order: http://en.wikipedia.org/wiki/Breadth-first_search
-.. _depth-first order: http://en.wikipedia.org/wiki/Depth-first_search
+.. _Amazon web services: https://aws.amazon.com/
+.. _breadth-first order: https://en.wikipedia.org/wiki/Breadth-first_search
+.. _depth-first order: https://en.wikipedia.org/wiki/Depth-first_search

View File

@ -138,7 +138,7 @@ Example of shell session
========================
Here's an example of a typical shell session where we start by scraping the
-http://scrapy.org page, and then proceed to scrape the http://reddit.com
+http://scrapy.org page, and then proceed to scrape the https://reddit.com
page. Finally, we modify the (Reddit) request method to POST and re-fetch it
getting an error. We end the session by typing Ctrl-D (in Unix systems) or
Ctrl-Z in Windows.

View File

@ -22,7 +22,7 @@ Deferred signal handlers
Some signals support returning `Twisted deferreds`_ from their handlers, see
the :ref:`topics-signals-ref` below to know which ones.
-.. _Twisted deferreds: http://twistedmatrix.com/documents/current/core/howto/defer.html
+.. _Twisted deferreds: https://twistedmatrix.com/documents/current/core/howto/defer.html
.. _topics-signals-ref:
@ -258,4 +258,4 @@ response_downloaded
:param spider: the spider for which the response is intended
:type spider: :class:`~scrapy.spiders.Spider` object
-.. _Failure: http://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html
+.. _Failure: https://twistedmatrix.com/documents/current/api/twisted.python.failure.Failure.html

View File

@ -211,7 +211,7 @@ HttpErrorMiddleware
According to the `HTTP standard`_, successful responses are those whose
status codes are in the 200-300 range.
-.. _HTTP standard: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
+.. _HTTP standard: https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
If you still want to process response codes outside that range, you can
specify which response codes the spider is able to handle using the
@ -238,7 +238,7 @@ responses, unless you really know what you're doing.
For more information see: `HTTP Status Code Definitions`_.
-.. _HTTP Status Code Definitions: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
+.. _HTTP Status Code Definitions: https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
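The spider attribute referred to above is ``handle_httpstatus_list`` (a real Scrapy Spider attribute); a bare sketch, noting that a real spider would subclass ``scrapy.Spider``:

```python
# Sketch only: in a real project this class would subclass scrapy.Spider
class TolerantSpider:
    name = "tolerant"
    # also hand these non-2xx responses to the spider callbacks,
    # instead of letting HttpErrorMiddleware filter them out
    handle_httpstatus_list = [404, 500]
```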
HttpErrorMiddleware settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@ -735,5 +735,5 @@ Combine SitemapSpider with other sources of urls::
.. _Sitemaps: http://www.sitemaps.org
.. _Sitemap index files: http://www.sitemaps.org/protocol.html#index
.. _robots.txt: http://www.robotstxt.org/
-.. _TLD: http://en.wikipedia.org/wiki/Top-level_domain
+.. _TLD: https://en.wikipedia.org/wiki/Top-level_domain
.. _Scrapyd documentation: http://scrapyd.readthedocs.org/en/latest/

View File

@ -8,4 +8,4 @@ webservice has been moved into a separate project.
It is hosted at:
-https://github.com/scrapy/scrapy-jsonrpc
+https://github.com/scrapy-plugins/scrapy-jsonrpc

View File

@ -36,5 +36,5 @@ new methods or functionality but the existing methods should keep working the
same way.
-.. _odd-numbered versions for development releases: http://en.wikipedia.org/wiki/Software_versioning#Odd-numbered_versions_for_development_releases
+.. _odd-numbered versions for development releases: https://en.wikipedia.org/wiki/Software_versioning#Odd-numbered_versions_for_development_releases