
Merge pull request #687 from Curita/docs-linkcheck

Docs build and linkcheck in tox env
Daniel Graña 2014-04-11 15:36:00 -04:00
commit 21bff7b3b3
10 changed files with 39 additions and 18 deletions


@@ -6,10 +6,12 @@ env:
   - TOXENV=trunk
   - TOXENV=pypy
   - TOXENV=py33
+  - TOXENV=docs
 matrix:
   allow_failures:
     - env: TOXENV=pypy
     - env: TOXENV=py33
+    - env: TOXENV=docs
 install:
   - ./.travis-workarounds.sh
   - pip install tox


@@ -192,3 +192,14 @@ latex_documents = [
 # If false, no module index is generated.
 #latex_use_modindex = True
+
+# Options for the linkcheck builder
+# ---------------------------------
+
+# A list of regular expressions that match URIs that should not be checked when
+# doing a linkcheck build.
+linkcheck_ignore = [
+    'http://localhost:\d+', 'http://hg.scrapy.org',
+    'http://directory.google.com/'
+]
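Each entry in ``linkcheck_ignore`` is a regular expression that Sphinx matches against the URIs found in the docs, so the localhost examples and known-unreachable hosts above are skipped instead of being reported as broken. A minimal sketch of that matching behaviour (the ``should_skip`` helper is illustrative, not Sphinx API):

    import re

    # The same patterns as the linkcheck_ignore list above.
    ignore_patterns = [re.compile(p) for p in [
        r'http://localhost:\d+',
        r'http://hg.scrapy.org',
        r'http://directory.google.com/',
    ]]

    def should_skip(uri):
        # Anchored at the start of the URI, like re.match.
        return any(p.match(uri) for p in ignore_patterns)

    print(should_skip('http://localhost:8000/'))  # True
    print(should_skip('http://lxml.de/'))         # False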


@@ -20,7 +20,7 @@ In other words, comparing `BeautifulSoup`_ (or `lxml`_) to Scrapy is like
 comparing `jinja2`_ to `Django`_.
 
 .. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
-.. _lxml: http://codespeak.net/lxml/
+.. _lxml: http://lxml.de/
 .. _jinja2: http://jinja.pocoo.org/2/
 .. _Django: http://www.djangoproject.com


@@ -59,7 +59,7 @@ The next thing is to write a Spider which defines the start URL
 for extracting the data from pages.
 
 If we take a look at that page content we'll see that all torrent URLs are like
-http://www.mininova.org/tor/NUMBER where ``NUMBER`` is an integer. We'll use
+``http://www.mininova.org/tor/NUMBER`` where ``NUMBER`` is an integer. We'll use
 that to construct the regular expression for the links to follow: ``/tor/\d+``.
 
 We'll use `XPath`_ for selecting the data to extract from the web page HTML
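As a quick sanity check of the pattern the text derives, ``/tor/\d+`` matches the torrent detail URLs but not other pages on the site (a sketch; the sample URLs are made up in the mininova style):

    import re

    # /tor/ followed by one or more digits, per the overview text.
    link_re = re.compile(r'/tor/\d+')

    print(bool(link_re.search('http://www.mininova.org/tor/2676093')))  # True
    print(bool(link_re.search('http://www.mininova.org/today')))        # False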


@@ -47,7 +47,7 @@ Enhancements
 - [**Backwards incompatible**] Switched HTTPCacheMiddleware backend to filesystem (:issue:`541`)
   To restore old backend set `HTTPCACHE_STORAGE` to `scrapy.contrib.httpcache.DbmCacheStorage`
-- Proxy https:// urls using CONNECT method (:issue:`392`, :issue:`397`)
+- Proxy \https:// urls using CONNECT method (:issue:`392`, :issue:`397`)
 - Add a middleware to crawl ajax crawleable pages as defined by google (:issue:`343`)
 - Rename scrapy.spider.BaseSpider to scrapy.spider.Spider (:issue:`510`, :issue:`519`)
 - Selectors register EXSLT namespaces by default (:issue:`472`)
@@ -394,7 +394,7 @@ Scrapy changes:
 - nested items now fully supported in JSON and JSONLines exporters
 - added :reqmeta:`cookiejar` Request meta key to support multiple cookie sessions per spider
 - decoupled encoding detection code to `w3lib.encoding`_, and ported Scrapy code to use that module
-- dropped support for Python 2.5. See http://blog.scrapy.org/scrapy-dropping-support-for-python-25
+- dropped support for Python 2.5. See http://blog.scrapinghub.com/2012/02/27/scrapy-0-15-dropping-support-for-python-2-5/
 - dropped support for Twisted 2.5
 - added :setting:`REFERER_ENABLED` setting, to control referer middleware
 - changed default user agent to: ``Scrapy/VERSION (+http://scrapy.org)``
@@ -744,7 +744,7 @@ First release of Scrapy.
 
 .. _AJAX crawleable urls: http://code.google.com/web/ajaxcrawling/docs/getting-started.html
 .. _chunked transfer encoding: http://en.wikipedia.org/wiki/Chunked_transfer_encoding
-.. _w3lib: http://https://github.com/scrapy/w3lib
+.. _w3lib: https://github.com/scrapy/w3lib
 .. _scrapely: https://github.com/scrapy/scrapely
 .. _marshal: http://docs.python.org/library/marshal.html
 .. _w3lib.encoding: https://github.com/scrapy/w3lib/blob/master/w3lib/encoding.py


@@ -121,10 +121,10 @@ for concurrency.
 For more information about asynchronous programming and Twisted see these
 links:
 
 * `Asynchronous Programming with Twisted`_
 * `Introduction to Deferreds in Twisted`_
 * `Twisted - hello, asynchronous programming`_
 
 .. _Twisted: http://twistedmatrix.com/trac/
 .. _Asynchronous Programming with Twisted: http://twistedmatrix.com/projects/core/documentation/howto/async.html
+.. _Introduction to Deferreds in Twisted: http://twistedmatrix.com/documents/current/core/howto/defer-intro.html
 .. _Twisted - hello, asynchronous programming: http://jessenoller.com/2009/02/11/twisted-hello-asynchronous-programming/
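For readers who do not follow the links: the Twisted primitive underlying Scrapy's concurrency is the Deferred, a placeholder for a result that arrives later. A minimal standalone sketch (plain Twisted, not Scrapy code):

    from twisted.internet import defer, reactor

    def on_result(value):
        print('got:', value)
        reactor.stop()

    # Callbacks attached now are run whenever the result shows up;
    # nothing blocks while waiting.
    d = defer.Deferred()
    d.addCallback(on_result)

    # Deliver the result a second later instead of blocking for it.
    reactor.callLater(1, d.callback, 'hello, asynchronous programming')
    reactor.run()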


@@ -499,4 +499,4 @@ Example::
     COMMANDS_MODULE = 'mybot.commands'
 
-.. _Deploying your project: http://scrapyd.readthedocs.org/en/latest/#deploying-your-project
+.. _Deploying your project: http://scrapyd.readthedocs.org/en/latest/deploy.html


@@ -15,7 +15,7 @@ simple API for sending attachments and it's very easy to configure, with a few
 :ref:`settings <topics-email-settings>`.
 
 .. _smtplib: http://docs.python.org/library/smtplib.html
-.. _Twisted non-blocking IO: http://twistedmatrix.com/projects/core/documentation/howto/async.html
+.. _Twisted non-blocking IO: http://twistedmatrix.com/documents/current/core/howto/defer-intro.html
 
 Quick example
 =============
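The API being described boils down to creating a ``MailSender`` and calling ``send()``; a sketch with placeholder addresses, assuming the default settings and a reachable SMTP server (attachments go through a separate argument, omitted here):

    from scrapy.mail import MailSender

    # Built on Twisted non-blocking IO, so sending mail does not
    # stall the crawl.
    mailer = MailSender()
    mailer.send(to=['someone@example.com'], subject='Crawl report',
                body='All done.')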


@@ -37,7 +37,7 @@ For a complete reference of the selectors API see
 :ref:`Selector reference <topics-selectors-ref>`
 
 .. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
-.. _lxml: http://codespeak.net/lxml/
+.. _lxml: http://lxml.de/
 .. _ElementTree: http://docs.python.org/library/xml.etree.elementtree.html
 .. _cssselect: https://pypi.python.org/pypi/cssselect/
 .. _XPath: http://www.w3.org/TR/xpath
@@ -247,12 +247,12 @@ Being built atop `lxml`_, Scrapy selectors also support some `EXSLT`_ extensions
 and come with these pre-registered namespaces to use in XPath expressions:
 
-====== ==================================== =======================
-prefix namespace                            usage
-====== ==================================== =======================
-re     http://exslt.org/regular-expressions `regular expressions`_
-set    http://exslt.org/sets                `set manipulation`_
-====== ==================================== =======================
+====== ===================================== =======================
+prefix namespace                             usage
+====== ===================================== =======================
+re     \http://exslt.org/regular-expressions `regular expressions`_
+set    \http://exslt.org/sets                `set manipulation`_
+====== ===================================== =======================
 
 Regular expressions
 ~~~~~~~~~~~~~~~~~~~
@@ -594,4 +594,4 @@ of relevance, are:
   case some element names clash between namespaces. These cases are very rare
   though.
 
-.. _Google Base XML feed: http://base.google.com/support/bin/answer.py?hl=en&answer=59461
+.. _Google Base XML feed: https://support.google.com/merchants/answer/160589?hl=en&ref_topic=2473799
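Because the ``re`` namespace in the table above ships pre-registered, EXSLT functions such as ``re:test()`` work in selector XPath expressions without any setup. A small sketch in the spirit of the selectors docs (the sample HTML is made up):

    from scrapy.selector import Selector

    doc = '<ul><li class="item-0">first</li><li class="item-10">second</li></ul>'
    sel = Selector(text=doc)

    # re:test() resolves through the pre-registered EXSLT namespace;
    # only class="item-0" ends in a single digit.
    print(sel.xpath(r'//li[re:test(@class, "item-\d$")]/text()').extract())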

tox.ini

@@ -4,7 +4,7 @@
 # and then run "tox" from this directory.
 
 [tox]
-envlist = py27, pypy, precise, trunk, py33
+envlist = py27, pypy, precise, trunk, py33, docs
 indexserver =
     HPK = https://devpi.net/hpk/dev/

@@ -56,3 +56,11 @@ deps =
 commands =
     bin/runtests.bat []
 sitepackages = False
+
+[testenv:docs]
+changedir = docs
+deps =
+    Sphinx
+commands =
+    sphinx-build -W -b html . build/html
+    sphinx-build -W -b linkcheck . build/linkcheck
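With this environment in place, the CI docs job can be reproduced locally by running ``tox -e docs`` from the repository root. The ``-W`` flag promotes Sphinx warnings to errors, so a broken reference fails the build outright, and the linkcheck run is where the new ``linkcheck_ignore`` list in ``conf.py`` takes effect.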