
Merge pull request #687 from Curita/docs-linkcheck

Docs build and linkcheck in tox env
Daniel Graña 2014-04-11 15:36:00 -04:00
commit 21bff7b3b3
10 changed files with 39 additions and 18 deletions


@@ -6,10 +6,12 @@ env:
   - TOXENV=trunk
   - TOXENV=pypy
   - TOXENV=py33
+  - TOXENV=docs
 matrix:
   allow_failures:
     - env: TOXENV=pypy
     - env: TOXENV=py33
+    - env: TOXENV=docs
 install:
   - ./.travis-workarounds.sh
   - pip install tox


@@ -192,3 +192,14 @@ latex_documents = [
 # If false, no module index is generated.
 #latex_use_modindex = True
+
+# Options for the linkcheck builder
+# ---------------------------------
+
+# A list of regular expressions that match URIs that should not be checked when
+# doing a linkcheck build.
+linkcheck_ignore = [
+    'http://localhost:\d+', 'http://hg.scrapy.org',
+    'http://directory.google.com/'
+]
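Each entry in ``linkcheck_ignore`` is a regular expression that Sphinx matches against the URIs found in the docs, so the localhost examples and known-unreachable hosts above are skipped instead of being reported as broken. A minimal sketch of that matching behaviour (the ``should_skip`` helper is illustrative, not Sphinx API):

    import re

    # The same patterns as the linkcheck_ignore list above.
    ignore_patterns = [re.compile(p) for p in [
        r'http://localhost:\d+',
        r'http://hg.scrapy.org',
        r'http://directory.google.com/',
    ]]

    def should_skip(uri):
        # Anchored at the start of the URI, like re.match.
        return any(p.match(uri) for p in ignore_patterns)

    print(should_skip('http://localhost:8000/'))  # True
    print(should_skip('http://lxml.de/'))         # False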


@@ -20,7 +20,7 @@ In other words, comparing `BeautifulSoup`_ (or `lxml`_) to Scrapy is like
 comparing `jinja2`_ to `Django`_.
 
 .. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
-.. _lxml: http://codespeak.net/lxml/
+.. _lxml: http://lxml.de/
 .. _jinja2: http://jinja.pocoo.org/2/
 .. _Django: http://www.djangoproject.com


@@ -59,7 +59,7 @@ The next thing is to write a Spider which defines the start URL
 for extracting the data from pages.
 
 If we take a look at that page content we'll see that all torrent URLs are like
-http://www.mininova.org/tor/NUMBER where ``NUMBER`` is an integer. We'll use
+``http://www.mininova.org/tor/NUMBER`` where ``NUMBER`` is an integer. We'll use
 that to construct the regular expression for the links to follow: ``/tor/\d+``.
 
 We'll use `XPath`_ for selecting the data to extract from the web page HTML
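As a quick sanity check of the pattern the text derives, ``/tor/\d+`` matches the torrent detail URLs but not other pages on the site (a sketch; the sample URLs are made up in the mininova style):

    import re

    # /tor/ followed by one or more digits, per the overview text.
    link_re = re.compile(r'/tor/\d+')

    print(bool(link_re.search('http://www.mininova.org/tor/2676093')))  # True
    print(bool(link_re.search('http://www.mininova.org/today')))        # False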


@@ -47,7 +47,7 @@ Enhancements
 - [**Backwards incompatible**] Switched HTTPCacheMiddleware backend to filesystem (:issue:`541`)
   To restore old backend set `HTTPCACHE_STORAGE` to `scrapy.contrib.httpcache.DbmCacheStorage`
-- Proxy https:// urls using CONNECT method (:issue:`392`, :issue:`397`)
+- Proxy \https:// urls using CONNECT method (:issue:`392`, :issue:`397`)
 - Add a middleware to crawl ajax crawleable pages as defined by google (:issue:`343`)
 - Rename scrapy.spider.BaseSpider to scrapy.spider.Spider (:issue:`510`, :issue:`519`)
 - Selectors register EXSLT namespaces by default (:issue:`472`)
@@ -394,7 +394,7 @@ Scrapy changes:
 - nested items now fully supported in JSON and JSONLines exporters
 - added :reqmeta:`cookiejar` Request meta key to support multiple cookie sessions per spider
 - decoupled encoding detection code to `w3lib.encoding`_, and ported Scrapy code to use that module
-- dropped support for Python 2.5. See http://blog.scrapy.org/scrapy-dropping-support-for-python-25
+- dropped support for Python 2.5. See http://blog.scrapinghub.com/2012/02/27/scrapy-0-15-dropping-support-for-python-2-5/
 - dropped support for Twisted 2.5
 - added :setting:`REFERER_ENABLED` setting, to control referer middleware
 - changed default user agent to: ``Scrapy/VERSION (+http://scrapy.org)``
@@ -744,7 +744,7 @@ First release of Scrapy.
 
 .. _AJAX crawleable urls: http://code.google.com/web/ajaxcrawling/docs/getting-started.html
 .. _chunked transfer encoding: http://en.wikipedia.org/wiki/Chunked_transfer_encoding
-.. _w3lib: http://https://github.com/scrapy/w3lib
+.. _w3lib: https://github.com/scrapy/w3lib
 .. _scrapely: https://github.com/scrapy/scrapely
 .. _marshal: http://docs.python.org/library/marshal.html
 .. _w3lib.encoding: https://github.com/scrapy/w3lib/blob/master/w3lib/encoding.py


@@ -121,10 +121,10 @@ for concurrency.
 For more information about asynchronous programming and Twisted see these
 links:
 
 * `Asynchronous Programming with Twisted`_
 * `Introduction to Deferreds in Twisted`_
 * `Twisted - hello, asynchronous programming`_
 
 .. _Twisted: http://twistedmatrix.com/trac/
 .. _Asynchronous Programming with Twisted: http://twistedmatrix.com/projects/core/documentation/howto/async.html
+.. _Introduction to Deferreds in Twisted: http://twistedmatrix.com/documents/current/core/howto/defer-intro.html
 .. _Twisted - hello, asynchronous programming: http://jessenoller.com/2009/02/11/twisted-hello-asynchronous-programming/
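For readers who do not follow the links: the Twisted primitive underlying Scrapy's concurrency is the Deferred, a placeholder for a result that arrives later. A minimal standalone sketch (plain Twisted, not Scrapy code):

    from twisted.internet import defer, reactor

    def on_result(value):
        print('got:', value)
        reactor.stop()

    # Callbacks attached now are run whenever the result shows up;
    # nothing blocks while waiting.
    d = defer.Deferred()
    d.addCallback(on_result)

    # Deliver the result a second later instead of blocking for it.
    reactor.callLater(1, d.callback, 'hello, asynchronous programming')
    reactor.run()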


@@ -499,4 +499,4 @@ Example::
     COMMANDS_MODULE = 'mybot.commands'
 
-.. _Deploying your project: http://scrapyd.readthedocs.org/en/latest/#deploying-your-project
+.. _Deploying your project: http://scrapyd.readthedocs.org/en/latest/deploy.html


@@ -15,7 +15,7 @@ simple API for sending attachments and it's very easy to configure, with a few
 :ref:`settings <topics-email-settings>`.
 
 .. _smtplib: http://docs.python.org/library/smtplib.html
-.. _Twisted non-blocking IO: http://twistedmatrix.com/projects/core/documentation/howto/async.html
+.. _Twisted non-blocking IO: http://twistedmatrix.com/documents/current/core/howto/defer-intro.html
 
 Quick example
 =============
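The API being described boils down to creating a ``MailSender`` and calling ``send()``; a sketch with placeholder addresses, assuming the default settings and a reachable SMTP server (attachments go through a separate argument, omitted here):

    from scrapy.mail import MailSender

    # Built on Twisted non-blocking IO, so sending mail does not
    # stall the crawl.
    mailer = MailSender()
    mailer.send(to=['someone@example.com'], subject='Crawl report',
                body='All done.')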


@@ -37,7 +37,7 @@ For a complete reference of the selectors API see
 :ref:`Selector reference <topics-selectors-ref>`
 
 .. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
-.. _lxml: http://codespeak.net/lxml/
+.. _lxml: http://lxml.de/
 .. _ElementTree: http://docs.python.org/library/xml.etree.elementtree.html
 .. _cssselect: https://pypi.python.org/pypi/cssselect/
 .. _XPath: http://www.w3.org/TR/xpath
@@ -247,12 +247,12 @@ Being built atop `lxml`_, Scrapy selectors also support some `EXSLT`_ extensions
 and come with these pre-registered namespaces to use in XPath expressions:
 
-====== ==================================== =======================
-prefix namespace                            usage
-====== ==================================== =======================
-re     http://exslt.org/regular-expressions `regular expressions`_
-set    http://exslt.org/sets                `set manipulation`_
-====== ==================================== =======================
+====== ===================================== =======================
+prefix namespace                             usage
+====== ===================================== =======================
+re     \http://exslt.org/regular-expressions `regular expressions`_
+set    \http://exslt.org/sets                `set manipulation`_
+====== ===================================== =======================
 
 Regular expressions
 ~~~~~~~~~~~~~~~~~~~
@@ -594,4 +594,4 @@ of relevance, are:
   case some element names clash between namespaces. These cases are very rare
   though.
 
-.. _Google Base XML feed: http://base.google.com/support/bin/answer.py?hl=en&answer=59461
+.. _Google Base XML feed: https://support.google.com/merchants/answer/160589?hl=en&ref_topic=2473799
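Because the ``re`` namespace in the table above ships pre-registered, EXSLT functions such as ``re:test()`` work in selector XPath expressions without any setup. A small sketch in the spirit of the selectors docs (the sample HTML is made up):

    from scrapy.selector import Selector

    doc = '<ul><li class="item-0">first</li><li class="item-10">second</li></ul>'
    sel = Selector(text=doc)

    # re:test() resolves through the pre-registered EXSLT namespace;
    # only class="item-0" ends in a single digit.
    print(sel.xpath(r'//li[re:test(@class, "item-\d$")]/text()').extract())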

tox.ini

@@ -4,7 +4,7 @@
 # and then run "tox" from this directory.
 
 [tox]
-envlist = py27, pypy, precise, trunk, py33
+envlist = py27, pypy, precise, trunk, py33, docs
 indexserver =
     HPK = https://devpi.net/hpk/dev/

@@ -56,3 +56,11 @@ deps =
 commands =
     bin/runtests.bat []
 sitepackages = False
+
+[testenv:docs]
+changedir = docs
+deps =
+    Sphinx
+commands =
+    sphinx-build -W -b html . build/html
+    sphinx-build -W -b linkcheck . build/linkcheck
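With this environment in place, the CI docs job can be reproduced locally by running ``tox -e docs`` from the repository root. The ``-W`` flag promotes Sphinx warnings to errors, so a broken reference fails the build outright, and the linkcheck run is where the new ``linkcheck_ignore`` list in ``conf.py`` takes effect.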