
Merge branch 'master' into backward

Mikhail Korobov 2019-03-14 22:21:09 +05:00 committed by GitHub
commit 5dc94db847
31 changed files with 165 additions and 162 deletions


@ -55,7 +55,7 @@ guidelines when you're going to report a new bug.
* search the `scrapy-users`_ list and `Scrapy subreddit`_ to see if it has
been discussed there, or if you're not sure if what you're seeing is a bug.
You can also ask in the `#scrapy` IRC channel.
You can also ask in the ``#scrapy`` IRC channel.
* write **complete, reproducible, specific bug reports**. The smaller the test
case, the better. Remember that other developers won't have your project to


@ -4,7 +4,13 @@
Scrapy |version| documentation
==============================
This documentation contains everything you need to know about Scrapy.
Scrapy is a fast high-level `web crawling`_ and `web scraping`_ framework, used
to crawl websites and extract structured data from their pages. It can be used
for a wide range of purposes, from data mining to monitoring and automated
testing.
.. _web crawling: https://en.wikipedia.org/wiki/Web_crawler
.. _web scraping: https://en.wikipedia.org/wiki/Web_scraping
Getting help
============


@ -149,7 +149,7 @@ Documentation improvements
* improved links to beginner resources in the tutorial
(:issue:`3367`, :issue:`3468`);
* fixed :setting:`RETRY_HTTP_CODES` default values in docs (:issue:`3335`);
* remove unused `DEPTH_STATS` option from docs (:issue:`3245`);
* remove unused ``DEPTH_STATS`` option from docs (:issue:`3245`);
* other cleanups (:issue:`3347`, :issue:`3350`, :issue:`3445`, :issue:`3544`,
:issue:`3605`).
@ -1313,7 +1313,7 @@ Module Relocations
There's been a large rearrangement of modules trying to improve the general
structure of Scrapy. Main changes were separating various subpackages into
new projects and dissolving both `scrapy.contrib` and `scrapy.contrib_exp`
new projects and dissolving both ``scrapy.contrib`` and ``scrapy.contrib_exp``
into top level packages. Backward compatibility was kept among internal
relocations, while importing deprecated modules expect warnings indicating
their new place.
@ -1344,7 +1344,7 @@ Outsourced packages
| | /scrapy-plugins/scrapy-jsonrpc>`_ |
+-------------------------------------+-------------------------------------+
`scrapy.contrib_exp` and `scrapy.contrib` dissolutions
``scrapy.contrib_exp`` and ``scrapy.contrib`` dissolutions
+-------------------------------------+-------------------------------------+
| Old location | New location |
@ -1556,7 +1556,7 @@ Code refactoring
(:issue:`1078`)
- Pydispatch pep8 (:issue:`992`)
- Removed unused 'load=False' parameter from walk_modules() (:issue:`871`)
- For consistency, use `job_dir` helper in `SpiderState` extension.
- For consistency, use ``job_dir`` helper in ``SpiderState`` extension.
(:issue:`805`)
- rename "sflo" local variables to less cryptic "log_observer" (:issue:`775`)
@ -1669,10 +1669,10 @@ Enhancements
cache middleware (:issue:`541`, :issue:`500`, :issue:`571`)
- Expose current crawler in Scrapy shell (:issue:`557`)
- Improve testsuite comparing CSV and XML exporters (:issue:`570`)
- New `offsite/filtered` and `offsite/domains` stats (:issue:`566`)
- New ``offsite/filtered`` and ``offsite/domains`` stats (:issue:`566`)
- Support process_links as generator in CrawlSpider (:issue:`555`)
- Verbose logging and new stats counters for DupeFilter (:issue:`553`)
- Add a mimetype parameter to `MailSender.send()` (:issue:`602`)
- Add a mimetype parameter to ``MailSender.send()`` (:issue:`602`)
- Generalize file pipeline log messages (:issue:`622`)
- Replace unencodeable codepoints with html entities in SGMLLinkExtractor (:issue:`565`)
- Converted SEP documents to rst format (:issue:`629`, :issue:`630`,
@ -1691,20 +1691,20 @@ Enhancements
- Make scrapy.version_info a tuple of integers (:issue:`681`, :issue:`692`)
- Infer exporter's output format from filename extensions
(:issue:`546`, :issue:`659`, :issue:`760`)
- Support case-insensitive domains in `url_is_from_any_domain()` (:issue:`693`)
- Support case-insensitive domains in ``url_is_from_any_domain()`` (:issue:`693`)
- Remove pep8 warnings in project and spider templates (:issue:`698`)
- Tests and docs for `request_fingerprint` function (:issue:`597`)
- Update SEP-19 for GSoC project `per-spider settings` (:issue:`705`)
- Tests and docs for ``request_fingerprint`` function (:issue:`597`)
- Update SEP-19 for GSoC project ``per-spider settings`` (:issue:`705`)
- Set exit code to non-zero when contracts fails (:issue:`727`)
- Add a setting to control what class is instantiated as Downloader component
(:issue:`738`)
- Pass response in `item_dropped` signal (:issue:`724`)
- Improve `scrapy check` contracts command (:issue:`733`, :issue:`752`)
- Document `spider.closed()` shortcut (:issue:`719`)
- Document `request_scheduled` signal (:issue:`746`)
- Pass response in ``item_dropped`` signal (:issue:`724`)
- Improve ``scrapy check`` contracts command (:issue:`733`, :issue:`752`)
- Document ``spider.closed()`` shortcut (:issue:`719`)
- Document ``request_scheduled`` signal (:issue:`746`)
- Add a note about reporting security issues (:issue:`697`)
- Add LevelDB http cache storage backend (:issue:`626`, :issue:`500`)
- Sort spider list output of `scrapy list` command (:issue:`742`)
- Sort spider list output of ``scrapy list`` command (:issue:`742`)
- Multiple documentation enhancements and fixes
(:issue:`575`, :issue:`587`, :issue:`590`, :issue:`596`, :issue:`610`,
:issue:`617`, :issue:`618`, :issue:`627`, :issue:`613`, :issue:`643`,
@ -1773,22 +1773,22 @@ Enhancements
~~~~~~~~~~~~
- [**Backward incompatible**] Switched HTTPCacheMiddleware backend to filesystem (:issue:`541`)
To restore old backend set `HTTPCACHE_STORAGE` to `scrapy.contrib.httpcache.DbmCacheStorage`
To restore old backend set ``HTTPCACHE_STORAGE`` to ``scrapy.contrib.httpcache.DbmCacheStorage``
- Proxy \https:// urls using CONNECT method (:issue:`392`, :issue:`397`)
- Add a middleware to crawl ajax crawleable pages as defined by google (:issue:`343`)
- Rename scrapy.spider.BaseSpider to scrapy.spider.Spider (:issue:`510`, :issue:`519`)
- Selectors register EXSLT namespaces by default (:issue:`472`)
- Unify item loaders similar to selectors renaming (:issue:`461`)
- Make `RFPDupeFilter` class easily subclassable (:issue:`533`)
- Make ``RFPDupeFilter`` class easily subclassable (:issue:`533`)
- Improve test coverage and forthcoming Python 3 support (:issue:`525`)
- Promote startup info on settings and middleware to INFO level (:issue:`520`)
- Support partials in `get_func_args` util (:issue:`506`, issue:`504`)
- Support partials in ``get_func_args`` util (:issue:`506`, issue:`504`)
- Allow running individual tests via tox (:issue:`503`)
- Update extensions ignored by link extractors (:issue:`498`)
- Add middleware methods to get files/images/thumbs paths (:issue:`490`)
- Improve offsite middleware tests (:issue:`478`)
- Add a way to skip default Referer header set by RefererMiddleware (:issue:`475`)
- Do not send `x-gzip` in default `Accept-Encoding` header (:issue:`469`)
- Do not send ``x-gzip`` in default ``Accept-Encoding`` header (:issue:`469`)
- Support defining http error handling using settings (:issue:`466`)
- Use modern python idioms wherever you find legacies (:issue:`497`)
- Improve and correct documentation
@ -1799,14 +1799,14 @@ Fixes
~~~~~
- Update Selector class imports in CrawlSpider template (:issue:`484`)
- Fix nonexistent reference to `engine.slots` (:issue:`464`)
- Do not try to call `body_as_unicode()` on a non-TextResponse instance (:issue:`462`)
- Fix nonexistent reference to ``engine.slots`` (:issue:`464`)
- Do not try to call ``body_as_unicode()`` on a non-TextResponse instance (:issue:`462`)
- Warn when subclassing XPathItemLoader, previously it only warned on
instantiation. (:issue:`523`)
- Warn when subclassing XPathSelector, previously it only warned on
instantiation. (:issue:`537`)
- Multiple fixes to memory stats (:issue:`531`, :issue:`530`, :issue:`529`)
- Fix overriding url in `FormRequest.from_response()` (:issue:`507`)
- Fix overriding url in ``FormRequest.from_response()`` (:issue:`507`)
- Fix tests runner under pip 1.5 (:issue:`513`)
- Fix logging error when spider name is unicode (:issue:`479`)
@ -1833,7 +1833,7 @@ Enhancements
(modifying them had been deprecated for a long time)
- :setting:`ITEM_PIPELINES` is now defined as a dict (instead of a list)
- Sitemap spider can fetch alternate URLs (:issue:`360`)
- `Selector.remove_namespaces()` now remove namespaces from element's attributes. (:issue:`416`)
- ``Selector.remove_namespaces()`` now remove namespaces from element's attributes. (:issue:`416`)
- Paved the road for Python 3.3+ (:issue:`435`, :issue:`436`, :issue:`431`, :issue:`452`)
- New item exporter using native python types with nesting support (:issue:`366`)
- Tune HTTP1.1 pool size so it matches concurrency defined by settings (:commit:`b43b5f575`)
@ -1844,13 +1844,13 @@ Enhancements
- Mock server (used for tests) can listen for HTTPS requests (:issue:`410`)
- Remove multi spider support from multiple core components
(:issue:`422`, :issue:`421`, :issue:`420`, :issue:`419`, :issue:`423`, :issue:`418`)
- Travis-CI now tests Scrapy changes against development versions of `w3lib` and `queuelib` python packages.
- Travis-CI now tests Scrapy changes against development versions of ``w3lib`` and ``queuelib`` python packages.
- Add pypy 2.1 to continuous integration tests (:commit:`ecfa7431`)
- Pylinted, pep8 and removed old-style exceptions from source (:issue:`430`, :issue:`432`)
- Use importlib for parametric imports (:issue:`445`)
- Handle a regression introduced in Python 2.7.5 that affects XmlItemExporter (:issue:`372`)
- Bugfix crawling shutdown on SIGINT (:issue:`450`)
- Do not submit `reset` type inputs in FormRequest.from_response (:commit:`b326b87`)
- Do not submit ``reset`` type inputs in FormRequest.from_response (:commit:`b326b87`)
- Do not silence download errors when request errback raises an exception (:commit:`684cfc0`)
Bugfixes
@ -1865,8 +1865,8 @@ Bugfixes
- Improve request-response docs (:issue:`391`)
- Improve best practices docs (:issue:`399`, :issue:`400`, :issue:`401`, :issue:`402`)
- Improve django integration docs (:issue:`404`)
- Document `bindaddress` request meta (:commit:`37c24e01d7`)
- Improve `Request` class documentation (:issue:`226`)
- Document ``bindaddress`` request meta (:commit:`37c24e01d7`)
- Improve ``Request`` class documentation (:issue:`226`)
Other
~~~~~
@ -1875,7 +1875,7 @@ Other
- Add `cssselect`_ python package as install dependency
- Drop libxml2 and multi selector's backend support, `lxml`_ is required from now on.
- Minimum Twisted version increased to 10.0.0, dropped Twisted 8.0 support.
- Running test suite now requires `mock` python library (:issue:`390`)
- Running test suite now requires ``mock`` python library (:issue:`390`)
Thanks
@ -1929,7 +1929,7 @@ Scrapy 0.18.3 (released 2013-10-03)
Scrapy 0.18.2 (released 2013-09-03)
-----------------------------------
- Backport `scrapy check` command fixes and backward compatible multi
- Backport ``scrapy check`` command fixes and backward compatible multi
crawler process (:issue:`339`)
Scrapy 0.18.1 (released 2013-08-27)
@ -1958,31 +1958,31 @@ Scrapy 0.18.0 (released 2013-08-09)
- Handle GET parameters for AJAX crawleable urls (:commit:`3fe2a32`)
- Use lxml recover option to parse sitemaps (:issue:`347`)
- Bugfix cookie merging by hostname and not by netloc (:issue:`352`)
- Support disabling `HttpCompressionMiddleware` using a flag setting (:issue:`359`)
- Support xml namespaces using `iternodes` parser in `XMLFeedSpider` (:issue:`12`)
- Support `dont_cache` request meta flag (:issue:`19`)
- Bugfix `scrapy.utils.gz.gunzip` broken by changes in python 2.7.4 (:commit:`4dc76e`)
- Bugfix url encoding on `SgmlLinkExtractor` (:issue:`24`)
- Bugfix `TakeFirst` processor shouldn't discard zero (0) value (:issue:`59`)
- Support disabling ``HttpCompressionMiddleware`` using a flag setting (:issue:`359`)
- Support xml namespaces using ``iternodes`` parser in ``XMLFeedSpider`` (:issue:`12`)
- Support ``dont_cache`` request meta flag (:issue:`19`)
- Bugfix ``scrapy.utils.gz.gunzip`` broken by changes in python 2.7.4 (:commit:`4dc76e`)
- Bugfix url encoding on ``SgmlLinkExtractor`` (:issue:`24`)
- Bugfix ``TakeFirst`` processor shouldn't discard zero (0) value (:issue:`59`)
- Support nested items in xml exporter (:issue:`66`)
- Improve cookies handling performance (:issue:`77`)
- Log dupe filtered requests once (:issue:`105`)
- Split redirection middleware into status and meta based middlewares (:issue:`78`)
- Use HTTP1.1 as default downloader handler (:issue:`109` and :issue:`318`)
- Support xpath form selection on `FormRequest.from_response` (:issue:`185`)
- Bugfix unicode decoding error on `SgmlLinkExtractor` (:issue:`199`)
- Support xpath form selection on ``FormRequest.from_response`` (:issue:`185`)
- Bugfix unicode decoding error on ``SgmlLinkExtractor`` (:issue:`199`)
- Bugfix signal dispatching on pypi interpreter (:issue:`205`)
- Improve request delay and concurrency handling (:issue:`206`)
- Add RFC2616 cache policy to `HttpCacheMiddleware` (:issue:`212`)
- Add RFC2616 cache policy to ``HttpCacheMiddleware`` (:issue:`212`)
- Allow customization of messages logged by engine (:issue:`214`)
- Multiple improvements to `DjangoItem` (:issue:`217`, :issue:`218`, :issue:`221`)
- Multiple improvements to ``DjangoItem`` (:issue:`217`, :issue:`218`, :issue:`221`)
- Extend Scrapy commands using setuptools entry points (:issue:`260`)
- Allow spider `allowed_domains` value to be set/tuple (:issue:`261`)
- Support `settings.getdict` (:issue:`269`)
- Simplify internal `scrapy.core.scraper` slot handling (:issue:`271`)
- Added `Item.copy` (:issue:`290`)
- Allow spider ``allowed_domains`` value to be set/tuple (:issue:`261`)
- Support ``settings.getdict`` (:issue:`269`)
- Simplify internal ``scrapy.core.scraper`` slot handling (:issue:`271`)
- Added ``Item.copy`` (:issue:`290`)
- Collect idle downloader slots (:issue:`297`)
- Add `ftp://` scheme downloader handler (:issue:`329`)
- Add ``ftp://`` scheme downloader handler (:issue:`329`)
- Added downloader benchmark webserver and spider tools :ref:`benchmarking`
- Moved persistent (on disk) queues to a separate project (queuelib_) which scrapy now depends on
- Add scrapy commands using external libraries (:issue:`260`)
@ -2113,7 +2113,7 @@ Scrapy changes:
- dropped Signals singleton. Signals should now be accessed through the Crawler.signals attribute. See the signals documentation for more info.
- dropped Stats Collector singleton. Stats can now be accessed through the Crawler.stats attribute. See the stats collection documentation for more info.
- documented :ref:`topics-api`
- `lxml` is now the default selectors backend instead of `libxml2`
- ``lxml`` is now the default selectors backend instead of ``libxml2``
- ported FormRequest.from_response() to use `lxml`_ instead of `ClientForm`_
- removed modules: ``scrapy.xlib.BeautifulSoup`` and ``scrapy.xlib.ClientForm``
- SitemapSpider: added support for sitemap urls ending in .xml and .xml.gz, even if they advertise a wrong content type (:commit:`10ed28b`)
@ -2206,16 +2206,16 @@ New features and settings
- New ``ChunkedTransferMiddleware`` (enabled by default) to support `chunked transfer encoding`_ (:rev:`2769`)
- Add boto 2.0 support for S3 downloader handler (:rev:`2763`)
- Added `marshal`_ to formats supported by feed exports (:rev:`2744`)
- In request errbacks, offending requests are now received in `failure.request` attribute (:rev:`2738`)
- In request errbacks, offending requests are now received in ``failure.request`` attribute (:rev:`2738`)
- Big downloader refactoring to support per domain/ip concurrency limits (:rev:`2732`)
- ``CONCURRENT_REQUESTS_PER_SPIDER`` setting has been deprecated and replaced by:
- :setting:`CONCURRENT_REQUESTS`, :setting:`CONCURRENT_REQUESTS_PER_DOMAIN`, :setting:`CONCURRENT_REQUESTS_PER_IP`
- check the documentation for more details
- Added builtin caching DNS resolver (:rev:`2728`)
- Moved Amazon AWS-related components/extensions (SQS spider queue, SimpleDB stats collector) to a separate project: [scaws](https://github.com/scrapinghub/scaws) (:rev:`2706`, :rev:`2714`)
- Moved spider queues to scrapyd: `scrapy.spiderqueue` -> `scrapyd.spiderqueue` (:rev:`2708`)
- Moved sqlite utils to scrapyd: `scrapy.utils.sqlite` -> `scrapyd.sqlite` (:rev:`2781`)
- Real support for returning iterators on `start_requests()` method. The iterator is now consumed during the crawl when the spider is getting idle (:rev:`2704`)
- Moved spider queues to scrapyd: ``scrapy.spiderqueue`` -> ``scrapyd.spiderqueue`` (:rev:`2708`)
- Moved sqlite utils to scrapyd: ``scrapy.utils.sqlite`` -> ``scrapyd.sqlite`` (:rev:`2781`)
- Real support for returning iterators on ``start_requests()`` method. The iterator is now consumed during the crawl when the spider is getting idle (:rev:`2704`)
- Added :setting:`REDIRECT_ENABLED` setting to quickly enable/disable the redirect middleware (:rev:`2697`)
- Added :setting:`RETRY_ENABLED` setting to quickly enable/disable the retry middleware (:rev:`2694`)
- Added ``CloseSpider`` exception to manually close spiders (:rev:`2691`)
@ -2223,19 +2223,19 @@ New features and settings
- Refactored close spider behavior to wait for all downloads to finish and be processed by spiders, before closing the spider (:rev:`2688`)
- Added ``SitemapSpider`` (see documentation in Spiders page) (:rev:`2658`)
- Added ``LogStats`` extension for periodically logging basic stats (like crawled pages and scraped items) (:rev:`2657`)
- Make handling of gzipped responses more robust (#319, :rev:`2643`). Now Scrapy will try and decompress as much as possible from a gzipped response, instead of failing with an `IOError`.
- Make handling of gzipped responses more robust (#319, :rev:`2643`). Now Scrapy will try and decompress as much as possible from a gzipped response, instead of failing with an ``IOError``.
- Simplified !MemoryDebugger extension to use stats for dumping memory debugging info (:rev:`2639`)
- Added new command to edit spiders: ``scrapy edit`` (:rev:`2636`) and `-e` flag to `genspider` command that uses it (:rev:`2653`)
- Added new command to edit spiders: ``scrapy edit`` (:rev:`2636`) and ``-e`` flag to ``genspider`` command that uses it (:rev:`2653`)
- Changed default representation of items to pretty-printed dicts. (:rev:`2631`). This improves default logging by making log more readable in the default case, for both Scraped and Dropped lines.
- Added :signal:`spider_error` signal (:rev:`2628`)
- Added :setting:`COOKIES_ENABLED` setting (:rev:`2625`)
- Stats are now dumped to Scrapy log (default value of :setting:`STATS_DUMP` setting has been changed to `True`). This is to make Scrapy users more aware of Scrapy stats and the data that is collected there.
- Stats are now dumped to Scrapy log (default value of :setting:`STATS_DUMP` setting has been changed to ``True``). This is to make Scrapy users more aware of Scrapy stats and the data that is collected there.
- Added support for dynamically adjusting download delay and maximum concurrent requests (:rev:`2599`)
- Added new DBM HTTP cache storage backend (:rev:`2576`)
- Added ``listjobs.json`` API to Scrapyd (:rev:`2571`)
- ``CsvItemExporter``: added ``join_multivalued`` parameter (:rev:`2578`)
- Added namespace support to ``xmliter_lxml`` (:rev:`2552`)
- Improved cookies middleware by making `COOKIES_DEBUG` nicer and documenting it (:rev:`2579`)
- Improved cookies middleware by making ``COOKIES_DEBUG`` nicer and documenting it (:rev:`2579`)
- Several improvements to Scrapyd and Link extractors
Code rearranged and removed
@ -2249,11 +2249,11 @@ Code rearranged and removed
- Reduced Scrapy codebase by stripping part of Scrapy code into two new libraries:
- `w3lib`_ (several functions from ``scrapy.utils.{http,markup,multipart,response,url}``, done in :rev:`2584`)
- `scrapely`_ (was ``scrapy.contrib.ibl``, done in :rev:`2586`)
- Removed unused function: `scrapy.utils.request.request_info()` (:rev:`2577`)
- Removed googledir project from `examples/googledir`. There's now a new example project called `dirbot` available on github: https://github.com/scrapy/dirbot
- Removed unused function: ``scrapy.utils.request.request_info()`` (:rev:`2577`)
- Removed googledir project from ``examples/googledir``. There's now a new example project called ``dirbot`` available on github: https://github.com/scrapy/dirbot
- Removed support for default field values in Scrapy items (:rev:`2616`)
- Removed experimental crawlspider v2 (:rev:`2632`)
- Removed scheduler middleware to simplify architecture. Duplicates filter is now done in the scheduler itself, using the same dupe filtering class as before (`DUPEFILTER_CLASS` setting) (:rev:`2640`)
- Removed scheduler middleware to simplify architecture. Duplicates filter is now done in the scheduler itself, using the same dupe filtering class as before (``DUPEFILTER_CLASS`` setting) (:rev:`2640`)
- Removed support for passing urls to ``scrapy crawl`` command (use ``scrapy parse`` instead) (:rev:`2704`)
- Removed deprecated Execution Queue (:rev:`2704`)
- Removed (undocumented) spider context extension (from scrapy.contrib.spidercontext) (:rev:`2780`)
@ -2289,13 +2289,13 @@ Scrapyd changes
- Scrapyd now uses one process per spider
- It stores one log file per spider run, and rotates them, keeping the latest 5 logs per spider (by default)
- A minimal web ui was added, available at http://localhost:6800 by default
- There is now a `scrapy server` command to start a Scrapyd server of the current project
- There is now a ``scrapy server`` command to start a Scrapyd server of the current project
Changes to settings
~~~~~~~~~~~~~~~~~~~
- added `HTTPCACHE_ENABLED` setting (False by default) to enable HTTP cache middleware
- changed `HTTPCACHE_EXPIRATION_SECS` semantics: now zero means "never expire".
- added ``HTTPCACHE_ENABLED`` setting (False by default) to enable HTTP cache middleware
- changed ``HTTPCACHE_EXPIRATION_SECS`` semantics: now zero means "never expire".
Deprecated/obsoleted functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -2326,17 +2326,17 @@ New features and improvements
- Splitted Debian package into two packages - the library and the service (#187)
- Scrapy log refactoring (#188)
- New extension for keeping persistent spider contexts among different runs (#203)
- Added `dont_redirect` request.meta key for avoiding redirects (#233)
- Added `dont_retry` request.meta key for avoiding retries (#234)
- Added ``dont_redirect`` request.meta key for avoiding redirects (#233)
- Added ``dont_retry`` request.meta key for avoiding retries (#234)
Command-line tool changes
~~~~~~~~~~~~~~~~~~~~~~~~~
- New `scrapy` command which replaces the old `scrapy-ctl.py` (#199)
- there is only one global `scrapy` command now, instead of one `scrapy-ctl.py` per project
- Added `scrapy.bat` script for running more conveniently from Windows
- New ``scrapy`` command which replaces the old ``scrapy-ctl.py`` (#199)
- there is only one global ``scrapy`` command now, instead of one ``scrapy-ctl.py`` per project
- Added ``scrapy.bat`` script for running more conveniently from Windows
- Added bash completion to command-line tool (#210)
- Renamed command `start` to `runserver` (#209)
- Renamed command ``start`` to ``runserver`` (#209)
API changes
~~~~~~~~~~~


@ -94,7 +94,7 @@ how you :ref:`configure the downloader middlewares
.. method:: crawl(\*args, \**kwargs)
Starts the crawler by instantiating its spider class with the given
`args` and `kwargs` arguments, while setting the execution engine in
``args`` and ``kwargs`` arguments, while setting the execution engine in
motion.
Returns a deferred that is fired when the crawl is finished.
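
For illustration, a minimal sketch of driving this method from a script; ``MySpider`` and the ``category`` argument are placeholders for your own spider and arguments::

    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner

    runner = CrawlerRunner()
    crawler = runner.create_crawler(MySpider)   # MySpider: your spider class
    d = crawler.crawl(category='electronics')   # args/kwargs reach MySpider.__init__
    d.addBoth(lambda _: reactor.stop())         # stop the reactor when the crawl finishes
    reactor.run()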
@ -180,7 +180,7 @@ SpiderLoader API
.. method:: load(spider_name)
Get the Spider class with the given name. It'll look into the previously
loaded spiders for a spider class with name `spider_name` and will raise
loaded spiders for a spider class with name ``spider_name`` and will raise
a KeyError if not found.
:param spider_name: spider class name
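
A hedged usage sketch, assuming a project spider registered under the name ``'quotes'``::

    from scrapy.spiderloader import SpiderLoader
    from scrapy.utils.project import get_project_settings

    spider_loader = SpiderLoader.from_settings(get_project_settings())
    spider_cls = spider_loader.load('quotes')   # raises KeyError if no spider has this name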


@ -172,5 +172,5 @@ links:
.. _Twisted: https://twistedmatrix.com/trac/
.. _Introduction to Deferreds in Twisted: https://twistedmatrix.com/documents/current/core/howto/defer-intro.html
.. _Twisted - hello, asynchronous programming: http://jessenoller.com/2009/02/11/twisted-hello-asynchronous-programming/
.. _Twisted - hello, asynchronous programming: http://jessenoller.com/blog/2009/02/11/twisted-hello-asynchronous-programming/
.. _Twisted Introduction - Krondo: http://krondo.com/an-introduction-to-asynchronous-programming-and-twisted/


@ -233,7 +233,7 @@ also request each page to get every quote on the site::
name = 'quote'
allowed_domains = ['quotes.toscrape.com']
page = 1
start_urls = ['http://quotes.toscrape.com/api/quotes?page=1]
start_urls = ['http://quotes.toscrape.com/api/quotes?page=1']
def parse(self, response):
data = json.loads(response.text)


@ -41,7 +41,7 @@ previous (or subsequent) middleware being applied.
If you want to disable a built-in middleware (the ones defined in
:setting:`DOWNLOADER_MIDDLEWARES_BASE` and enabled by default) you must define it
in your project's :setting:`DOWNLOADER_MIDDLEWARES` setting and assign `None`
in your project's :setting:`DOWNLOADER_MIDDLEWARES` setting and assign ``None``
as its value. For example, if you want to disable the user-agent middleware::
DOWNLOADER_MIDDLEWARES = {
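    # The example is truncated by the diff hunk; it typically continues along these
    # lines ('myproject.middlewares.CustomDownloaderMiddleware' is a placeholder):
    'myproject.middlewares.CustomDownloaderMiddleware': 543,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
}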
@ -357,7 +357,7 @@ HttpCacheMiddleware
.. reqmeta:: dont_cache
You can also avoid caching a response on every policy using :reqmeta:`dont_cache` meta key equals `True`.
You can also avoid caching a response on every policy using :reqmeta:`dont_cache` meta key equals ``True``.
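
For example, from inside a spider callback (the URL and callback are placeholders)::

    yield scrapy.Request(
        'http://www.example.com/some/page',
        callback=self.parse_page,
        meta={'dont_cache': True},   # this response will not be stored in the HTTP cache
    )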
.. _httpcache-policy-dummy:
@ -390,17 +390,17 @@ runs to avoid downloading unmodified data (to save bandwidth and speed up crawls
what is implemented:
* Do not attempt to store responses/requests with `no-store` cache-control directive set
* Do not serve responses from cache if `no-cache` cache-control directive is set even for fresh responses
* Compute freshness lifetime from `max-age` cache-control directive
* Compute freshness lifetime from `Expires` response header
* Compute freshness lifetime from `Last-Modified` response header (heuristic used by Firefox)
* Compute current age from `Age` response header
* Compute current age from `Date` header
* Revalidate stale responses based on `Last-Modified` response header
* Revalidate stale responses based on `ETag` response header
* Set `Date` header for any received response missing it
* Support `max-stale` cache-control directive in requests
* Do not attempt to store responses/requests with ``no-store`` cache-control directive set
* Do not serve responses from cache if ``no-cache`` cache-control directive is set even for fresh responses
* Compute freshness lifetime from ``max-age`` cache-control directive
* Compute freshness lifetime from ``Expires`` response header
* Compute freshness lifetime from ``Last-Modified`` response header (heuristic used by Firefox)
* Compute current age from ``Age`` response header
* Compute current age from ``Date`` header
* Revalidate stale responses based on ``Last-Modified`` response header
* Revalidate stale responses based on ``ETag`` response header
* Set ``Date`` header for any received response missing it
* Support ``max-stale`` cache-control directive in requests
This allows spiders to be configured with the full RFC2616 cache policy,
but avoid revalidation on a request-by-request basis, while remaining
@ -408,15 +408,15 @@ what is implemented:
Example:
Add `Cache-Control: max-stale=600` to Request headers to accept responses that
Add ``Cache-Control: max-stale=600`` to Request headers to accept responses that
have exceeded their expiration time by no more than 600 seconds.
See also: RFC2616, 14.9.3
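
A minimal request-level sketch of doing that, from a spider callback (URL and callback are placeholders)::

    yield scrapy.Request(
        'http://www.example.com/archive/page.html',
        headers={'Cache-Control': 'max-stale=600'},   # accept responses up to 600s past expiry
        callback=self.parse_page,
    )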
what is missing:
* `Pragma: no-cache` support https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
* `Vary` header support https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
* ``Pragma: no-cache`` support https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1
* ``Vary`` header support https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.6
* Invalidation after updates or deletes https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.10
* ... probably others ..
@ -626,12 +626,12 @@ Default: ``False``
If enabled, will cache pages unconditionally.
A spider may wish to have all responses available in the cache, for
future use with `Cache-Control: max-stale`, for instance. The
future use with ``Cache-Control: max-stale``, for instance. The
DummyPolicy caches all responses but never revalidates them, and
sometimes a more nuanced policy is desirable.
This setting still respects `Cache-Control: no-store` directives in responses.
If you don't want that, filter `no-store` out of the Cache-Control headers in
This setting still respects ``Cache-Control: no-store`` directives in responses.
If you don't want that, filter ``no-store`` out of the Cache-Control headers in
responses you feed to the cache middleware.
.. setting:: HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS
@ -834,8 +834,6 @@ RetryMiddleware
Failed pages are collected on the scraping process and rescheduled at the
end, once the spider has finished crawling all regular (non failed) pages.
Once there are no more failed pages to retry, this middleware sends a signal
(retry_complete), so other extensions could connect to that signal.
The :class:`RetryMiddleware` can be configured through the following
settings (see the settings documentation for more info):
@ -940,7 +938,7 @@ UserAgentMiddleware
Middleware that allows spiders to override the default user agent.
In order for a spider to override the default user agent, its `user_agent`
In order for a spider to override the default user agent, its ``user_agent``
attribute must be set.
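
For example (a sketch; the spider name and UA string are placeholders)::

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = 'example'
        user_agent = 'example-bot/1.0 (+https://www.example.com/bot)'
        start_urls = ['http://www.example.com/']

        def parse(self, response):
            self.logger.info('Fetched %s with a custom User-Agent', response.url)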
.. _ajaxcrawl-middleware:


@ -303,7 +303,7 @@ CsvItemExporter
The additional keyword arguments of this constructor are passed to the
:class:`BaseItemExporter` constructor, and the leftover arguments to the
`csv.writer`_ constructor, so you can use any `csv.writer` constructor
`csv.writer`_ constructor, so you can use any ``csv.writer`` constructor
argument to customize this exporter.
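
For instance, a ``delimiter`` argument is forwarded to ``csv.writer``; a sketch using a plain dict item::

    from scrapy.exporters import CsvItemExporter

    with open('items.csv', 'wb') as f:                 # the exporter expects a binary file
        exporter = CsvItemExporter(f, delimiter=';')   # 'delimiter' is passed to csv.writer
        exporter.start_exporting()
        exporter.export_item({'name': 'Color TV', 'price': '1200'})
        exporter.finish_exporting()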
A typical output of this exporter would be::


@ -19,7 +19,7 @@ settings, just like any other Scrapy code.
It is customary for extensions to prefix their settings with their own name, to
avoid collision with existing (and future) extensions. For example, a
hypothetic extension to handle `Google Sitemaps`_ would use settings like
`GOOGLESITEMAP_ENABLED`, `GOOGLESITEMAP_DEPTH`, and so on.
``GOOGLESITEMAP_ENABLED``, ``GOOGLESITEMAP_DEPTH``, and so on.
.. _Google Sitemaps: https://en.wikipedia.org/wiki/Sitemaps
@ -368,7 +368,7 @@ Invokes a `Python debugger`_ inside a running Scrapy process when a `SIGUSR2`_
signal is received. After the debugger is exited, the Scrapy process continues
running normally.
For more info see `Debugging in Python`.
For more info see `Debugging in Python`_.
This extension only works on POSIX-compliant platforms (ie. not Windows).


@ -71,7 +71,7 @@ on cookies.
Request serialization
---------------------
Requests must be serializable by the `pickle` module, in order for persistence
Requests must be serializable by the ``pickle`` module, in order for persistence
to work, so you should make sure that your requests are serializable.
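
For example, from inside a spider callback (``self.parse_page`` is a placeholder)::

    # Fine for persistence: the callback is a spider method, referenced by name
    yield scrapy.Request(url, callback=self.parse_page)

    # Breaks persistence: lambdas cannot be pickled
    yield scrapy.Request(url, callback=lambda response: self.parse_page(response))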
The most common issue here is to use ``lambda`` functions on request callbacks that


@ -286,7 +286,7 @@ ItemLoader objects
given, one is instantiated automatically using the class in
:attr:`default_item_class`.
When instantiated with a `selector` or a `response` parameters
When instantiated with a ``selector`` or a ``response`` parameters
the :class:`ItemLoader` class provides convenient mechanisms for extracting
data from web pages using :ref:`selectors <topics-selectors>`.
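
A short sketch of the response-based usage (the item definition and selectors are placeholders)::

    import scrapy
    from scrapy.loader import ItemLoader

    class Product(scrapy.Item):       # placeholder item definition
        name = scrapy.Field()
        price = scrapy.Field()
        url = scrapy.Field()

    # inside a spider callback:
    def parse(self, response):
        loader = ItemLoader(item=Product(), response=response)
        loader.add_css('name', 'h1.product-name::text')
        loader.add_xpath('price', '//p[@id="price"]/text()')
        loader.add_value('url', response.url)
        return loader.load_item()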


@ -243,7 +243,7 @@ scrapy.utils.log module
case, its usage is not required but it's recommended.
If you plan on configuring the handlers yourself is still recommended you
call this function, passing `install_root_handler=False`. Bear in mind
call this function, passing ``install_root_handler=False``. Bear in mind
there won't be any log output set by default in that case.
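
A minimal sketch of that setup, handing control to the stdlib ``logging`` module afterwards::

    import logging
    from scrapy.utils.log import configure_logging

    configure_logging(install_root_handler=False)   # Scrapy will not touch the root logger
    logging.basicConfig(
        filename='log.txt',
        format='%(levelname)s: %(message)s',
        level=logging.INFO,
    )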
To get you started on manually configuring logging's output, you can use


@ -132,7 +132,7 @@ For example, the following image URL::
http://www.example.com/image.jpg
Whose `SHA1 hash` is::
Whose ``SHA1 hash`` is::
3afec3b4765f8f0a07b78f98c07b83f013567a0a
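
That value can be reproduced in a couple of lines, assuming the default ``file_path`` scheme of the images pipeline::

    import hashlib
    from scrapy.utils.python import to_bytes

    url = 'http://www.example.com/image.jpg'
    image_guid = hashlib.sha1(to_bytes(url)).hexdigest()
    print('full/%s.jpg' % image_guid)   # full/3afec3b4765f8f0a07b78f98c07b83f013567a0a.jpg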


@ -80,7 +80,7 @@ returned by the :meth:`CrawlerRunner.crawl
<scrapy.crawler.CrawlerRunner.crawl>` method.
Here's an example of its usage, along with a callback to manually stop the
reactor after `MySpider` has finished running.
reactor after ``MySpider`` has finished running.
::
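
    # A sketch of the kind of snippet the docs show here (the original example is
    # not included in this hunk); MySpider is a placeholder spider class.
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging

    configure_logging()
    runner = CrawlerRunner()
    d = runner.crawl(MySpider)
    d.addBoth(lambda _: reactor.stop())   # stop the reactor once MySpider has finished
    reactor.run()                         # blocks here until the crawl is over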


@ -50,7 +50,7 @@ Request objects
:type meta: dict
:param body: the request body. If a ``unicode`` is passed, then it's encoded to
``str`` using the `encoding` passed (which defaults to ``utf-8``). If
``str`` using the ``encoding`` passed (which defaults to ``utf-8``). If
``body`` is not given, an empty string is stored. Regardless of the
type of this argument, the final value stored will be a ``str`` (never
``unicode`` or ``None``).
@ -610,7 +610,7 @@ Response objects
.. attribute:: Response.flags
A list that contains flags for this response. Flags are labels used for
tagging Responses. For example: `'cached'`, `'redirected`', etc. And
tagging Responses. For example: ``'cached'``, ``'redirected``', etc. And
they're shown on the string representation of the Response (`__str__`
method) which is used by the engine for logging.
@ -682,7 +682,7 @@ TextResponse objects
``unicode(response.body)`` is not a correct way to convert response
body to unicode: you would be using the system default encoding
(typically `ascii`) instead of the response encoding.
(typically ``ascii``) instead of the response encoding.
.. attribute:: TextResponse.encoding
@ -690,7 +690,7 @@ TextResponse objects
A string with the encoding of this response. The encoding is resolved by
trying the following mechanisms, in order:
1. the encoding passed in the constructor `encoding` argument
1. the encoding passed in the constructor ``encoding`` argument
2. the encoding declared in the Content-Type HTTP header. If this
encoding is not valid (ie. unknown), it is ignored and the next


@ -96,7 +96,7 @@ Constructing from response - :class:`~scrapy.http.HtmlResponse` is one of
Using selectors
---------------
To explain how to use the selectors we'll use the `Scrapy shell` (which
To explain how to use the selectors we'll use the ``Scrapy shell`` (which
provides interactive testing) and an example page located in the Scrapy
documentation server:


@ -331,16 +331,16 @@ Default: ``0``
Scope: ``scrapy.spidermiddlewares.depth.DepthMiddleware``
An integer that is used to adjust the request priority based on its depth:
An integer that is used to adjust the :attr:`~scrapy.http.Request.priority` of
a :class:`~scrapy.http.Request` based on its depth.
- if zero (default), no priority adjustment is made from depth
- **a positive value will decrease the priority, i.e. higher depth
requests will be processed later** ; this is commonly used when doing
breadth-first crawls (BFO)
- a negative value will increase priority, i.e., higher depth requests
will be processed sooner (DFO)
The priority of a request is adjusted as follows::
See also: :ref:`faq-bfo-dfo` about tuning Scrapy for BFO or DFO.
request.priority = request.priority - ( depth * DEPTH_PRIORITY )
As depth increases, positive values of ``DEPTH_PRIORITY`` decrease request
priority (BFO), while negative values increase request priority (DFO). See
also :ref:`faq-bfo-dfo`.
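
A quick worked example with the default starting priority of ``0``::

    # DEPTH_PRIORITY = 1  (positive: breadth-first tendency)
    #   depth 1:  priority = 0 - (1 * 1)  = -1
    #   depth 2:  priority = 0 - (2 * 1)  = -2   (processed later)
    # DEPTH_PRIORITY = -1 (negative: depth-first tendency)
    #   depth 1:  priority = 0 - (1 * -1) = 1
    #   depth 2:  priority = 0 - (2 * -1) = 2    (processed sooner)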
.. note::
@ -599,7 +599,7 @@ The amount of time (in secs) that the downloader will wait before timing out.
DOWNLOAD_MAXSIZE
----------------
Default: `1073741824` (1024MB)
Default: ``1073741824`` (1024MB)
The maximum response size (in bytes) that downloader will download.
@ -620,7 +620,7 @@ If you want to disable it set to 0.
DOWNLOAD_WARNSIZE
-----------------
Default: `33554432` (32MB)
Default: ``33554432`` (32MB)
The response size (in bytes) that downloader will start to warn.


@ -43,7 +43,7 @@ previous (or subsequent) middleware being applied.
If you want to disable a builtin middleware (the ones defined in
:setting:`SPIDER_MIDDLEWARES_BASE`, and enabled by default) you must define it
in your project :setting:`SPIDER_MIDDLEWARES` setting and assign `None` as its
in your project :setting:`SPIDER_MIDDLEWARES` setting and assign ``None`` as its
value. For example, if you want to disable the off-site middleware::
SPIDER_MIDDLEWARES = {
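    # The example is truncated by the diff hunk; it typically continues like this:
    'scrapy.spidermiddlewares.offsite.OffsiteMiddleware': None,
}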
@ -200,7 +200,7 @@ DepthMiddleware
.. class:: DepthMiddleware
DepthMiddleware is used for tracking the depth of each Request inside the
site being scraped. It works by setting `request.meta['depth'] = 0` whenever
site being scraped. It works by setting ``request.meta['depth'] = 0`` whenever
there is no value previously set (usually just the first Request) and
incrementing it by 1 otherwise.
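
Reading the tracked value inside a callback is straightforward (a sketch)::

    def parse(self, response):
        self.logger.info('Parsed %s at depth %d',
                         response.url, response.meta.get('depth', 0))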


@ -129,7 +129,7 @@ scrapy.Spider
You probably won't need to override this directly because the default
implementation acts as a proxy to the :meth:`__init__` method, calling
it with the given arguments `args` and named arguments `kwargs`.
it with the given arguments ``args`` and named arguments ``kwargs``.
Nonetheless, this method sets the :attr:`crawler` and :attr:`settings`
attributes in the new instance so they can be accessed later inside the
@ -298,13 +298,13 @@ The above example can also be written as follows::
Keep in mind that spider arguments are only strings.
The spider will not do any parsing on its own.
If you were to set the `start_urls` attribute from the command line,
If you were to set the ``start_urls`` attribute from the command line,
you would have to parse it on your own into a list
using something like
`ast.literal_eval <https://docs.python.org/library/ast.html#ast.literal_eval>`_
or `json.loads <https://docs.python.org/library/json.html#json.loads>`_
and then set it as an attribute.
Otherwise, you would cause iteration over a `start_urls` string
Otherwise, you would cause iteration over a ``start_urls`` string
(a very common python pitfall)
resulting in each character being seen as a separate url.
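
A hedged sketch of such parsing in a spider constructor (the spider name and URLs are placeholders)::

    import json
    import scrapy

    class MySpider(scrapy.Spider):
        name = 'example'

        def __init__(self, start_urls='[]', *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)
            # e.g. scrapy crawl example -a start_urls='["http://example.com/a", "http://example.com/b"]'
            self.start_urls = json.loads(start_urls)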


@ -22,7 +22,7 @@ To use the packages:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 627220E7
2. Create `/etc/apt/sources.list.d/scrapy.list` file using the following command::
2. Create ``/etc/apt/sources.list.d/scrapy.list`` file using the following command::
echo 'deb http://archive.scrapy.org/ubuntu scrapy main' | sudo tee /etc/apt/sources.list.d/scrapy.list
@ -34,7 +34,7 @@ To use the packages:
.. note:: Repeat step 3 if you are trying to upgrade Scrapy.
.. warning:: `python-scrapy` is a different package provided by official debian
.. warning:: ``python-scrapy`` is a different package provided by official debian
repositories, it's very outdated and it isn't supported by Scrapy team.
.. _Scrapinghub: https://scrapinghub.com/


@ -153,7 +153,7 @@ class CrawlerRunner(object):
It will call the given Crawler's :meth:`~Crawler.crawl` method, while
keeping track of it so it can be stopped later.
If `crawler_or_spidercls` isn't a :class:`~scrapy.crawler.Crawler`
If ``crawler_or_spidercls`` isn't a :class:`~scrapy.crawler.Crawler`
instance, this method will try to create one using this parameter as
the spider class given to it.
@ -188,10 +188,10 @@ class CrawlerRunner(object):
"""
Return a :class:`~scrapy.crawler.Crawler` object.
* If `crawler_or_spidercls` is a Crawler, it is returned as-is.
* If `crawler_or_spidercls` is a Spider subclass, a new Crawler
* If ``crawler_or_spidercls`` is a Crawler, it is returned as-is.
* If ``crawler_or_spidercls`` is a Spider subclass, a new Crawler
is constructed for it.
* If `crawler_or_spidercls` is a string, this function finds
* If ``crawler_or_spidercls`` is a string, this function finds
a spider with this name in a Scrapy project (using spider loader),
then creates a Crawler instance for it.
"""
@ -273,7 +273,7 @@ class CrawlerProcess(CrawlerRunner):
:setting:`REACTOR_THREADPOOL_MAXSIZE`, and installs a DNS cache based
on :setting:`DNSCACHE_ENABLED` and :setting:`DNSCACHE_SIZE`.
If `stop_after_crawl` is True, the reactor will be stopped after all
If ``stop_after_crawl`` is True, the reactor will be stopped after all
crawlers have finished, using :meth:`join`.
:param boolean stop_after_crawl: stop or not the reactor when all


@ -7,9 +7,7 @@ RETRY_TIMES - how many times to retry a failed page
RETRY_HTTP_CODES - which HTTP response codes to retry
Failed pages are collected on the scraping process and rescheduled at the end,
once the spider has finished crawling all regular (non failed) pages. Once
there is no more failed pages to retry this middleware sends a signal
(retry_complete), so other extensions could connect to that signal.
once the spider has finished crawling all regular (non failed) pages.
"""
import logging


@ -13,21 +13,21 @@ CRAWLEDMSG = u"Crawled (%(status)s) %(request)s%(request_flags)s (referer: %(ref
class LogFormatter(object):
"""Class for generating log messages for different actions.
All methods must return a dictionary listing the parameters `level`, `msg`
and `args` which are going to be used for constructing the log message when
calling logging.log.
All methods must return a dictionary listing the parameters ``level``,
``msg`` and ``args`` which are going to be used for constructing the log
message when calling logging.log.
Dictionary keys for the method outputs:
* `level` should be the log level for that action, you can use those
* ``level`` should be the log level for that action, you can use those
from the python logging library: logging.DEBUG, logging.INFO,
logging.WARNING, logging.ERROR and logging.CRITICAL.
* `msg` should be a string that can contain different formatting
placeholders. This string, formatted with the provided `args`, is going
to be the log message for that action.
* ``msg`` should be a string that can contain different formatting
placeholders. This string, formatted with the provided ``args``, is
going to be the log message for that action.
* `args` should be a tuple or dict with the formatting placeholders for
`msg`. The final log message is computed as output['msg'] %
* ``args`` should be a tuple or dict with the formatting placeholders
for ``msg``. The final log message is computed as output['msg'] %
output['args'].
"""


@ -255,13 +255,13 @@ class FilesPipeline(MediaPipeline):
doing stat of the files and determining if file is new, uptodate or
expired.
`new` files are those that pipeline never processed and needs to be
``new`` files are those that pipeline never processed and needs to be
downloaded from supplier site the first time.
`uptodate` files are the ones that the pipeline processed and are still
``uptodate`` files are the ones that the pipeline processed and are still
valid files.
`expired` files are those that pipeline already processed but the last
``expired`` files are those that pipeline already processed but the last
modification was made long time ago, so a reprocessing is recommended to
refresh it in case of change.


@ -2,7 +2,7 @@ from ftplib import error_perm
from posixpath import dirname
def ftp_makedirs_cwd(ftp, path, first_call=True):
"""Set the current directory of the FTP connection given in the `ftp`
"""Set the current directory of the FTP connection given in the ``ftp``
argument (as a ftplib.FTP object), creating all parent directories if they
don't exist. The ftplib.FTP object must be already connected and logged in.
"""


@ -32,7 +32,7 @@ class TopLevelFormatter(logging.Filter):
Since it can't be set for just one logger (it won't propagate for its
children), it's going to be set in the root handler, with a parametrized
`loggers` list where it should act.
``loggers`` list where it should act.
"""
def __init__(self, loggers=None):


@ -97,8 +97,8 @@ def unicode_to_str(text, encoding=None, errors='strict'):
def to_unicode(text, encoding=None, errors='strict'):
"""Return the unicode representation of a bytes object `text`. If `text`
is already an unicode object, return it as-is."""
"""Return the unicode representation of a bytes object ``text``. If
``text`` is already an unicode object, return it as-is."""
if isinstance(text, six.text_type):
return text
if not isinstance(text, (bytes, six.text_type)):
@ -110,7 +110,7 @@ def to_unicode(text, encoding=None, errors='strict'):
def to_bytes(text, encoding=None, errors='strict'):
"""Return the binary representation of `text`. If `text`
"""Return the binary representation of ``text``. If ``text``
is already a bytes object, return it as-is."""
if isinstance(text, bytes):
return text
@ -123,7 +123,7 @@ def to_bytes(text, encoding=None, errors='strict'):
def to_native_str(text, encoding=None, errors='strict'):
""" Return str representation of `text`
""" Return str representation of ``text``
(bytes in Python 2.x and unicode in Python 3.x). """
if six.PY2:
return to_bytes(text, encoding, errors)
@ -189,7 +189,7 @@ def isbinarytext(text):
def binary_is_text(data):
""" Returns `True` if the given ``data`` argument (a ``bytes`` object)
""" Returns ``True`` if the given ``data`` argument (a ``bytes`` object)
does not contain unprintable control characters.
"""
if not isinstance(data, bytes):
@ -314,7 +314,7 @@ class WeakKeyCache(object):
@deprecated
def stringify_dict(dct_or_tuples, encoding='utf-8', keys_only=True):
"""Return a (new) dict with unicode keys (and values when "keys_only" is
False) of the given dict converted to strings. `dct_or_tuples` can be a
False) of the given dict converted to strings. ``dct_or_tuples`` can be a
dict or a list of tuples, like any dict constructor supports.
"""
d = {}
@ -357,10 +357,10 @@ def retry_on_eintr(function, *args, **kw):
def without_none_values(iterable):
"""Return a copy of `iterable` with all `None` entries removed.
"""Return a copy of ``iterable`` with all ``None`` entries removed.
If `iterable` is a mapping, return a dictionary where all pairs that have
value `None` have been removed.
If ``iterable`` is a mapping, return a dictionary where all pairs that have
value ``None`` have been removed.
"""
try:
return {k: v for k, v in six.iteritems(iterable) if v is not None}


@ -109,12 +109,12 @@ def strip_url(url, strip_credentials=True, strip_default_port=True, origin_only=
"""Strip URL string from some of its components:
- `strip_credentials` removes "user:password@"
- `strip_default_port` removes ":80" (resp. ":443", ":21")
- ``strip_credentials`` removes "user:password@"
- ``strip_default_port`` removes ":80" (resp. ":443", ":21")
from http:// (resp. https://, ftp://) URLs
- `origin_only` replaces path component with "/", also dropping
- ``origin_only`` replaces path component with "/", also dropping
query and fragment components ; it also strips credentials
- `strip_fragment` drops any #fragment component
- ``strip_fragment`` drops any #fragment component
"""
parsed_url = urlparse(url)


@ -10,7 +10,8 @@ Status Obsolete (discarded)
SEP-006: Rename of Selectors to Extractors
==========================================
This SEP proposes a more meaningful naming of XPathSelectors or "Selectors" and their `x` method.
This SEP proposes a more meaningful naming of XPathSelectors or "Selectors" and
their ``x`` method.
Motivation
==========
@ -57,7 +58,7 @@ Additional changes
As the name of the method for performing selection (the ``x`` method) is not
descriptive nor mnemotechnic enough and clearly clashes with ``extract`` method
(x sounds like a short for extract in english), we propose to rename it to
`select`, `sel` (is shortness if required), or `xpath` after `lxml's
``select``, ``sel`` (is shortness if required), or ``xpath`` after `lxml's
<http://lxml.de/xpathxslt.html>`_ ``xpath`` method.
Bonus (ItemBuilder)


@ -16,7 +16,7 @@ _DATABASES = collections.defaultdict(DummyDB)
def open(file, flag='r', mode=0o666):
"""Open or create a dummy database compatible.
Arguments `flag` and `mode` are ignored.
Arguments ``flag`` and ``mode`` are ignored.
"""
# return same instance for same file argument
return _DATABASES[file]


@ -61,7 +61,7 @@ class ShellTest(ProcessTest, SiteTest, unittest.TestCase):
@defer.inlineCallbacks
def test_fetch_redirect_follow_302(self):
"""Test that calling `fetch(url)` follows HTTP redirects by default."""
"""Test that calling ``fetch(url)`` follows HTTP redirects by default."""
url = self.url('/redirect-no-meta-refresh')
code = "fetch('{0}')"
errcode, out, errout = yield self.execute(['-c', code.format(url)])
@ -71,7 +71,7 @@ class ShellTest(ProcessTest, SiteTest, unittest.TestCase):
@defer.inlineCallbacks
def test_fetch_redirect_not_follow_302(self):
"""Test that calling `fetch(url, redirect=False)` disables automatic redirects."""
"""Test that calling ``fetch(url, redirect=False)`` disables automatic redirects."""
url = self.url('/redirect-no-meta-refresh')
code = "fetch('{0}', redirect=False)"
errcode, out, errout = yield self.execute(['-c', code.format(url)])