Mirror of https://github.com/scrapy/scrapy.git (synced 2025-02-22 12:33:04 +00:00)
DOC update examples with long logger names
parent 05b4555f39
commit 0fc73a9d55
@@ -130,15 +130,15 @@ will send some requests for the ``quotes.toscrape.com`` domain. You will get an
 similar to this::

     ... (omitted for brevity)
-    2016-09-20 14:48:00 [scrapy] INFO: Spider opened
-    2016-09-20 14:48:00 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
-    2016-09-20 14:48:00 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
-    2016-09-20 14:48:00 [scrapy] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
-    2016-09-20 14:48:00 [scrapy] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
-    2016-09-20 14:48:01 [quotes] DEBUG: Saved file quotes-1.html
-    2016-09-20 14:48:01 [scrapy] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
-    2016-09-20 14:48:01 [quotes] DEBUG: Saved file quotes-2.html
-    2016-09-20 14:48:01 [scrapy] INFO: Closing spider (finished)
+    2016-12-16 21:24:05 [scrapy.core.engine] INFO: Spider opened
+    2016-12-16 21:24:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:24:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
+    2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
+    2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
+    2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
+    2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-1.html
+    2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-2.html
+    2016-12-16 21:24:05 [scrapy.core.engine] INFO: Closing spider (finished)
     ...

 Now, check the files in the current directory. You should notice that two new
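For context, the ``[quotes] DEBUG: Saved file quotes-N.html`` lines above are produced by the tutorial's ``quotes`` spider. A minimal sketch of such a spider, following the Scrapy tutorial (an illustration only, not part of this diff), could look like::

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = "quotes"

        def start_requests(self):
            urls = [
                'http://quotes.toscrape.com/page/1/',
                'http://quotes.toscrape.com/page/2/',
            ]
            for url in urls:
                yield scrapy.Request(url=url, callback=self.parse)

        def parse(self, response):
            # Save each page to a local file; self.log() emits the
            # "[quotes] DEBUG: Saved file ..." entries seen in the log above.
            page = response.url.split("/")[-2]
            filename = 'quotes-%s.html' % page
            with open(filename, 'wb') as f:
                f.write(response.body)
            self.log('Saved file %s' % filename)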
@@ -212,7 +212,7 @@ using the shell :ref:`Scrapy shell <topics-shell>`. Run::
 You will see something like::

     [ ... Scrapy log here ... ]
-    2016-09-19 12:09:27 [scrapy] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
+    2016-09-19 12:09:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
     [s] Available Scrapy objects:
     [s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
     [s]   crawler    <scrapy.crawler.Crawler object at 0x7fa91d888c90>
@@ -429,9 +429,9 @@ in the callback, as you can see below::

 If you run this spider, it will output the extracted data with the log::

-    2016-09-19 18:57:19 [scrapy] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
+    2016-09-19 18:57:19 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
     {'tags': ['life', 'love'], 'author': 'André Gide', 'text': '“It is better to be hated for what you are than to be loved for what you are not.”'}
-    2016-09-19 18:57:19 [scrapy] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
+    2016-09-19 18:57:19 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
     {'tags': ['edison', 'failure', 'inspirational', 'paraphrased'], 'author': 'Thomas A. Edison', 'text': "“I have not failed. I've just found 10,000 ways that won't work.”"}
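The ``Scraped from`` entries above are emitted when a spider callback yields plain Python dicts. A rough sketch of a callback that would produce them (consistent with the tutorial spider; the exact code is not part of this diff)::

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ['http://quotes.toscrape.com/page/1/']

        def parse(self, response):
            # Yielding dicts from the callback is what makes scrapy.core.scraper
            # log a "Scraped from <200 ...>" line followed by the item itself.
            for quote in response.css('div.quote'):
                yield {
                    'text': quote.css('span.text::text').extract_first(),
                    'author': quote.css('small.author::text').extract_first(),
                    'tags': quote.css('div.tags a.tag::text').extract(),
                }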
@@ -18,40 +18,66 @@ To run it use::

 You should see an output like this::

-    2013-05-16 13:08:46-0300 [scrapy] INFO: Scrapy 0.17.0 started (bot: scrapybot)
-    2013-05-16 13:08:47-0300 [scrapy] INFO: Spider opened
-    2013-05-16 13:08:47-0300 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:48-0300 [scrapy] INFO: Crawled 74 pages (at 4440 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:49-0300 [scrapy] INFO: Crawled 143 pages (at 4140 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:50-0300 [scrapy] INFO: Crawled 210 pages (at 4020 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:51-0300 [scrapy] INFO: Crawled 274 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:52-0300 [scrapy] INFO: Crawled 343 pages (at 4140 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:53-0300 [scrapy] INFO: Crawled 410 pages (at 4020 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:54-0300 [scrapy] INFO: Crawled 474 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:55-0300 [scrapy] INFO: Crawled 538 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:56-0300 [scrapy] INFO: Crawled 602 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:57-0300 [scrapy] INFO: Closing spider (closespider_timeout)
-    2013-05-16 13:08:57-0300 [scrapy] INFO: Crawled 666 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
-    2013-05-16 13:08:57-0300 [scrapy] INFO: Dumping Scrapy stats:
-    {'downloader/request_bytes': 231508,
-     'downloader/request_count': 682,
-     'downloader/request_method_count/GET': 682,
-     'downloader/response_bytes': 1172802,
-     'downloader/response_count': 682,
-     'downloader/response_status_count/200': 682,
-     'finish_reason': 'closespider_timeout',
-     'finish_time': datetime.datetime(2013, 5, 16, 16, 8, 57, 985539),
-     'log_count/INFO': 14,
-     'request_depth_max': 34,
-     'response_received_count': 682,
-     'scheduler/dequeued': 682,
-     'scheduler/dequeued/memory': 682,
-     'scheduler/enqueued': 12767,
-     'scheduler/enqueued/memory': 12767,
-     'start_time': datetime.datetime(2013, 5, 16, 16, 8, 47, 676539)}
-    2013-05-16 13:08:57-0300 [scrapy] INFO: Spider closed (closespider_timeout)
+    2016-12-16 21:18:48 [scrapy.utils.log] INFO: Scrapy 1.2.2 started (bot: quotesbot)
+    2016-12-16 21:18:48 [scrapy.utils.log] INFO: Overridden settings: {'CLOSESPIDER_TIMEOUT': 10, 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['quotesbot.spiders'], 'LOGSTATS_INTERVAL': 1, 'BOT_NAME': 'quotesbot', 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'quotesbot.spiders'}
+    2016-12-16 21:18:49 [scrapy.middleware] INFO: Enabled extensions:
+    ['scrapy.extensions.closespider.CloseSpider',
+     'scrapy.extensions.logstats.LogStats',
+     'scrapy.extensions.telnet.TelnetConsole',
+     'scrapy.extensions.corestats.CoreStats']
+    2016-12-16 21:18:49 [scrapy.middleware] INFO: Enabled downloader middlewares:
+    ['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
+     'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
+     'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
+     'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
+     'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
+     'scrapy.downloadermiddlewares.retry.RetryMiddleware',
+     'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
+     'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
+     'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
+     'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
+     'scrapy.downloadermiddlewares.stats.DownloaderStats']
+    2016-12-16 21:18:49 [scrapy.middleware] INFO: Enabled spider middlewares:
+    ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
+     'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
+     'scrapy.spidermiddlewares.referer.RefererMiddleware',
+     'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
+     'scrapy.spidermiddlewares.depth.DepthMiddleware']
+    2016-12-16 21:18:49 [scrapy.middleware] INFO: Enabled item pipelines:
+    []
+    2016-12-16 21:18:49 [scrapy.core.engine] INFO: Spider opened
+    2016-12-16 21:18:49 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:50 [scrapy.extensions.logstats] INFO: Crawled 70 pages (at 4200 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:51 [scrapy.extensions.logstats] INFO: Crawled 134 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:52 [scrapy.extensions.logstats] INFO: Crawled 198 pages (at 3840 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:53 [scrapy.extensions.logstats] INFO: Crawled 254 pages (at 3360 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:54 [scrapy.extensions.logstats] INFO: Crawled 302 pages (at 2880 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:55 [scrapy.extensions.logstats] INFO: Crawled 358 pages (at 3360 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:56 [scrapy.extensions.logstats] INFO: Crawled 406 pages (at 2880 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:57 [scrapy.extensions.logstats] INFO: Crawled 438 pages (at 1920 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:58 [scrapy.extensions.logstats] INFO: Crawled 470 pages (at 1920 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:18:59 [scrapy.core.engine] INFO: Closing spider (closespider_timeout)
+    2016-12-16 21:18:59 [scrapy.extensions.logstats] INFO: Crawled 518 pages (at 2880 pages/min), scraped 0 items (at 0 items/min)
+    2016-12-16 21:19:00 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
+    {'downloader/request_bytes': 229995,
+     'downloader/request_count': 534,
+     'downloader/request_method_count/GET': 534,
+     'downloader/response_bytes': 1565504,
+     'downloader/response_count': 534,
+     'downloader/response_status_count/200': 534,
+     'finish_reason': 'closespider_timeout',
+     'finish_time': datetime.datetime(2016, 12, 16, 16, 19, 0, 647725),
+     'log_count/INFO': 17,
+     'request_depth_max': 19,
+     'response_received_count': 534,
+     'scheduler/dequeued': 533,
+     'scheduler/dequeued/memory': 533,
+     'scheduler/enqueued': 10661,
+     'scheduler/enqueued/memory': 10661,
+     'start_time': datetime.datetime(2016, 12, 16, 16, 18, 49, 799869)}
+    2016-12-16 21:19:00 [scrapy.core.engine] INFO: Spider closed (closespider_timeout)

-That tells you that Scrapy is able to crawl about 3900 pages per minute in the
+That tells you that Scrapy is able to crawl about 3000 pages per minute in the
 hardware where you run it. Note that this is a very simple spider intended to
 follow links, any custom spider you write will probably do more stuff which
 results in slower crawl rates. How slower depends on how much your spider does
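The ``Overridden settings`` line in the new output shows how this benchmark run was configured. If you wanted to reproduce a similar run from a project's ``settings.py``, the equivalent settings would look roughly like this (values read off the log line above; this sketch is not part of the diff)::

    # settings.py -- values taken from the "Overridden settings" log line above
    BOT_NAME = 'quotesbot'
    SPIDER_MODULES = ['quotesbot.spiders']
    NEWSPIDER_MODULE = 'quotesbot.spiders'
    ROBOTSTXT_OBEY = True
    CLOSESPIDER_TIMEOUT = 10   # stop the crawl after ~10 seconds
    LOGSTATS_INTERVAL = 1      # log the crawl rate every second
    LOG_LEVEL = 'INFO'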
@@ -238,14 +238,14 @@ header) and all cookies received in responses (ie. ``Set-Cookie`` header).

 Here's an example of a log with :setting:`COOKIES_DEBUG` enabled::

-    2011-04-06 14:35:10-0300 [scrapy] INFO: Spider opened
-    2011-04-06 14:35:10-0300 [scrapy] DEBUG: Sending cookies to: <GET http://www.diningcity.com/netherlands/index.html>
+    2011-04-06 14:35:10-0300 [scrapy.core.engine] INFO: Spider opened
+    2011-04-06 14:35:10-0300 [scrapy.downloadermiddlewares.cookies] DEBUG: Sending cookies to: <GET http://www.diningcity.com/netherlands/index.html>
             Cookie: clientlanguage_nl=en_EN
-    2011-04-06 14:35:14-0300 [scrapy] DEBUG: Received cookies from: <200 http://www.diningcity.com/netherlands/index.html>
+    2011-04-06 14:35:14-0300 [scrapy.downloadermiddlewares.cookies] DEBUG: Received cookies from: <200 http://www.diningcity.com/netherlands/index.html>
             Set-Cookie: JSESSIONID=B~FA4DC0C496C8762AE4F1A620EAB34F38; Path=/
             Set-Cookie: ip_isocode=US
             Set-Cookie: clientlanguage_nl=en_EN; Expires=Thu, 07-Apr-2011 21:21:34 GMT; Path=/
-    2011-04-06 14:49:50-0300 [scrapy] DEBUG: Crawled (200) <GET http://www.diningcity.com/netherlands/index.html> (referer: None)
+    2011-04-06 14:49:50-0300 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.diningcity.com/netherlands/index.html> (referer: None)
     [...]
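To get log entries like the ones above, cookie debugging has to be switched on in the project settings. A minimal sketch of the relevant ``settings.py`` lines (not part of this diff)::

    # settings.py
    COOKIES_ENABLED = True   # default; the cookies middleware must be active
    COOKIES_DEBUG = True     # log every Cookie / Set-Cookie header exchanged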
@@ -1037,7 +1037,7 @@ Stats counter (``scheduler/unserializable``) tracks the number of times this hap

 Example entry in logs::

-    1956-01-31 00:00:00+0800 [scrapy] ERROR: Unable to serialize request:
+    1956-01-31 00:00:00+0800 [scrapy.core.scheduler] ERROR: Unable to serialize request:
     <GET http://example.com> - reason: cannot serialize <Request at 0x9a7c7ec>
     (type Request)> - no more unserializable requests will be logged
     (see 'scheduler/unserializable' stats counter)
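The error above typically appears when a crawl is persisted to disk (a ``JOBDIR``) and a request cannot be pickled for the disk queue, for example because its callback is not a method of the spider. A hypothetical illustration of the difference (not part of this diff)::

    import scrapy


    class MySpider(scrapy.Spider):
        name = 'myspider'
        start_urls = ['http://example.com']

        def parse(self, response):
            # Serializable: the callback is a spider method, referenced by name.
            yield scrapy.Request('http://example.com/a', callback=self.parse_item)
            # Not serializable: a lambda cannot be pickled, so with a JOBDIR the
            # scheduler logs "Unable to serialize request" and keeps it in memory.
            yield scrapy.Request('http://example.com/b',
                                 callback=lambda r: self.parse_item(r))

        def parse_item(self, response):
            self.logger.info('Visited %s', response.url)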
@@ -173,7 +173,7 @@ all start with the ``[s]`` prefix)::
 After that, we can start playing with the objects::

     >>> response.xpath('//title/text()').extract_first()
-    u'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'
+    'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'

     >>> fetch("http://reddit.com")
     [s] Available Scrapy objects:
@@ -189,7 +189,7 @@ After that, we can start playing with the objects::
     [s]   view(response)    View response in a browser

     >>> response.xpath('//title/text()').extract()
-    [u'reddit: the front page of the internet']
+    ['reddit: the front page of the internet']

     >>> request = request.replace(method="POST")

@@ -234,8 +234,8 @@ Here's an example of how you would call it from your spider::

 When you run the spider, you will get something similar to this::

-    2014-01-23 17:48:31-0400 [scrapy] DEBUG: Crawled (200) <GET http://example.com> (referer: None)
-    2014-01-23 17:48:31-0400 [scrapy] DEBUG: Crawled (200) <GET http://example.org> (referer: None)
+    2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com> (referer: None)
+    2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.org> (referer: None)
     [s] Available Scrapy objects:
     [s]   crawler    <scrapy.crawler.Crawler object at 0x1e16b50>
     ...
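The surrounding documentation invokes the shell from inside a running spider with ``scrapy.shell.inspect_response``. A rough sketch of the kind of spider that produces the session above and below (following the shell docs; the exact spider is not part of this diff)::

    import scrapy
    from scrapy.shell import inspect_response


    class MySpider(scrapy.Spider):
        name = "myspider"
        start_urls = [
            "http://example.com",
            "http://example.org",
            "http://example.net",
        ]

        def parse(self, response):
            # Open an interactive shell for this particular response, then
            # resume crawling once the shell is closed with Ctrl-D.
            if ".org" in response.url:
                inspect_response(response, self)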
@@ -258,7 +258,7 @@ Finally you hit Ctrl-D (or Ctrl-Z in Windows) to exit the shell and resume the
 crawling::

     >>> ^D
-    2014-01-23 17:50:03-0400 [scrapy] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
+    2014-01-23 17:50:03-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
     ...

 Note that you can't use the ``fetch`` shortcut here since the Scrapy engine is