mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 17:24:14 +00:00

Update deprecated examples

* update the scrapy.org example to deal with the new page layout.
* replace slashdot.org with reddit.com, because slashdot seems to be blocking requests.
Valdir Stumm Jr 2016-01-06 10:29:45 -02:00
parent 1b435b2887
commit 2abc9bc901
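
Note on the selector change in this diff: the session example moves from `.extract()[0]` to `.extract_first()`. The sketch below uses only the standard library and assumes nothing beyond plain lists. `extract_first` here is a hypothetical stand-in helper, not Scrapy's implementation, and the sample lists are made up for illustration. It shows why indexing is brittle when a page layout changes and an XPath stops matching.

```python
# Hypothetical stand-in that mimics the first-or-default behaviour;
# it is NOT Scrapy's code and needs no Scrapy install.
def extract_first(values, default=None):
    """Return the first item of `values`, or `default` if it is empty."""
    return values[0] if values else default


matched = ["Meet Scrapy"]  # assumed result list when the XPath matches
missing = []               # assumed result list after a layout change

first_hit = extract_first(matched)
first_miss = extract_first(missing)

# Plain indexing on an empty result raises instead of returning a default.
try:
    missing[0]
    index_failed = False
except IndexError:
    index_failed = True
```

With an empty result, the helper returns the default while indexing raises `IndexError`, which is the failure mode a documentation example tends to hit once the target page is redesigned.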


@@ -106,10 +106,10 @@ Example of shell session
 ========================
 Here's an example of a typical shell session where we start by scraping the
-http://scrapy.org page, and then proceed to scrape the http://slashdot.org
-page. Finally, we modify the (Slashdot) request method to POST and re-fetch it
-getting a HTTP 405 (method not allowed) error. We end the session by typing
-Ctrl-D (in Unix systems) or Ctrl-Z in Windows.
+http://scrapy.org page, and then proceed to scrape the http://reddit.com
+page. Finally, we modify the (Reddit) request method to POST and re-fetch it
+getting an error. We end the session by typing Ctrl-D (in Unix systems) or
+Ctrl-Z in Windows.
 Keep in mind that the data extracted here may not be the same when you try it,
 as those pages are not static and could have changed by the time you test this.
@@ -140,24 +140,24 @@ all start with the ``[s]`` prefix)::
 After that, we can start playing with the objects::
->>> response.xpath("//h1/text()").extract()[0]
-u'Meet Scrapy'
+>>> response.xpath('//title/text()').extract_first()
+u'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'
->>> fetch("http://slashdot.org")
+>>> fetch("http://reddit.com")
 [s] Available Scrapy objects:
-[s] crawler <scrapy.crawler.Crawler object at 0x1a13b50>
+[s] crawler <scrapy.crawler.Crawler object at 0x7fb3ed9c9c90>
 [s] item {}
-[s] request <GET http://slashdot.org>
-[s] response <200 http://slashdot.org>
-[s] settings <scrapy.settings.Settings object at 0x2bfd650>
-[s] spider <Spider 'default' at 0x20c6f50>
+[s] request <GET http://reddit.com>
+[s] response <200 https://www.reddit.com/>
+[s] settings <scrapy.settings.Settings object at 0x7fb3ed9c9c10>
+[s] spider <DefaultSpider 'default' at 0x7fb3ecdd3390>
 [s] Useful shortcuts:
 [s] shelp() Shell help (print this help)
 [s] fetch(req_or_url) Fetch request (or URL) and update local objects
 [s] view(response) View response in a browser
 >>> response.xpath('//title/text()').extract()
-[u'Slashdot: News for nerds, stuff that matters']
+[u'reddit: the front page of the internet']
 >>> request = request.replace(method="POST")