mirror of https://github.com/scrapy/scrapy.git (synced 2025-02-24 17:24:14 +00:00)
Update deprecated examples
* Update the scrapy.org example to deal with the new layout.
* Replace slashdot.org with reddit.com, because Slashdot seems to be blocking requests.
parent 1b435b2887
commit 2abc9bc901
@@ -106,10 +106,10 @@ Example of shell session
 ========================
 
 Here's an example of a typical shell session where we start by scraping the
-http://scrapy.org page, and then proceed to scrape the http://slashdot.org
-page. Finally, we modify the (Slashdot) request method to POST and re-fetch it
-getting a HTTP 405 (method not allowed) error. We end the session by typing
-Ctrl-D (in Unix systems) or Ctrl-Z in Windows.
+http://scrapy.org page, and then proceed to scrape the http://reddit.com
+page. Finally, we modify the (Reddit) request method to POST and re-fetch it
+getting an error. We end the session by typing Ctrl-D (in Unix systems) or
+Ctrl-Z in Windows.
 
 Keep in mind that the data extracted here may not be the same when you try it,
 as those pages are not static and could have changed by the time you test this.
@@ -140,24 +140,24 @@ all start with the ``[s]`` prefix)::
 
 After that, we can start playing with the objects::
 
->>> response.xpath("//h1/text()").extract()[0]
-u'Meet Scrapy'
+>>> response.xpath('//title/text()').extract_first()
+u'Scrapy | A Fast and Powerful Scraping and Web Crawling Framework'
 
->>> fetch("http://slashdot.org")
+>>> fetch("http://reddit.com")
 [s] Available Scrapy objects:
-[s] crawler <scrapy.crawler.Crawler object at 0x1a13b50>
+[s] crawler <scrapy.crawler.Crawler object at 0x7fb3ed9c9c90>
 [s] item {}
-[s] request <GET http://slashdot.org>
-[s] response <200 http://slashdot.org>
-[s] settings <scrapy.settings.Settings object at 0x2bfd650>
-[s] spider <Spider 'default' at 0x20c6f50>
+[s] request <GET http://reddit.com>
+[s] response <200 https://www.reddit.com/>
+[s] settings <scrapy.settings.Settings object at 0x7fb3ed9c9c10>
+[s] spider <DefaultSpider 'default' at 0x7fb3ecdd3390>
 [s] Useful shortcuts:
 [s] shelp() Shell help (print this help)
 [s] fetch(req_or_url) Fetch request (or URL) and update local objects
 [s] view(response) View response in a browser
 
 >>> response.xpath('//title/text()').extract()
-[u'Slashdot: News for nerds, stuff that matters']
+[u'reddit: the front page of the internet']
 
 >>> request = request.replace(method="POST")
 