scrapy/scrapy (mirror of https://github.com/scrapy/scrapy.git)

DOC selectors.rst cleanup

commit bdea071af3
parent 5f18816428

Changes to selectors.rst:
@@ -199,8 +199,8 @@ too. Here's an example::
         u'<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']

     >>> for index, link in enumerate(links):
-        args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
-        print 'Link number %d points to url %s and image %s' % args
+    ...     args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
+    ...     print 'Link number %d points to url %s and image %s' % args

     Link number 0 points to url [u'image1.html'] and image [u'image1_thumb.jpg']
     Link number 1 points to url [u'image2.html'] and image [u'image2_thumb.jpg']
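For quick local testing of the pattern this hunk touches, here is a minimal, self-contained sketch; the markup and the ``Selector(text=...)`` usage are illustrative assumptions, written with Python 3 ``print()`` rather than the Python 2 doctest in the diff::

    from scrapy.selector import Selector

    # Illustrative markup, loosely modelled on the docs example:
    # anchors wrapping thumbnail images.
    html = '''
    <a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>
    <a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>
    '''

    links = Selector(text=html).xpath('//a[contains(@href, "image")]')
    for index, link in enumerate(links):
        # Extract the link target and the thumbnail source relative to each <a>
        args = (index, link.xpath('@href').extract(), link.xpath('img/@src').extract())
        print('Link number %d points to url %s and image %s' % args)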
@@ -245,17 +245,17 @@ it actually extracts all ``<p>`` elements from the document, not only those
 inside ``<div>`` elements::

     >>> for p in divs.xpath('//p'):  # this is wrong - gets all <p> from the whole document
-    >>>     print p.extract()
+    ...     print p.extract()

 This is the proper way to do it (note the dot prefixing the ``.//p`` XPath)::

     >>> for p in divs.xpath('.//p'):  # extracts all <p> inside
-    >>>     print p.extract()
+    ...     print p.extract()

 Another common case would be to extract all direct ``<p>`` children::

     >>> for p in divs.xpath('p'):
-    >>>     print p.extract()
+    ...     print p.extract()

 For more details about relative XPaths see the `Location Paths`_ section in the
 XPath specification.
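The pitfall this hunk documents is easy to check in isolation. A minimal sketch, with made-up markup (an assumption, not part of the commit) and Python 3 syntax::

    from scrapy.selector import Selector

    html = '''
    <div><p>inside 1</p></div>
    <div><p>inside 2</p></div>
    <p>outside</p>
    '''
    divs = Selector(text=html).xpath('//div')

    # Wrong: '//p' is absolute, so it is evaluated against the whole document
    # for every <div>, and also matches the <p> that sits outside the <div>s.
    print(divs.xpath('//p').extract())

    # Right: './/p' is relative to each <div>; 'p' restricts it to direct children.
    print(divs.xpath('.//p').extract())
    print(divs.xpath('p').extract())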
@@ -375,7 +375,7 @@ with groups of itemscopes and corresponding itemprops::
     ...            .//*[@itemscope]/*/@itemprop)''')
     ...     print "    properties:", props.extract()
     ...     print
-    ...
+
     current scope: [u'http://schema.org/Product']
         properties: [u'name', u'aggregateRating', u'offers', u'description', u'review', u'review']

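The surrounding selectors.rst example relies on the EXSLT ``set:difference`` extension to pair each itemscope with only its own itemprops. A cut-down sketch of the same idea; the microdata snippet below is invented for illustration and uses Python 3 syntax::

    from scrapy.selector import Selector

    html = '''
    <div itemscope itemtype="http://schema.org/Product">
      <span itemprop="name">Widget</span>
      <div itemprop="review" itemscope itemtype="http://schema.org/Review">
        <span itemprop="author">Alice</span>
      </div>
    </div>
    '''
    sel = Selector(text=html)
    for scope in sel.xpath('//div[@itemscope]'):
        print("current scope:", scope.xpath('@itemtype').extract())
        # Direct properties of this scope: all descendant @itemprop attributes,
        # minus those belonging to the children of a nested itemscope.
        props = scope.xpath('''
            set:difference(./descendant::*/@itemprop,
                           .//*[@itemscope]/*/@itemprop)''')
        print("    properties:", props.extract())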
@@ -560,7 +560,7 @@ a :class:`~scrapy.http.HtmlResponse` object like this::
 3. Iterate over all ``<p>`` tags and print their class attribute::

     for node in sel.xpath("//p"):
-    ...     print node.xpath("@class").extract()
+        print node.xpath("@class").extract()

 Selector examples on XML response
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
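The corrected line drops the stray ``...`` prompt because this block is plain example code, not an interactive session. A minimal runnable sketch with a hand-built response (the URL and body are made up for illustration)::

    from scrapy.http import HtmlResponse
    from scrapy.selector import Selector

    # In a spider, Scrapy supplies the response; here one is built by hand.
    body = b'<html><body><p class="a">one</p><p class="b">two</p></body></html>'
    response = HtmlResponse(url='http://example.com', body=body, encoding='utf-8')

    sel = Selector(response)
    for node in sel.xpath("//p"):
        print(node.xpath("@class").extract())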