mirror of
https://github.com/scrapy/scrapy.git
synced 2025-02-21 06:52:59 +00:00
Merge pull request #4190 from Gallaecio/doctest
Make developer-tools doctests pass
commit
8f7faaa63d
281
docs/_tests/quotes.html
Normal file
@@ -0,0 +1,281 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Quotes to Scrape</title>
<link rel="stylesheet" href="/static/bootstrap.min.css">
<link rel="stylesheet" href="/static/main.css">
</head>
<body>
<div class="container">
<div class="row header-box">
<div class="col-md-8">
<h1>
<a href="/" style="text-decoration: none">Quotes to Scrape</a>
</h1>
</div>
<div class="col-md-4">
<p>

<a href="/login">Login</a>

</p>
</div>
</div>

<div class="row">
<div class="col-md-8">

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="change,deep-thoughts,thinking,world" / >

<a class="tag" href="/tag/change/page/1/">change</a>

<a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>

<a class="tag" href="/tag/thinking/page/1/">thinking</a>

<a class="tag" href="/tag/world/page/1/">world</a>

</div>
</div>

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>
<span>by <small class="author" itemprop="author">J.K. Rowling</small>
<a href="/author/J-K-Rowling">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="abilities,choices" / >

<a class="tag" href="/tag/abilities/page/1/">abilities</a>

<a class="tag" href="/tag/choices/page/1/">choices</a>

</div>
</div>

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="inspirational,life,live,miracle,miracles" / >

<a class="tag" href="/tag/inspirational/page/1/">inspirational</a>

<a class="tag" href="/tag/life/page/1/">life</a>

<a class="tag" href="/tag/live/page/1/">live</a>

<a class="tag" href="/tag/miracle/page/1/">miracle</a>

<a class="tag" href="/tag/miracles/page/1/">miracles</a>

</div>
</div>

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”</span>
<span>by <small class="author" itemprop="author">Jane Austen</small>
<a href="/author/Jane-Austen">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="aliteracy,books,classic,humor" / >

<a class="tag" href="/tag/aliteracy/page/1/">aliteracy</a>

<a class="tag" href="/tag/books/page/1/">books</a>

<a class="tag" href="/tag/classic/page/1/">classic</a>

<a class="tag" href="/tag/humor/page/1/">humor</a>

</div>
</div>

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”</span>
<span>by <small class="author" itemprop="author">Marilyn Monroe</small>
<a href="/author/Marilyn-Monroe">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="be-yourself,inspirational" / >

<a class="tag" href="/tag/be-yourself/page/1/">be-yourself</a>

<a class="tag" href="/tag/inspirational/page/1/">inspirational</a>

</div>
</div>

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“Try not to become a man of success. Rather become a man of value.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="adulthood,success,value" / >

<a class="tag" href="/tag/adulthood/page/1/">adulthood</a>

<a class="tag" href="/tag/success/page/1/">success</a>

<a class="tag" href="/tag/value/page/1/">value</a>

</div>
</div>

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“It is better to be hated for what you are than to be loved for what you are not.”</span>
<span>by <small class="author" itemprop="author">André Gide</small>
<a href="/author/Andre-Gide">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="life,love" / >

<a class="tag" href="/tag/life/page/1/">life</a>

<a class="tag" href="/tag/love/page/1/">love</a>

</div>
</div>

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“I have not failed. I've just found 10,000 ways that won't work.”</span>
<span>by <small class="author" itemprop="author">Thomas A. Edison</small>
<a href="/author/Thomas-A-Edison">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="edison,failure,inspirational,paraphrased" / >

<a class="tag" href="/tag/edison/page/1/">edison</a>

<a class="tag" href="/tag/failure/page/1/">failure</a>

<a class="tag" href="/tag/inspirational/page/1/">inspirational</a>

<a class="tag" href="/tag/paraphrased/page/1/">paraphrased</a>

</div>
</div>

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“A woman is like a tea bag; you never know how strong it is until it's in hot water.”</span>
<span>by <small class="author" itemprop="author">Eleanor Roosevelt</small>
<a href="/author/Eleanor-Roosevelt">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="misattributed-eleanor-roosevelt" / >

<a class="tag" href="/tag/misattributed-eleanor-roosevelt/page/1/">misattributed-eleanor-roosevelt</a>

</div>
</div>

<div class="quote" itemscope itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“A day without sunshine is like, you know, night.”</span>
<span>by <small class="author" itemprop="author">Steve Martin</small>
<a href="/author/Steve-Martin">(about)</a>
</span>
<div class="tags">
Tags:
<meta class="keywords" itemprop="keywords" content="humor,obvious,simile" / >

<a class="tag" href="/tag/humor/page/1/">humor</a>

<a class="tag" href="/tag/obvious/page/1/">obvious</a>

<a class="tag" href="/tag/simile/page/1/">simile</a>

</div>
</div>

<nav>
<ul class="pager">


<li class="next">
<a href="/page/2/">Next <span aria-hidden="true">→</span></a>
</li>

</ul>
</nav>
</div>
<div class="col-md-4 tags-box">

<h2>Top Ten tags</h2>

<span class="tag-item">
<a class="tag" style="font-size: 28px" href="/tag/love/">love</a>
</span>

<span class="tag-item">
<a class="tag" style="font-size: 26px" href="/tag/inspirational/">inspirational</a>
</span>

<span class="tag-item">
<a class="tag" style="font-size: 26px" href="/tag/life/">life</a>
</span>

<span class="tag-item">
<a class="tag" style="font-size: 24px" href="/tag/humor/">humor</a>
</span>

<span class="tag-item">
<a class="tag" style="font-size: 22px" href="/tag/books/">books</a>
</span>

<span class="tag-item">
<a class="tag" style="font-size: 14px" href="/tag/reading/">reading</a>
</span>

<span class="tag-item">
<a class="tag" style="font-size: 10px" href="/tag/friendship/">friendship</a>
</span>

<span class="tag-item">
<a class="tag" style="font-size: 8px" href="/tag/friends/">friends</a>
</span>

<span class="tag-item">
<a class="tag" style="font-size: 8px" href="/tag/truth/">truth</a>
</span>

<span class="tag-item">
<a class="tag" style="font-size: 6px" href="/tag/simile/">simile</a>
</span>


</div>
</div>

</div>
<footer class="footer">
<div class="container">
<p class="text-muted">
Quotes by: <a href="https://www.goodreads.com/quotes">GoodReads.com</a>
</p>
<p class="copyright">
Made with <span class='sh-red'>❤</span> by <a href="https://scrapinghub.com">Scrapinghub</a>
</p>
</div>
</footer>
</body>
</html>
@@ -39,7 +39,7 @@ Therefore, you should keep in mind the following things:
.. _topics-inspector:

Inspecting a website
===================================
====================

By far the most handy feature of the Developer Tools is the `Inspector`
feature, which allows you to inspect the underlying HTML code of
@@ -79,13 +79,23 @@ sections and tags of a webpage, which greatly improves readability. You can
expand and collapse a tag by clicking on the arrow in front of it or by double
clicking directly on the tag. If we expand the ``span`` tag with the ``class=
"text"`` we will see the quote-text we clicked on. The `Inspector` lets you
copy XPaths to selected elements. Let's try it out: Right-click on the ``span``
tag, select ``Copy > XPath`` and paste it in the scrapy shell like so::
copy XPaths to selected elements. Let's try it out.

First open the Scrapy shell at http://quotes.toscrape.com/ in a terminal:

.. code-block:: none

$ scrapy shell "http://quotes.toscrape.com/"
(...)
>>> response.xpath('/html/body/div/div[2]/div[1]/div[1]/span[1]/text()').getall()
['"The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”]

Then, back to your web browser, right-click on the ``span`` tag, select
``Copy > XPath`` and paste it in the Scrapy shell like so:

.. invisible-code-block: python

response = load_response('http://quotes.toscrape.com/', 'quotes.html')

>>> response.xpath('/html/body/div/div[2]/div[1]/div[1]/span[1]/text()').getall()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”']

Adding ``text()`` at the end we are able to extract the first quote with this
basic selector. But this XPath is not really that clever. All it does is
@@ -112,13 +122,13 @@ see each quote:

With this knowledge we can refine our XPath: Instead of a path to follow,
we'll simply select all ``span`` tags with the ``class="text"`` by using
the `has-class-extension`_::
the `has-class-extension`_:

>>> response.xpath('//span[has-class("text")]/text()').getall()
['"The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”,
'“It is our choices, Harry, that show what we truly are, far more than our abilities.”',
'“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”',
(...)]
>>> response.xpath('//span[has-class("text")]/text()').getall()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”',
'“It is our choices, Harry, that show what we truly are, far more than our abilities.”',
'“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”',
...]

And with one simple, cleverer XPath we are able to extract all quotes from
the page. We could have constructed a loop over our first XPath to increase
@@ -159,7 +169,11 @@ The page is quite similar to the basic `quotes.toscrape.com`_-page,
but instead of the above-mentioned ``Next`` button, the page
automatically loads new quotes when you scroll to the bottom. We
could go ahead and try out different XPaths directly, but instead
we'll check another quite useful command from the scrapy shell::
we'll check another quite useful command from the scrapy shell:

.. skip: next

.. code-block:: none

$ scrapy shell "quotes.toscrape.com/scroll"
(...)
@@ -8,7 +8,6 @@ addopts =
--ignore=docs/_ext
--ignore=docs/conf.py
--ignore=docs/news.rst
--ignore=docs/topics/developer-tools.rst
--ignore=docs/topics/dynamic-content.rst
--ignore=docs/topics/items.rst
--ignore=docs/topics/leaks.rst