.. _topics-index:

==============================
Scrapy |version| documentation
==============================

This documentation contains everything you need to know about Scrapy.

Getting help
============

Having trouble? We'd like to help!

* Try the :doc:`FAQ <faq>` -- it's got answers to some common questions.
* Looking for specific information? Try the :ref:`genindex` or :ref:`modindex`.
* Search for information in the `archives of the scrapy-users mailing list`_, or
  `post a question`_.
* Ask a question in the `#scrapy IRC channel`_.
* Report bugs with Scrapy in our `issue tracker`_.

.. _archives of the scrapy-users mailing list: https://groups.google.com/forum/#!forum/scrapy-users
.. _post a question: https://groups.google.com/forum/#!forum/scrapy-users
.. _#scrapy IRC channel: irc://irc.freenode.net/scrapy
.. _issue tracker: https://github.com/scrapy/scrapy/issues

First steps
===========

.. toctree::
   :hidden:

   intro/overview
   intro/install
   intro/tutorial
   intro/examples

:doc:`intro/overview`
    Understand what Scrapy is and how it can help you.

:doc:`intro/install`
    Get Scrapy installed on your computer.

:doc:`intro/tutorial`
    Write your first Scrapy project.

:doc:`intro/examples`
    Learn more by playing with a pre-made Scrapy project.

.. _section-basics:

Basic concepts
==============

.. toctree::
   :hidden:

   topics/commands
   topics/spiders
   topics/selectors
   topics/items
   topics/loaders
   topics/shell
   topics/item-pipeline
   topics/feed-exports
   topics/request-response
   topics/link-extractors
   topics/settings
   topics/exceptions

:doc:`topics/commands`
    Learn about the command-line tool used to manage your Scrapy project.

:doc:`topics/spiders`
    Write the rules to crawl your websites.

:doc:`topics/selectors`
    Extract the data from web pages using XPath.

:doc:`topics/shell`
    Test your extraction code in an interactive environment.

:doc:`topics/items`
    Define the data you want to scrape.

:doc:`topics/loaders`
    Populate your items with the extracted data.

:doc:`topics/item-pipeline`
    Post-process and store your scraped data.

:doc:`topics/feed-exports`
    Output your scraped data using different formats and storages.

:doc:`topics/request-response`
    Understand the classes used to represent HTTP requests and responses.

:doc:`topics/link-extractors`
    Convenient classes to extract links to follow from pages.

:doc:`topics/settings`
    Learn how to configure Scrapy and see all :ref:`available settings <topics-settings-ref>`.

:doc:`topics/exceptions`
    See all available exceptions and their meaning.

Built-in services
=================

.. toctree::
   :hidden:

   topics/logging
   topics/stats
   topics/email
   topics/telnetconsole
   topics/webservice

:doc:`topics/logging`
    Understand the simple logging facility provided by Scrapy.

:doc:`topics/stats`
    Collect statistics about your scraping crawler.

:doc:`topics/email`
    Send email notifications when certain events occur.

:doc:`topics/telnetconsole`
    Inspect a running crawler using a built-in Python console.

:doc:`topics/webservice`
    Monitor and control a crawler using a web service.

Solving specific problems
=========================

.. toctree::
   :hidden:

   faq
   topics/debug
   topics/contracts
   topics/practices
   topics/broad-crawls
   topics/firefox
   topics/firebug
   topics/leaks
   topics/media-pipeline
   topics/ubuntu
   topics/deploy
   topics/autothrottle
   topics/benchmarking
   topics/jobs

:doc:`faq`
    Get answers to the most frequently asked questions.

:doc:`topics/debug`
    Learn how to debug common problems of your Scrapy spider.

:doc:`topics/contracts`
    Learn how to use contracts for testing your spiders.

:doc:`topics/practices`
    Get familiar with some common Scrapy practices.

:doc:`topics/broad-crawls`
    Tune Scrapy for crawling a lot of domains in parallel.

:doc:`topics/firefox`
    Learn how to scrape with Firefox and some useful add-ons.

:doc:`topics/firebug`
    Learn how to scrape efficiently using Firebug.

:doc:`topics/leaks`
    Learn how to find and get rid of memory leaks in your crawler.

:doc:`topics/media-pipeline`
    Download files and/or images associated with your scraped items.

:doc:`topics/ubuntu`
    Install the latest Scrapy packages easily on Ubuntu.

:doc:`topics/deploy`
    Deploy your Scrapy spiders and run them on a remote server.

:doc:`topics/autothrottle`
    Adjust crawl rate dynamically based on load.

:doc:`topics/benchmarking`
    Check how Scrapy performs on your hardware.

:doc:`topics/jobs`
    Learn how to pause and resume crawls for large spiders.

.. _extending-scrapy:

Extending Scrapy
================

.. toctree::
   :hidden:

   topics/architecture
   topics/downloader-middleware
   topics/spider-middleware
   topics/extensions
   topics/api
   topics/signals
   topics/exporters

:doc:`topics/architecture`
    Understand the Scrapy architecture.

:doc:`topics/downloader-middleware`
    Customize how pages get requested and downloaded.

:doc:`topics/spider-middleware`
    Customize the input and output of your spiders.

:doc:`topics/extensions`
    Extend Scrapy with your custom functionality.

:doc:`topics/api`
    Use it in extensions and middlewares to extend Scrapy functionality.

:doc:`topics/signals`
    See all available signals and how to work with them.

:doc:`topics/exporters`
    Quickly export your scraped items to a file (XML, CSV, etc.).

All the rest
============

.. toctree::
   :hidden:

   news
   contributing
   versioning
   experimental/index

:doc:`news`
    See what has changed in recent Scrapy versions.

:doc:`contributing`
    Learn how to contribute to the Scrapy project.

:doc:`versioning`
    Understand Scrapy versioning and API stability.

:doc:`experimental/index`
    Learn about bleeding-edge features.