mirror of
https://github.com/scrapy/scrapy.git
synced 2025-02-27 21:04:12 +00:00
52 lines
1.7 KiB
HTML
{% extends "base_home.html" %}

{% block main-content %}
<h2>Welcome to Scrapy</h2>
<blockquote><p>
Scrapy is a high-level web scraping and crawling framework for writing
spiders that crawl and parse web pages for all kinds of purposes, from
information retrieval to monitoring or testing web sites.
</p></blockquote>

<h3>Features</h3>

<dl>
<dt>Productive</dt>
<dd>Just write the rules to extract data from pages and let Scrapy crawl the entire web site for you</dd>

<dt>Scalable</dt>
<dd>Scrapy is being used in production to scrape more than 500 sites daily, all on a single server</dd>
<dt>Distributed</dt>
<dd>If you need more processing power or bandwidth, Scrapy comes bundled with a master/slave cluster that lets you scrape using as many servers as you need</dd>

<dt>Extensible</dt>
<dd>Scrapy was designed with extensibility in mind, so it provides several mechanisms for plugging in new code without touching the framework core</dd>
<dt>Portable</dt>
<dd>Scrapy runs on Linux, Windows and Mac</dd>

<dt>100% Python</dt>
<dd>Scrapy is written entirely in Python, which makes it very easy to hack on</dd>
</dl>
<h3>Project status</h3>

<p>
We're currently preparing the first official release of Scrapy, which will
ship with a stable API. We already consider the API quite stable, and
we're writing documentation, tutorials and examples to make it easier
to start using it.
</p>
<h3>Where to start?</h3>

<p>
Please start by reading <a href="/docs/">the documentation</a> and checking
out the community resources, where you can ask for further help while we
finish improving the documentation.
</p>

{% endblock %}