Mirror of https://github.com/scrapy/scrapy.git (synced 2025-02-24 18:24:00 +00:00)
Merge pull request #1188 from eliasdorneles/favoring_web_scraping_over_screen_scraping
[MRG+1] Favoring web scraping over screen scraping in the descriptions
commit 3d2b74a6ff
@@ -17,7 +17,7 @@ Scrapy
 Overview
 ========

-Scrapy is a fast high-level web crawling and screen scraping framework, used to
+Scrapy is a fast high-level web crawling and web scraping framework, used to
 crawl websites and extract structured data from their pages. It can be used for
 a wide range of purposes, from data mining to monitoring and automated testing.

debian/control (vendored, 4 lines changed)
@@ -13,8 +13,8 @@ Depends: ${python:Depends}, python-lxml, python-twisted, python-openssl,
 Recommends: python-setuptools
 Conflicts: python-scrapy, scrapy, scrapy-0.11
 Provides: python-scrapy, scrapy
-Description: Python web crawling and screen scraping framework
- Scrapy is a fast high-level web crawling and screen scraping framework,
+Description: Python web crawling and web scraping framework
+ Scrapy is a fast high-level web crawling and web scraping framework,
  used to crawl websites and extract structured data from their pages.
  It can be used for a wide range of purposes, from data mining to
  monitoring and automated testing.

@@ -8,10 +8,9 @@ Scrapy is an application framework for crawling web sites and extracting
 structured data which can be used for a wide range of useful applications, like
 data mining, information processing or historical archival.

-Even though Scrapy was originally designed for `screen scraping`_ (more
-precisely, `web scraping`_), it can also be used to extract data using APIs
-(such as `Amazon Associates Web Services`_) or as a general purpose web
-crawler.
+Even though Scrapy was originally designed for `web scraping`_, it can also be
+used to extract data using APIs (such as `Amazon Associates Web Services`_) or
+as a general purpose web crawler.


 Walk-through of an example spider
@@ -171,7 +170,6 @@ your code in Scrapy projects and `join the community`_. Thanks for your
 interest!

 .. _join the community: http://scrapy.org/community/
-.. _screen scraping: http://en.wikipedia.org/wiki/Screen_scraping
 .. _web scraping: http://en.wikipedia.org/wiki/Web_scraping
 .. _Amazon Associates Web Services: http://aws.amazon.com/associates/
 .. _Amazon S3: http://aws.amazon.com/s3/

@@ -8,7 +8,7 @@ When you're scraping web pages, the most common task you need to perform is
 to extract data from the HTML source. There are several libraries available to
 achieve this:

-* `BeautifulSoup`_ is a very popular screen scraping library among Python
+* `BeautifulSoup`_ is a very popular web scraping library among Python
 programmers which constructs a Python object based on the structure of the
 HTML code and also deals with bad markup reasonably well, but it has one
 drawback: it's slow.

@@ -1,5 +1,5 @@
 """
-Scrapy - a web crawling and screen scraping framework written for Python
+Scrapy - a web crawling and web scraping framework written for Python
 """

 __all__ = ['__version__', 'version_info', 'optional_features', 'twisted_version',

setup.py (2 lines changed)
@@ -10,7 +10,7 @@ setup(
 name='Scrapy',
 version=version,
 url='http://scrapy.org',
-description='A high-level Web Crawling and Screen Scraping framework',
+description='A high-level Web Crawling and Web Scraping framework',
 long_description=open('README.rst').read(),
 author='Scrapy developers',
 maintainer='Pablo Hoffman',