mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 18:08:14 +00:00

151 Commits

Author SHA1 Message Date
Pablo Hoffman
e2ed27e4fd Added documentation for Ubuntu packages. Refs #211 2010-08-23 21:28:32 -03:00
Pablo Hoffman
9aefa242d5 Applied documentation patch provided by Lucian Ursu (closes #207) 2010-08-21 01:26:35 -03:00
Pablo Hoffman
1d3b9e2ca8 Scrapy shell refactoring 2010-08-20 11:26:14 -03:00
Pablo Hoffman
7858244dca Scrapy shell: moved python console starting code to scrapy.utils.console and get rid of noisy console banners 2010-08-20 01:33:02 -03:00
Pablo Hoffman
34554da201 Deprecated scrapy-ctl.py command in favour of simpler "scrapy" command. Closes #199. Also updated documentation accordingly and added convenient scrapy.bat script for running from Windows.
--HG--
rename : debian/scrapy-ctl.1 => debian/scrapy.1
rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst
2010-08-18 19:48:32 -03:00
Pablo Hoffman
e741a807d2 Added new Feed exports extension with documentation and storage tests. Closes #197.
Also deprecated File export pipeline (to be removed in Scrapy 0.11).

Still need to add tests for FeedExport main extension code.
2010-08-17 14:27:48 -03:00
Pablo Hoffman
43d47e5d9b Some improvements to Item Pipeline (closes #195):
* Made Item Pipeline Manager a subclass of scrapy.middleware.MiddlewareManager
* Added open_spider/close_spider methods with support for returning deferreds from them
* Inverted the process_item() arguments to be more friendly with deferred
  callbacks (backwards compatibility kept through arguments introspection)
* Updated documentation with new methods and process_item() arguments change
2010-08-12 10:48:37 -03:00
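The "arguments introspection" mentioned in 43d47e5d9b can be illustrated with a minimal sketch. It assumes the new order is `process_item(item, spider)` and the legacy order is `process_item(spider, item)`; the helper `call_process_item` and both pipeline classes are hypothetical names for illustration, not Scrapy's actual code.

```python
import inspect
import warnings


def call_process_item(pipeline, item, spider):
    """Call pipeline.process_item, tolerating the legacy argument order.

    Hypothetical helper: inspects the parameter names of the bound method
    (``self`` is already excluded) to decide which order to use.
    """
    params = list(inspect.signature(pipeline.process_item).parameters)
    if params and params[0] == "spider":
        # Assumed legacy signature: process_item(spider, item)
        warnings.warn(
            "process_item(spider, item) is deprecated; "
            "use process_item(item, spider)",
            DeprecationWarning,
            stacklevel=2,
        )
        return pipeline.process_item(spider, item)
    # Assumed new signature: process_item(item, spider)
    return pipeline.process_item(item, spider)


class NewStylePipeline:
    def process_item(self, item, spider):
        return ("new", item, spider)


class LegacyPipeline:
    def process_item(self, spider, item):
        return ("legacy", item, spider)
```

Putting the item first means a deferred's result can be passed straight through as the first positional argument of the next callback, which is one plausible reading of "more friendly with deferred callbacks".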
Ismael Carnales
e145ec686c Replaced default spider manager (TwistedPluginSpiderManager) with a simpler one that doesn't depend on Twisted Plugins infrastructure. 2010-07-30 17:30:32 -03:00
Pablo Hoffman
9e37ec4230 fixed documentation typo (closes #151) 2010-07-13 19:03:02 -03:00
Pablo Hoffman
6a33d6c4d0 * Added Scrapy Web Service with documentation and tests.
* Marked Web Console as deprecated.
* Removed Web Console documentation to discourage its use.
2010-06-09 13:46:22 -03:00
Pablo Hoffman
81f6502e37 Automated merge with http://hg.scrapy.org/scrapy-0.8/ 2010-04-24 18:22:13 -03:00
Pablo Hoffman
2121a30c74 added note about installing Zope.Interface in windows platforms 2010-04-24 18:19:52 -03:00
Rolando Espinoza La fuente
db5c3df679 SEP12 implementation
  * Rename BaseSpider.domain_name to BaseSpider.name

    This patch implements the domain_name to name change in BaseSpider class and
    change all spider instantiations to use the new attribute.

  * Add allowed_domains to spider

    This patch implements the merging of spider.domain_name and
    spider.extra_domain_names in spider.allowed_domains for offsite checking
    purposes.

    Note that spider.domain_name is not touched by this patch, only not used.

  * Remove spider.domain_name references from scrapy.stats

    * Rename domain_stats to spider_stats in MemoryStatsCollector
    * Use ``spider`` instead of ``domain`` in SimpledbStatsCollector
    * Rename domain_stats_history table to spider_data_history and rename domain
    field to spider in MysqlStatsCollector

  * Refactor genspider command

    The new signature for genspider is: genspider [options] <domain_name>.

    Genspider uses domain_name for spider name and for the module name.

  * Remove spider.domain_name references

  * Update crawl command signature <spider|url>

  * docs: updated references to domain_name

  * examples/experimental: use spider.name

  * genspider: require <name> <domain>

  * spidermanager: renamed crawl_domain to crawl_spider_name

  * spiderctl: updated references of *domain* to spider

  * added backward compatibility with legacy spider's attributes
    'domain_name' and 'extra_domain_names'
2010-04-01 18:27:22 -03:00
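The backward compatibility described in db5c3df679 (legacy `domain_name` and `extra_domain_names` mapped onto `name` and `allowed_domains`) might look roughly like the sketch below. `BaseSpiderSketch` and the two subclasses are hypothetical stand-ins, and the merging rule is an assumption based only on the commit message above.

```python
import warnings


class BaseSpiderSketch:
    """Hypothetical base class; not Scrapy's actual BaseSpider."""

    name = None
    allowed_domains = None

    def __init__(self):
        legacy_name = getattr(self, "domain_name", None)
        if self.name is None and legacy_name is not None:
            warnings.warn(
                "domain_name is deprecated; use name and allowed_domains",
                DeprecationWarning,
                stacklevel=2,
            )
            self.name = legacy_name
        if self.allowed_domains is None:
            # Assumed rule: merge the legacy attributes for offsite checking.
            extra = list(getattr(self, "extra_domain_names", ()))
            self.allowed_domains = ([legacy_name] if legacy_name else []) + extra


class LegacySpider(BaseSpiderSketch):
    domain_name = "example.com"
    extra_domain_names = ["example.org"]


class ModernSpider(BaseSpiderSketch):
    name = "example"
    allowed_domains = ["example.com"]
```

A spider that still declares only the legacy attributes ends up with `name` and `allowed_domains` populated, while a spider using the new attributes is left untouched.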
Pablo Hoffman
1dfc79b5d0 Automated merge with http://hg.scrapy.org/scrapy-0.8 2010-03-20 20:48:11 -03:00
Pablo Hoffman
99a876754c Improved "What else?" section of "Scrapy at a glance" overview 2010-03-20 20:24:18 -03:00
Pablo Hoffman
264cd2e035 Automated merge with http://hg.scrapy.org/scrapy-0.8 2010-03-19 10:32:42 -03:00
Daniel Grana
17091902f3 Explicitly say where to save item class in "Defining our item" section of tutorial 2010-03-12 14:12:49 -02:00
Rolando Espinoza La fuente
7235040936 merged upstream 2010-02-19 17:41:45 -04:00
Pablo Hoffman
48739ae60c install.rst: added explanation about why libxml2 2.6.28 or above is required 2010-01-13 12:20:24 -02:00
Rolando Espinoza La fuente
1402da31c5 docs: fixed typos and updated code examples 2010-01-11 12:28:22 -04:00
Pablo Hoffman
d60412ce19 titlecased Scrapy easy_install and some fixes to sign_release.sh script 2009-12-13 14:23:31 -02:00
Pablo Hoffman
7728a23e99 Changed item pipeline API to pass spider references (instead of domain names) to process_item() method 2009-11-06 13:46:36 -02:00
Pablo Hoffman
9b5fef4f48 fixed typo in intro/install doc (thanks phaithful) 2009-10-28 09:34:31 -02:00
Pablo Hoffman
37d9e015bb minor fix to tutorial 2009-10-07 20:15:49 -02:00
Pablo Hoffman
a0eec7eaf6 some typo fixes and updates to install doc 2009-09-29 09:44:02 -03:00
Ismael Carnales
1646482bef reformatted installation guide 2009-09-29 08:41:34 -03:00
Ismael Carnales
5862ba7db7 modified doc to reflect the new spider callback return policy (lists not needed) 2009-09-22 11:25:40 -03:00
Pablo Hoffman
6e93872955 updated installation guide for using releases 2009-09-17 11:06:55 -03:00
Pablo Hoffman
8a074c9cb5 removed scrapy-admin.py command, and left only scrapy-ctl as the only scrapy command 2009-08-24 15:43:36 -03:00
Ismael Carnales
39540b188a changed torrent in overview doc 2009-08-24 15:11:04 -03:00
Ismael Carnales
4b5aa30867 minor update to tutorial 2009-08-24 14:34:17 -03:00
Pablo Hoffman
9635a7839c rearranged documentation into a better organization
--HG--
rename : docs/topics/index.rst => docs/index.rst
2009-08-21 21:49:54 -03:00
Ismael Carnales
c08d3aa9cc updated tutorial to use new items api 2009-08-21 14:16:27 -03:00
Pablo Hoffman
33b53c59d5 moved scrapy.xpath to scrapy.selector
--HG--
rename : scrapy/xpath/__init__.py => scrapy/selector/__init__.py
rename : scrapy/xpath/document.py => scrapy/selector/document.py
rename : scrapy/xpath/factories.py => scrapy/selector/factories.py
2009-08-19 21:50:52 -03:00
Pablo Hoffman
e8504a054c moved scrapy.newitem to scrapy.item and declared newitem api officially stable. updated docs and example project. deprecated old ScrapedItem 2009-08-19 21:39:58 -03:00
Ismael Carnales
48b40bd620 renamed x method of selectors to select 2009-08-17 15:58:06 -03:00
Pablo Hoffman
5aeab5b291 converted scrapy.item package to module
--HG--
rename : scrapy/item/models.py => scrapy/item.py
2009-08-12 21:31:50 -03:00
Pablo Hoffman
b296d4169e minor doc update for making it more windows-friendly 2009-08-09 17:08:42 -03:00
Pablo Hoffman
38c3f7d0b4 Some changes to logging of scraped items:
1. "Scraped Item" log level changed to DEBUG
2. "Dropped Item" log level changed to WARNING
3. added "Passed Item" log message with INFO level
2009-07-23 11:49:48 -03:00
Pablo Hoffman
e43e28bf1d minimal doc improvement 2009-07-23 09:12:49 -03:00
Ismael Carnales
6d24ae5920 added reference to working with relative xpaths in the tutorial 2009-07-23 09:05:14 -03:00
Pablo Hoffman
1125825996 doc: minor updates to tutorial 2009-07-15 22:10:00 -03:00
Pablo Hoffman
ae7333d598 added simplejson optional dependency to doc 2009-07-09 16:49:20 -03:00
Pablo Hoffman
0c4c153819 improved Scrapy documentation index for better usability 2009-07-01 09:51:57 -03:00
Pablo Hoffman
80cd534f92 removed redundant botname from log lines 2009-06-25 16:48:04 -03:00
Pablo Hoffman
b1dad251ae Deprecated Common Downloader Middleware and added DefaultHeaders Downloader
Middleware
2009-05-25 14:41:06 -03:00
Pablo Hoffman
befd28eef4 docs/tutorial: added reminder about adding pipeline to ITEM_PIPELINES settings (thanks jamie) 2009-05-20 00:57:44 -03:00
Pablo Hoffman
04610a25dc fixed bug in tutorial regarding csv writer pipeline, and other minor corrections 2009-05-19 03:07:08 -03:00
Daniel Grana
abfc52cd17 docs: modify install document to mercurial based installation instructions 2009-05-19 01:50:44 -03:00
Pablo Hoffman
86498abdf1 Sorted out Link Extractors organization by moving them all to
scrapy.contrib.linkextractors.

The most relevant being:
    scrapy.link.extractors.RegexLinkExtractor

which was moved to:
    scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor

The old location still works but throws a deprecation warning. It will be
removed before the 0.7 release.

Documentation and tests were also updated.

Also, in this changeset, a new regex-based link extractor was added to
scrapy.contrib.linkextractors.regex.

--HG--
rename : scrapy/tests/sample_data/link_extractor/regex_linkextractor.html => scrapy/tests/sample_data/link_extractor/sgml_linkextractor.html
rename : scrapy/tests/test_link.py => scrapy/tests/test_contrib_linkextractors.py
2009-05-18 19:19:37 -03:00
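The "old location still works but throws a deprecation warning" behaviour from 86498abdf1 can be sketched with a thin alias class. This is a generic illustration, not the code from that changeset: `deprecated_alias` is a hypothetical helper, and the `SgmlLinkExtractor` body here is a placeholder standing in for the real class.

```python
import warnings


class SgmlLinkExtractor:
    """Placeholder for the class at its new location (body is hypothetical)."""

    def __init__(self, allow=()):
        self.allow = allow


def deprecated_alias(new_cls, old_path, new_path):
    """Return a subclass of new_cls that warns whenever it is instantiated."""

    class Alias(new_cls):
        def __init__(self, *args, **kwargs):
            warnings.warn(
                f"{old_path} is deprecated, use {new_path} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)

    # Keep the old class name so reprs and error messages stay familiar.
    Alias.__name__ = old_path.rsplit(".", 1)[-1]
    return Alias


# What the old module could export so existing imports keep working.
RegexLinkExtractor = deprecated_alias(
    SgmlLinkExtractor,
    "scrapy.link.extractors.RegexLinkExtractor",
    "scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor",
)
```

Because the alias subclasses the relocated class, `isinstance` checks against the new class still pass for code that imports from the old path.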