Pablo Hoffman
d6867f7984
removed sphinx warnings about duplicate reference names 'this page'
2012-09-13 15:26:45 -03:00
Pablo Hoffman
f4a17ec272
removed references to Scrapy Snippets site
2012-09-03 22:19:15 -03:00
Pablo Hoffman
4a5f70278f
minor tidy up to installation guide windows notes
2012-08-29 15:44:24 -03:00
Pablo Hoffman
098d892c03
simplified installation guide to only mention pip/easy_install mechanism, and provide hints for Windows users
2012-08-29 15:37:05 -03:00
Daniel Graña
abcc8c9f63
Recommend pypi as single way to install on Windows
2012-08-06 10:21:13 -03:00
Valentin-Costel Hăloiu
00bfb37e79
Update master
2012-07-04 06:55:01 +03:00
Pablo Hoffman
2fb5e62c39
doc: update overview page to point to the genspider command. refs #107
2012-04-19 02:37:22 -03:00
Pablo Hoffman
4f28ffcb2c
removed no longer needed dependency on simplejson
2012-04-10 16:01:36 -03:00
Pablo Hoffman
6e8edbd72e
switched default selectors backend to lxml
2012-04-10 15:52:14 -03:00
Pablo Hoffman
b6ae266546
Removed (very old and possibly broken) backwards compatibility support for Twisted 2.5
2012-03-15 00:28:24 -03:00
Pablo Hoffman
e521da2e2f
Dropped support for Python 2.5. See: http://blog.scrapy.org/scrapy-dropping-support-for-python-25
2012-03-01 08:18:12 -02:00
Pablo Hoffman
0be421fbf0
fixed reference to tutorial directory
2011-12-23 18:57:11 -02:00
Daniel Graña
bcb31988f2
change tutorial to follow changes on dmoz site
2011-12-14 13:03:31 -02:00
Pablo Hoffman
ade5efdc61
added -o option to scrapy crawl, a convenient shortcut for using feed exports
2011-10-22 20:53:49 -02:00
Pablo Hoffman
431441cb52
updated documentation to remove references to old issue tracker and mercurial repos
2011-09-25 13:06:24 -03:00
Pablo Hoffman
76af0cdd44
updated documentation and code to use -s instead of --set option
2011-09-01 14:35:37 -03:00
Pablo Hoffman
a3697421c0
some minor updates to documentation
2011-08-11 09:19:59 -03:00
Pablo Hoffman
5da6ffb57b
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-08-11 09:11:19 -03:00
Pablo Hoffman
bc2d2183e9
fixed import in doc
2011-08-11 09:11:08 -03:00
Pablo Hoffman
c59340150f
Added cached DNS resolver based on old caching resolver extension from scrapy.contrib.resolver. This new one is *not* an extension; it is built in and always enabled.
2011-07-27 03:45:15 -03:00
Pablo Hoffman
57c43fdce6
added SitemapSpider, with tests and doc
2011-06-15 11:54:34 -03:00
Pablo Hoffman
5bf733b6f6
Changed default representation of items to pretty-printed dicts. This improves default logging by making the log more readable in the default case, for both Scraped and Dropped lines.
Projects can still customize how items are represented by overriding the item's __str__ method, as usual.
2011-06-03 01:13:01 -03:00
Pablo Hoffman
951ba507f9
Removed support for default values in Scrapy items, which have proven confusing in the past
2011-05-19 21:42:46 -03:00
Pablo Hoffman
503f302010
removed remaining references to scheduler middleware from doc, as it will be removed on next release
2011-05-18 19:48:48 -03:00
Pablo Hoffman
7f97259ba7
added w3lib to requirements, in installation guide
2011-05-01 11:14:57 -03:00
Pablo Hoffman
bb2b67c862
updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project
2011-04-28 09:31:57 -03:00
Pablo Hoffman
bf73002428
removed googledir example, replaced by dirbot project on github. updated docs accordingly
2011-04-28 02:28:39 -03:00
Pablo Hoffman
181d1c09ae
Fixed typo and code indentation in the doc. Closes #307 and #308
2011-02-09 11:19:46 -02:00
Pablo Hoffman
426b6fa100
docs/intro/install.rst: added -U flag to easy_install command
2010-11-22 13:50:19 -02:00
Pablo Hoffman
ac007802d6
Simplified installation guide, including lxml as alternative dependency to libxml2. Closes #280
2010-11-17 21:32:23 -02:00
Pablo Hoffman
5a5364d0c1
Updated documentation to point out that simplejson is now required if using Python 2.5, and to recommend switching to Python 2.6
2010-11-16 03:31:04 -02:00
Pablo Hoffman
5f65c26080
Some minor improvements to feature list in Scrapy at a Glance documentation page
2010-10-16 19:02:08 -02:00
Pablo Hoffman
b4fbc6c5fa
Updated Scrapy Tutorial to reference feed exports instead of a custom-written pipeline, and extended item pipeline documentation to include a JSON writer.
2010-10-10 20:31:05 -02:00
Pablo Hoffman
f4accb6c7f
Updated dmoz xpaths of Scrapy tutorial
2010-10-07 18:22:01 -02:00
Pablo Hoffman
e3d67d74f7
docs/intro/overview.rst: add example of scraped data and introduce loaders
2010-09-06 10:04:00 -03:00
Pablo Hoffman
00d55fbbd1
Updated 'Scrapy at a glance' document replacing item pipeline example by a simpler usage of feed exports
2010-09-05 23:38:37 -03:00
Pablo Hoffman
e2ed27e4fd
Added documentation for Ubuntu packages. Refs #211
2010-08-23 21:28:32 -03:00
Pablo Hoffman
9aefa242d5
Applied documentation patch provided by Lucian Ursu ( closes #207 )
2010-08-21 01:26:35 -03:00
Pablo Hoffman
1d3b9e2ca8
Scrapy shell refactoring
2010-08-20 11:26:14 -03:00
Pablo Hoffman
7858244dca
Scrapy shell: moved python console starting code to scrapy.utils.console and get rid of noisy console banners
2010-08-20 01:33:02 -03:00
Pablo Hoffman
34554da201
Deprecated scrapy-ctl.py command in favour of simpler "scrapy" command. Closes #199. Also updated documentation accordingly and added convenient scrapy.bat script for running from Windows.
--HG--
rename : debian/scrapy-ctl.1 => debian/scrapy.1
rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst
2010-08-18 19:48:32 -03:00
Pablo Hoffman
e741a807d2
Added new Feed exports extension with documentation and storage tests. Closes #197 .
Also deprecated File export pipeline (to be removed in Scrapy 0.11).
Still need to add tests for FeedExport main extension code.
2010-08-17 14:27:48 -03:00
Pablo Hoffman
43d47e5d9b
Some improvements to Item Pipeline ( closes #195 ):
* Made Item Pipeline Manager a subclass of scrapy.middleware.MiddlewareManager
* Added open_spider/close_spider methods with support for returning deferreds from them
* Inverted the process_item() arguments to be more friendly with deferred
callbacks (backwards compatibility kept through arguments introspection)
* Updated documentation with new methods and process_item() arguments change
2010-08-12 10:48:37 -03:00
Ismael Carnales
e145ec686c
Replaced default spider manager (TwistedPluginSpiderManager) with a simpler one that doesn't depend on Twisted Plugins infrastructure.
2010-07-30 17:30:32 -03:00
Pablo Hoffman
9e37ec4230
fixed documentation typo ( closes #151 )
2010-07-13 19:03:02 -03:00
Pablo Hoffman
6a33d6c4d0
* Added Scrapy Web Service with documentation and tests.
* Marked Web Console as deprecated.
* Removed Web Console documentation to discourage its use.
2010-06-09 13:46:22 -03:00
Pablo Hoffman
81f6502e37
Automated merge with http://hg.scrapy.org/scrapy-0.8/
2010-04-24 18:22:13 -03:00
Pablo Hoffman
2121a30c74
added note about installing Zope.Interface on Windows platforms
2010-04-24 18:19:52 -03:00
Rolando Espinoza La fuente
db5c3df679
SEP12 implementation
* Rename BaseSpider.domain_name to BaseSpider.name
This patch implements the domain_name to name change in BaseSpider class and
change all spider instantiations to use the new attribute.
* Add allowed_domains to spider
This patch implements the merging of spider.domain_name and
spider.extra_domain_names in spider.allowed_domains for offsite checking
purposes.
Note that spider.domain_name is not touched by this patch, only not used.
* Remove spider.domain_name references from scrapy.stats
* Rename domain_stats to spider_stats in MemoryStatsCollector
* Use ``spider`` instead of ``domain`` in SimpledbStatsCollector
* Rename domain_stats_history table to spider_data_history and rename domain
field to spider in MysqlStatsCollector
* Refactor genspider command
The new signature for genspider is: genspider [options] <domain_name>.
Genspider uses domain_name for spider name and for the module name.
* Remove spider.domain_name references
* Update crawl command signature <spider|url>
* docs: updated references to domain_name
* examples/experimental: use spider.name
* genspider: require <name> <domain>
* spidermanager: renamed crawl_domain to crawl_spider_name
* spiderctl: updated references of *domain* to spider
* added backward compatibility with legacy spider's attributes
'domain_name' and 'extra_domain_names'
2010-04-01 18:27:22 -03:00
Pablo Hoffman
1dfc79b5d0
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-20 20:48:11 -03:00