Pablo Hoffman
bfda9ec319
added clarification about scrapy versioning including the recently adopted odd/even versioning scheme
--HG--
rename : docs/api-stability.rst => docs/versioning.rst
2011-07-12 19:53:23 -03:00
Pablo Hoffman
4fde1ef94d
added CloseSpider exception, to manually close spiders
2011-07-12 14:24:10 -03:00
Pablo Hoffman
db5cae7c03
SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330
2011-06-23 18:18:29 -03:00
Pablo Hoffman
57c43fdce6
added SitemapSpider, with tests and doc
2011-06-15 11:54:34 -03:00
Pablo Hoffman
91dc46539f
added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)
2011-06-14 00:50:05 -03:00
Pablo Hoffman
841e9913db
renamed CLOSESPIDER_ITEMPASSED setting to CLOSESPIDER_ITEMCOUNT, to follow the refactoring done in r2630
2011-06-13 16:58:51 -03:00
Pablo Hoffman
474cba512c
simplified MemoryDebugger extension to use stats for dumping memory debugging info
2011-06-06 03:13:28 -03:00
Pablo Hoffman
5fbc32c015
call stats collector engine_stopped() after the engine is closed (to make sure all data from extensions has been collected), and added that method to documented api
2011-06-06 03:12:40 -03:00
Pablo Hoffman
9d9c8877da
added 'scrapy edit' command
2011-06-05 22:02:56 -03:00
Pablo Hoffman
03ae481cad
removed experimental crawlspider v2
2011-06-03 18:23:23 -03:00
Pablo Hoffman
5bf733b6f6
Changed default representation of items to pretty-printed dicts. This improves
default logging by making the log more readable in the default case, for both Scraped and Dropped lines.
Projects can still customize how items are represented by overriding the item's __str__ method, as usual.
2011-06-03 01:13:01 -03:00
Pablo Hoffman
1bc2339bb8
Merged item passed and item scraped concepts, as they have often proved
confusing in the past.
This means:
* original item_scraped signal was removed
* original item_passed signal was renamed to item_scraped
* old log lines "Scraped Item..." removed
* old log lines "Passed Item..." renamed to "Scraped Item..."
2011-06-03 01:13:00 -03:00
Pablo Hoffman
e6091df551
fixed doc typo
2011-05-30 09:04:31 -03:00
Pablo Hoffman
1d98fc8fb5
added spider_error signal
2011-05-29 22:38:17 -03:00
Pablo Hoffman
2fa0f75f2d
added COOKIES_ENABLED setting to support disabling the cookies middleware
2011-05-27 00:35:34 -03:00
Pablo Hoffman
d72d3f4607
stack trace dump extension: also dump engine status, and support triggering it with SIGQUIT, besides SIGUSR2
2011-05-20 03:25:00 -03:00
Pablo Hoffman
951ba507f9
Removed support for default values in Scrapy items, which have proven confusing in the past
2011-05-19 21:42:46 -03:00
Pablo Hoffman
503f302010
removed remaining references to scheduler middleware from doc, as it will be removed in the next release
2011-05-18 19:48:48 -03:00
Pablo Hoffman
3fd17432cf
fixed outdated documentation
2011-05-18 14:46:20 -03:00
Pablo Hoffman
9016e7e993
added role to link to scrapy source code (not yet used)
2011-05-18 14:43:34 -03:00
Pablo Hoffman
cd85c12c33
Some Link extractor improvements:
* added support for ignoring common file extensions that are not followed if they occur in links
* fixed link extractor documentation issues
* slightly improved performance of applying filters
* added link to link extractors doc from documentation index
2011-05-18 12:32:34 -03:00
Pablo Hoffman
495152bd50
disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it
2011-05-18 11:04:48 -03:00
Pablo Hoffman
accb6ed830
dump stats to log by default (i.e. change default value of STATS_DUMP to True)
2011-05-17 22:42:05 -03:00
Pablo Hoffman
7f97259ba7
added w3lib to requirements, in installation guide
2011-05-01 11:14:57 -03:00
Pablo Hoffman
4a83167698
fixed small doc typo
2011-04-30 01:35:30 -03:00
Pablo Hoffman
bb2b67c862
updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project
2011-04-28 09:31:57 -03:00
Pablo Hoffman
bf73002428
removed googledir example, replaced by dirbot project on github. updated docs accordingly
2011-04-28 02:28:39 -03:00
Pablo Hoffman
b12dd76bb8
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-04-25 09:31:18 -03:00
Pablo Hoffman
678f08bc1b
added warning about using 'parse' as callback in crawl spider rules
2011-04-25 09:30:42 -03:00
Pablo Hoffman
ad496eb3b6
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-04-14 12:36:27 -03:00
Pablo Hoffman
ecb4f44cbc
Added clarification on how to work with local settings and scrapy deploy
2011-04-14 12:36:09 -03:00
Pablo Hoffman
3ee2c94e93
Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it
2011-04-06 14:54:48 -03:00
Pablo Hoffman
8a5c08a6bc
added join_multivalued parameter to CsvItemExporter
2011-03-24 13:15:52 -03:00
Pablo Hoffman
3954e600ca
added DBM storage backend for HTTP cache
2011-03-23 21:32:02 -03:00
Pablo Hoffman
cfd11df539
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-02-24 15:28:57 -02:00
Pablo Hoffman
8f7e163b04
Fixed wrong method name in downloader middleware documentation
2011-02-24 15:26:32 -02:00
Daniel Grana
c55355642c
fix FAQ typos reported by marlun_ at #scrapy IRC channel
2011-02-16 08:57:42 -02:00
Pablo Hoffman
1fb55bdaf0
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-02-15 07:25:12 -02:00
Pablo Hoffman
16d9a33951
added FAQ entry about working with big data feeds
2011-02-15 07:24:52 -02:00
Pablo Hoffman
936353d5f1
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-02-09 11:20:46 -02:00
Pablo Hoffman
181d1c09ae
Fixed typo and code indentation in the doc. Closes #307 and #308
2011-02-09 11:19:46 -02:00
Pablo Hoffman
c91f0d9ea1
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-02-04 13:39:54 -02:00
Pablo Hoffman
c5499ead73
Clarified behaviour when multiple rules match the same link in CrawlSpider
2011-02-04 13:39:12 -02:00
Pablo Hoffman
d7f193cbea
bumped version to 0.13 in documentation
2011-01-02 17:29:43 -02:00
Pablo Hoffman
b56e933be9
bumped version to 0.12 in documentation
2011-01-02 17:28:33 -02:00
Pablo Hoffman
5879389ad0
Bumped version to 0.12
2011-01-02 16:16:40 -02:00
Pablo Hoffman
fa644f7a5e
Some simplifications to Scrapyd architecture and internals:
- launcher no longer knows about egg storage
- removed get_spider_list_from_eggifile() file and replaced by simpler
get_spider_list() which doesn't receive an egg file as argument
- changed "egg runner" name to just "runner" to reflect the fact that it
doesn't necessarily run eggs (though it does in the default case)
--HG--
rename : scrapyd/eggrunner.py => scrapyd/runner.py
2010-12-27 16:22:32 -02:00
Pablo Hoffman
633ebc4c43
minor indentation improvement
2010-12-23 13:04:49 -02:00
Pablo Hoffman
db07a9a938
Added notice to documentation, pointing dev to stable versions and vice versa
2010-12-23 13:03:40 -02:00
Pablo Hoffman
544308d6d0
updated ubuntu repos doc, in preparation for the 0.11 release
2010-12-21 11:02:56 -02:00