mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 16:07:58 +00:00

1170 Commits

Author SHA1 Message Date
Pablo Hoffman
5bf733b6f6 Changed default representation of items to pretty-printed dicts. This improves
default logging by making the log more readable in the default case, for both Scraped and Dropped lines.

Projects can still customize how items are represented by overriding the item's __str__ method, as usual.
2011-06-03 01:13:01 -03:00
Pablo Hoffman
1bc2339bb8 Merged item passed and item scraped concepts, as they have often proved
confusing in the past.

This means:

* original item_scraped signal was removed
* original item_passed signal was renamed to item_scraped
* old log lines "Scraped Item..." removed
* old log lines "Passed Item..." renamed to "Scraped Item..."
2011-06-03 01:13:00 -03:00
Pablo Hoffman
e6091df551 fixed doc typo 2011-05-30 09:04:31 -03:00
Pablo Hoffman
1d98fc8fb5 added spider_error signal 2011-05-29 22:38:17 -03:00
Pablo Hoffman
2fa0f75f2d added COOKIES_ENABLED setting to support disabling the cookies middleware 2011-05-27 00:35:34 -03:00
Pablo Hoffman
d72d3f4607 stack trace dump extension: also dump engine status, and support triggering it with SIGQUIT, besides SIGUSR2 2011-05-20 03:25:00 -03:00
Pablo Hoffman
951ba507f9 Removed support for default values in Scrapy items, which have proven confusing in the past 2011-05-19 21:42:46 -03:00
Pablo Hoffman
503f302010 removed remaining references to scheduler middleware from doc, as it will be removed on next release 2011-05-18 19:48:48 -03:00
Pablo Hoffman
3fd17432cf fixed outdated documentation 2011-05-18 14:46:20 -03:00
Pablo Hoffman
9016e7e993 added role to link to scrapy source code (not yet used) 2011-05-18 14:43:34 -03:00
Pablo Hoffman
cd85c12c33 Some Link extractor improvements:
* added support for ignoring common file extensions that are not followed if
  they occur in links
* fixed link extractor documentation issues
* slightly improved performance of applying filters
* added link to link extractors doc from documentation index
2011-05-18 12:32:34 -03:00
Pablo Hoffman
495152bd50 disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it 2011-05-18 11:04:48 -03:00
Pablo Hoffman
accb6ed830 dump stats to log by default (i.e. change default value of STATS_DUMP to True) 2011-05-17 22:42:05 -03:00
Pablo Hoffman
7f97259ba7 added w3lib to requirements, in installation guide 2011-05-01 11:14:57 -03:00
Pablo Hoffman
4a83167698 fixed small doc typo 2011-04-30 01:35:30 -03:00
Pablo Hoffman
bb2b67c862 updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project 2011-04-28 09:31:57 -03:00
Pablo Hoffman
bf73002428 removed googledir example, replaced by dirbot project on github. updated docs accordingly 2011-04-28 02:28:39 -03:00
Pablo Hoffman
b12dd76bb8 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-04-25 09:31:18 -03:00
Pablo Hoffman
678f08bc1b added warning about using 'parse' as callback in crawl spider rules 2011-04-25 09:30:42 -03:00
Pablo Hoffman
ad496eb3b6 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-04-14 12:36:27 -03:00
Pablo Hoffman
ecb4f44cbc Added clarification on how to work with local settings and scrapy deploy 2011-04-14 12:36:09 -03:00
Pablo Hoffman
3ee2c94e93 Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it 2011-04-06 14:54:48 -03:00
Pablo Hoffman
8a5c08a6bc added join_multivalued parameter to CsvItemExporter 2011-03-24 13:15:52 -03:00
Pablo Hoffman
3954e600ca added DBM storage backend for HTTP cache 2011-03-23 21:32:02 -03:00
Pablo Hoffman
cfd11df539 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-24 15:28:57 -02:00
Pablo Hoffman
8f7e163b04 Fixed wrong method name in downloader middleware documentation 2011-02-24 15:26:32 -02:00
Daniel Grana
c55355642c fix FAQ typos reported by marlun_ at #scrapy IRC channel 2011-02-16 08:57:42 -02:00
Pablo Hoffman
1fb55bdaf0 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-15 07:25:12 -02:00
Pablo Hoffman
16d9a33951 added FAQ entry about working with big data feeds 2011-02-15 07:24:52 -02:00
Pablo Hoffman
936353d5f1 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-09 11:20:46 -02:00
Pablo Hoffman
181d1c09ae Fixed typo and code indentation in the doc. Closes #307 and #308 2011-02-09 11:19:46 -02:00
Pablo Hoffman
c91f0d9ea1 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-04 13:39:54 -02:00
Pablo Hoffman
c5499ead73 Clarified behaviour when multiple rules match the same link in CrawlSpider 2011-02-04 13:39:12 -02:00
Pablo Hoffman
d7f193cbea bumped version to 0.13 in documentation 2011-01-02 17:29:43 -02:00
Pablo Hoffman
b56e933be9 bumped version to 0.12 in documentation 2011-01-02 17:28:33 -02:00
Pablo Hoffman
5879389ad0 Bumped version to 0.12 2011-01-02 16:16:40 -02:00
Pablo Hoffman
fa644f7a5e Some simplifications to Scrapyd architecture and internals:
- launcher no longer knows about egg storage
- removed get_spider_list_from_eggifile() file and replaced by simpler
  get_spider_list() which doesn't receive an egg file as argument
- changed "egg runner" name to just "runner" to reflect the fact that it
  doesn't necessarily run eggs (though it does in the default case)

--HG--
rename : scrapyd/eggrunner.py => scrapyd/runner.py
2010-12-27 16:22:32 -02:00
Pablo Hoffman
633ebc4c43 minor indentation improvement 2010-12-23 13:04:49 -02:00
Pablo Hoffman
db07a9a938 Added notice to documentation, pointing dev to stable versions and vice versa 2010-12-23 13:03:40 -02:00
Pablo Hoffman
544308d6d0 updated ubuntu repos doc, in preparation for the 0.11 release 2010-12-21 11:02:56 -02:00
Pablo Hoffman
002abf204f Updated item_passed signal to send passed item in 'item' argument, instead of 'output' argument, keeping backwards compatibility for the 'output' argument. Closes #273 2010-12-13 14:05:47 -02:00
Pablo Hoffman
f984d438a0 updated docs to use scrapy version on aptitude install lines 2010-12-13 14:02:42 -02:00
Pablo Hoffman
119fd20e91 Added verbose option to 'version' command. Closes #298 2010-12-13 00:32:44 -02:00
Pablo Hoffman
6a1b69c93f renamed command 'scrapyd' to 'server', and deprecated 'runserver' and 'queue' commands
--HG--
rename : scrapy/commands/scrapyd.py => scrapy/commands/server.py
2010-11-30 20:23:27 -02:00
Pablo Hoffman
df54ed0041 Some Scrapyd enhancements:
* added minimal web ui
* return unique id per job (spider scheduled)
* store one log per spider run (job) and rotate them, keeping the last N logs (where N is configurable through settings)
2010-11-30 02:26:31 -02:00
Pablo Hoffman
bbffa59497 Some changes to Scrapyd:
* Always start one process per spider
* Added max_proc_per_cpu option (defaults to 4)
* Return the number of spiders (instead of a list of them) in schedule.json
2010-11-29 17:19:05 -02:00
Pablo Hoffman
2557777c39 Updated doc referring to HTTP cache middleware 2010-11-24 13:27:44 -02:00
Pablo Hoffman
426b6fa100 docs/intro/install.rst: added -U flag to easy_install command 2010-11-22 13:50:19 -02:00
Pablo Hoffman
91a7c25797 * Made Response.meta attribute map to Request.meta attribute. Closes #290
* Record redirected URLs in redirect middleware. Closes #291
2010-11-18 12:51:54 -02:00
Pablo Hoffman
ac007802d6 Simplified installation guide, including lxml as alternative dependency to libxml2. Closes #280 2010-11-17 21:32:23 -02:00