mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 12:23:44 +00:00

2726 Commits

Author SHA1 Message Date
Pablo Hoffman
d97a9d8731 improved errors of ItemLoader.load_item() so that it shows the field name and value of the output processor that failed 2011-06-23 12:39:51 -03:00
Pablo Hoffman
fbafb295e8 removed DEFAULT_ITEM_CLASS setting from settings in new project template 2011-06-23 11:34:28 -03:00
Pablo Hoffman
d197895d8f removed deprecated code 2011-06-21 18:06:04 -03:00
Pablo Hoffman
d8775a7575 removed old deprecated FileExportPipeline 2011-06-21 18:01:05 -03:00
Pablo Hoffman
0305ffdd6c sitemaps: support trailing spaces in <loc> elements 2011-06-20 21:22:16 -03:00
Pablo Hoffman
2e74ccaa7e dropped InitSpider super class from CrawlSpider and Feed spiders, to avoid potentially confusing code, as it's also not needed 2011-06-20 13:10:13 -03:00
Pablo Hoffman
03bc218987 fixed bug in get_engine_status() function 2011-06-20 11:09:01 -03:00
Pablo Hoffman
03a92a8b03 slightly improved version of scrapyd script 2011-06-20 11:04:38 -03:00
Pablo Hoffman
5de5cac43e added quick script to launch scrapyd 2011-06-20 10:48:34 -03:00
Pablo Hoffman
841007b5c5 added envvar SCRAPY_VERSION_FROM_HG=1 to extras/makedeb.py script 2011-06-18 03:31:47 -03:00
Pablo Hoffman
7e5e00cea5 Added public engine.download() method to use the downloader bypassing the scheduler. Changed media pipeline to use engine.download() to prevent deadlocks. 2011-06-18 02:52:21 -03:00
Pablo Hoffman
dd90e83eae get_engine_status(): also look up open spiders in scraper component 2011-06-18 02:48:01 -03:00
Pablo Hoffman
e575e015c1 LogStats extension: fixed KeyError bug caused by spiders that don't scrape any items 2011-06-17 16:50:02 -03:00
Pablo Hoffman
cfc93ba9db added SitemapSpider to basic spider assertion tests 2011-06-16 10:20:28 -03:00
Pablo Hoffman
25b0ca3125 minor imports sort out 2011-06-16 10:19:27 -03:00
Pablo Hoffman
59acb129e5 scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set 2011-06-15 19:35:03 -03:00
Pablo Hoffman
cd52a7c83b removed debugging print 2011-06-15 12:35:54 -03:00
Pablo Hoffman
57c43fdce6 added SitemapSpider, with tests and doc 2011-06-15 11:54:34 -03:00
Pablo Hoffman
91dc46539f added LogStats extension for periodically logging basic stats (like crawled pages and scraped items) 2011-06-14 00:50:05 -03:00
Pablo Hoffman
d2a9c0fdcd issue deprecation warning when using CLOSESPIDER_ITEMPASSED setting 2011-06-13 22:34:01 -03:00
Pablo Hoffman
841e9913db renamed CLOSESPIDER_ITEMPASSED setting to CLOSESPIDER_ITEMCOUNT, to follow the refactoring done in r2630 2011-06-13 16:58:51 -03:00
Pablo Hoffman
5dea6be513 use log for dumping stack trace and engine status, in StackTraceDump extension 2011-06-13 14:28:03 -03:00
Pablo Hoffman
72cf5a97c3 added -e|--edit option to genspider command 2011-06-13 09:54:06 -03:00
Pablo Hoffman
80b557849a fixed test broken in previous commit 2011-06-12 02:55:21 -03:00
Pablo Hoffman
0d5399d0bf fixed scrapyd tests on win32. closes #295 2011-06-12 02:46:41 -03:00
Pablo Hoffman
c434d11f09 added Darian Moody to AUTHORS 2011-06-12 01:42:30 -03:00
Darian Moody
6873d5b952 Added to tests for last commit; now tests to make sure
custom primary keys are editable from the Scrapy Item.
---
 scrapy/tests/test_djangoitem/__init__.py |   15 ++++++++++++++-
 scrapy/tests/test_djangoitem/models.py   |    7 +++++++
 2 files changed, 21 insertions(+), 1 deletions(-)
2011-06-12 01:41:10 -03:00
Darian Moody
05101c7bba Fixed DjangoItem to work properly with auto-generated
fields (such as the primary key); it will now ignore
 those that have had the auto_created flag set - this
 now allows us to work with custom primary keys as the
 previous way ignored a custom primary key field.
---
 scrapy/contrib_exp/djangoitem.py |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)
2011-06-12 01:41:09 -03:00
Pablo Hoffman
37830da1f6 fixed wrong code in test 2011-06-10 18:27:39 -03:00
Pablo Hoffman
c4a607fc78 Raise ValueError if url has no scheme in Request constructor 2011-06-10 18:22:36 -03:00
Pablo Hoffman
88e33ad0ad Simplified Request/Response __repr__ to be the same as __str__. This improves legibility and shouldn't affect any functionality, since we never use __repr__ for reconstructing a response AFAIK. Also fixes #318 2011-06-09 00:15:53 -03:00
Pablo Hoffman
07df0edf74 scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324 2011-06-08 14:17:04 -03:00
Pablo Hoffman
7643f14c88 fixed bug handling truncated gzipped responses. closes #319 2011-06-06 18:25:14 -03:00
Pablo Hoffman
48509b036a fixed some tests accidentally broken in previous commit 2011-06-06 16:11:43 -03:00
Pablo Hoffman
f793515565 make --headers output of fetch command resemble curl format, and also show request headers 2011-06-06 15:21:50 -03:00
Pablo Hoffman
03751749a8 Scheduler refactoring which introduces the following changes:
* dropped deferred stored along with requests in scheduler queues, which will
  add the ability to support persistent schedulers in the future
* moved duplicates filter into the scheduler itself, using the same
  dupe filtering class as before (DUPEFILTER_CLASS setting)
* removed scheduler middleware component to simplify, as it was only used for
  duplicates filtering and that is now done in the scheduler itself
* adapted media pipeline to work with new scheduler
* cleanup old docstrings
2011-06-06 03:16:56 -03:00
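As a rough illustration of the scheduler refactoring in 03751749a8, the sketch below shows a scheduler that owns its duplicates filter and queues plain requests with no deferreds attached. It is not Scrapy's code: SketchScheduler, SeenFilter, and the use of URL strings as requests are hypothetical names and simplifications chosen for this example.

```python
from collections import deque


class SeenFilter:
    """Hypothetical duplicates filter: remembers the URLs it has seen."""

    def __init__(self):
        self._seen = set()

    def request_seen(self, url):
        # Return True if this URL was already seen, otherwise record it.
        if url in self._seen:
            return True
        self._seen.add(url)
        return False


class SketchScheduler:
    """Hypothetical scheduler that filters duplicates itself."""

    def __init__(self, dupefilter):
        self.dupefilter = dupefilter
        self.queue = deque()  # holds plain requests only, no deferreds

    def enqueue_request(self, url):
        # Duplicates are dropped here rather than in a separate middleware.
        if self.dupefilter.request_seen(url):
            return False
        self.queue.append(url)
        return True

    def next_request(self):
        # Return the oldest queued request, or None when the queue is empty.
        return self.queue.popleft() if self.queue else None


scheduler = SketchScheduler(SeenFilter())
urls = ["http://example.com/a", "http://example.com/b", "http://example.com/a"]
results = [scheduler.enqueue_request(u) for u in urls]
```

Because the queue holds only plain data in this sketch, it could in principle be serialized to disk, which is the kind of persistence the commit message says the change is meant to enable later.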
Pablo Hoffman
474cba512c simplified MemoryDebugger extension to use stats for dumping memory debugging info 2011-06-06 03:13:28 -03:00
Pablo Hoffman
5fbc32c015 call stats collector engine_stopped() after the engine is closed (to make sure all data from extensions has been collected), and added that method to documented api 2011-06-06 03:12:40 -03:00
Pablo Hoffman
35b52fcdf0 removed deprecated stat 'envinfo/request_depth_limit'. we should instead support dumping settings, for these cases 2011-06-06 01:02:58 -03:00
Pablo Hoffman
9d9c8877da added 'scrapy edit' command 2011-06-05 22:02:56 -03:00
Pablo Hoffman
ffbc9295f6 simplified DownloaderStats middleware 2011-06-05 20:03:09 -03:00
Pablo Hoffman
3d823d6f45 simplified CoreStats extension 2011-06-05 19:57:38 -03:00
Pablo Hoffman
61cc95df7c removed crawlspider v2 tests 2011-06-03 18:26:17 -03:00
Pablo Hoffman
03ae481cad removed experimental crawlspider v2 2011-06-03 18:23:23 -03:00
Pablo Hoffman
5bf733b6f6 Changed default representation of items to pretty-printed dicts. This improves
default logging by making log more readable in the default case, for both Scraped and Dropped lines.

Projects can still customize how items are represented by overriding the item's __str__ method, as usual.
2011-06-03 01:13:01 -03:00
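The behaviour described in 5bf733b6f6 can be sketched as follows, assuming a dict-based item. SketchItem and ShortItem are hypothetical classes written for this example, not Scrapy's Item class; the sketch only shows a pretty-printed dict as the default representation and a subclass overriding __str__ to customize it.

```python
from pprint import pformat


class SketchItem(dict):
    """Hypothetical item type; not Scrapy's Item class."""

    def __repr__(self):
        # Default representation: the fields as a pretty-printed dict.
        return pformat(dict(self))


class ShortItem(SketchItem):
    """A project-specific item that customizes how it appears in logs."""

    def __str__(self):
        return "ShortItem(name=%s)" % self["name"]


# str() falls back to __repr__ unless the class defines __str__.
default_text = str(SketchItem(name="widget", price=10))
custom_text = str(ShortItem(name="widget", price=10))
```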
Pablo Hoffman
1bc2339bb8 Merged item passed and item scraped concepts, as they have often proved
confusing in the past.

This means:

* original item_scraped signal was removed
* original item_passed signal was renamed to item_scraped
* old log lines "Scraped Item..." removed
* old log lines "Passed Item..." renamed to "Scraped Item..."
2011-06-03 01:13:00 -03:00
Pablo Hoffman
e6091df551 fixed doc typo 2011-05-30 09:04:31 -03:00
Pablo Hoffman
1d98fc8fb5 added spider_error signal 2011-05-29 22:38:17 -03:00
Pablo Hoffman
13d8066788 removed undocumented (and untested) extension: SpiderCloseDelay 2011-05-27 11:52:33 -03:00
Pablo Hoffman
6c369c50ca removed support for spider.dont_throttle attribute 2011-05-27 09:09:28 -03:00