Pablo Hoffman
03bc218987
fixed bug in get_engine_status() function
2011-06-20 11:09:01 -03:00
Pablo Hoffman
03a92a8b03
slightly improved version of scrapyd script
2011-06-20 11:04:38 -03:00
Pablo Hoffman
5de5cac43e
added quick script to launch scrapyd
2011-06-20 10:48:34 -03:00
Pablo Hoffman
841007b5c5
added envvar SCRAPY_VERSION_FROM_HG=1 to extras/makedeb.py script
2011-06-18 03:31:47 -03:00
Pablo Hoffman
7e5e00cea5
Added public engine.download() method to use the downloader bypassing the scheduler. Changed media pipeline to use engine.download() to prevent deadlocks.
2011-06-18 02:52:21 -03:00
Pablo Hoffman
dd90e83eae
get_engine_status(): also look up open spiders in scraper component
2011-06-18 02:48:01 -03:00
Pablo Hoffman
e575e015c1
LogStats extension: fixed KeyError bug caused by spiders that don't scrape any items
2011-06-17 16:50:02 -03:00
Pablo Hoffman
cfc93ba9db
added SitemapSpider to basic spider assertion tests
2011-06-16 10:20:28 -03:00
Pablo Hoffman
25b0ca3125
minor imports sort out
2011-06-16 10:19:27 -03:00
Pablo Hoffman
59acb129e5
scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set
2011-06-15 19:35:03 -03:00
Pablo Hoffman
cd52a7c83b
removed debugging print
2011-06-15 12:35:54 -03:00
Pablo Hoffman
57c43fdce6
added SitemapSpider, with tests and doc
2011-06-15 11:54:34 -03:00
Pablo Hoffman
91dc46539f
added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)
2011-06-14 00:50:05 -03:00
Pablo Hoffman
d2a9c0fdcd
issue deprecation warning when using CLOSESPIDER_ITEMPASSED setting
2011-06-13 22:34:01 -03:00
Pablo Hoffman
841e9913db
renamed CLOSESPIDER_ITEMPASSED setting to CLOSESPIDER_ITEMCOUNT, to follow the refactoring done in r2630
2011-06-13 16:58:51 -03:00
Pablo Hoffman
5dea6be513
use log for dumping stack trace and engine status, in StackTraceDump extension
2011-06-13 14:28:03 -03:00
Pablo Hoffman
72cf5a97c3
added -e|--edit option to genspider command
2011-06-13 09:54:06 -03:00
Pablo Hoffman
80b557849a
fixed test broken in previous commit
2011-06-12 02:55:21 -03:00
Pablo Hoffman
0d5399d0bf
fixed scrapyd tests on win32. closes #295
2011-06-12 02:46:41 -03:00
Pablo Hoffman
c434d11f09
added Darian Moody to AUTHORS
2011-06-12 01:42:30 -03:00
Darian Moody
6873d5b952
Added to tests for last commit; now tests to make sure
custom primary keys are editable from the Scrapy Item.
---
scrapy/tests/test_djangoitem/__init__.py | 15 ++++++++++++++-
scrapy/tests/test_djangoitem/models.py | 7 +++++++
2 files changed, 21 insertions(+), 1 deletions(-)
2011-06-12 01:41:10 -03:00
Darian Moody
05101c7bba
Fixed DjangoItem to work properly with auto-generated
fields (such as the primary key); it will now ignore
those that have had the auto_created flag set - this
now allows us to work with custom primary keys as the
previous way ignored a custom primary key field.
---
scrapy/contrib_exp/djangoitem.py | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
2011-06-12 01:41:09 -03:00
Pablo Hoffman
37830da1f6
fixed wrong code in test
2011-06-10 18:27:39 -03:00
Pablo Hoffman
c4a607fc78
Raise ValueError if url has no scheme in Request constructor
2011-06-10 18:22:36 -03:00
Pablo Hoffman
88e33ad0ad
Simplified Request/Response __repr__ to be the same as __str__. This improves legibility and shouldn't affect any functionality, since we never use __repr__ for reconstructing a response AFAIK. Also fixes #318
2011-06-09 00:15:53 -03:00
Pablo Hoffman
07df0edf74
scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324
2011-06-08 14:17:04 -03:00
Pablo Hoffman
7643f14c88
fixed bug handling truncated gzipped responses. closes #319
2011-06-06 18:25:14 -03:00
Pablo Hoffman
48509b036a
fixed some tests accidentally broken in previous commit
2011-06-06 16:11:43 -03:00
Pablo Hoffman
f793515565
make --headers output of fetch command resemble curl format, and also show request headers
2011-06-06 15:21:50 -03:00
Pablo Hoffman
03751749a8
Scheduler refactoring which introduces the following changes:
* dropped deferred stored along with requests in scheduler queues, which will
add the ability to support persistent schedulers in the future
* moved duplicates filter into the scheduler itself, using the same
dupe filtering class as before (DUPEFILTER_CLASS setting)
* removed scheduler middleware component to simplify, as it was only used for
duplicates filtering and that is now done in the scheduler itself
* adapted media pipeline to work with new scheduler
* cleanup old docstrings
2011-06-06 03:16:56 -03:00
Pablo Hoffman
474cba512c
simplified MemoryDebugger extension to use stats for dumping memory debugging info
2011-06-06 03:13:28 -03:00
Pablo Hoffman
5fbc32c015
call stats collector engine_stopped() after the engine is closed (to make sure all data from extensions has been collected), and added that method to documented api
2011-06-06 03:12:40 -03:00
Pablo Hoffman
35b52fcdf0
removed deprecated stat 'envinfo/request_depth_limit'. we should instead support dumping settings, for these cases
2011-06-06 01:02:58 -03:00
Pablo Hoffman
9d9c8877da
added 'scrapy edit' command
2011-06-05 22:02:56 -03:00
Pablo Hoffman
ffbc9295f6
simplified DownloaderStats middleware
2011-06-05 20:03:09 -03:00
Pablo Hoffman
3d823d6f45
simplified CoreStats extension
2011-06-05 19:57:38 -03:00
Pablo Hoffman
61cc95df7c
removed crawlspider v2 tests
2011-06-03 18:26:17 -03:00
Pablo Hoffman
03ae481cad
removed experimental crawlspider v2
2011-06-03 18:23:23 -03:00
Pablo Hoffman
5bf733b6f6
Changed default representation of items to pretty-printed dicts. This improves
default logging by making log more readable in the default case, for both Scraped and Dropped lines.
Projects can still customize how items are represented by overriding the item's __str__ method, as usual.
2011-06-03 01:13:01 -03:00
Pablo Hoffman
1bc2339bb8
Merged item passed and item scraped concepts, as they have often proved
confusing in the past.
This means:
* original item_scraped signal was removed
* original item_passed signal was renamed to item_scraped
* old log lines "Scraped Item..." removed
* old log lines "Passed Item..." renamed to "Scraped Item..."
2011-06-03 01:13:00 -03:00
Pablo Hoffman
e6091df551
fixed doc typo
2011-05-30 09:04:31 -03:00
Pablo Hoffman
1d98fc8fb5
added spider_error signal
2011-05-29 22:38:17 -03:00
Pablo Hoffman
13d8066788
removed undocumented (and untested) extension: SpiderCloseDelay
2011-05-27 11:52:33 -03:00
Pablo Hoffman
6c369c50ca
removed support for spider.dont_throttle attribute
2011-05-27 09:09:28 -03:00
Pablo Hoffman
2fa0f75f2d
added COOKIES_ENABLED setting to support disabling the cookies middleware
2011-05-27 00:35:34 -03:00
Pablo Hoffman
756bf0cc06
register AutoThrottle extension by default, and disabled AUTOTHROTTLE_ENABLED by default
2011-05-27 00:22:13 -03:00
Pablo Hoffman
dcc28b7186
added setting: AUTOTHROTTLE_ENABLED
2011-05-22 18:31:36 -03:00
Pablo Hoffman
110cd05296
added Spider.dont_throttle attribute to disable AutoThrottle extension per spider
2011-05-22 18:26:38 -03:00
Shane Evans
88dbe2ae87
fix error messages due to fetching pages during shutdown process
This version keeps the faster approach of not processing request callbacks when engine is shutting down
2011-05-20 14:35:37 +01:00
Pablo Hoffman
3897e33612
fixed stupid bug in scheduler introduced in previous change
2011-05-20 03:52:41 -03:00