Pablo Hoffman
949e11ee31
SitemapSpider: added support for parsing gzipped sitemaps (patch contributed by Rolando Espinoza)
2011-07-06 01:33:46 -03:00
Pablo Hoffman
5707051352
fixed httpcompression middleware tests
2011-07-04 21:31:05 -03:00
Pablo Hoffman
81fbe8c9a4
added x-gzip to supported encoding declarations in httpcompression middleware
2011-07-04 21:27:24 -03:00
Pablo Hoffman
a5223881ee
removed debugging code
2011-06-30 02:28:53 -03:00
Pablo Hoffman
5275343fa1
use handle_httpstatus_all=True in scrapy shell
2011-06-28 17:27:40 -03:00
Pablo Hoffman
7cd559eca5
SitemapSpider: ignore non-xml responses. fixes #331
2011-06-27 10:02:16 -03:00
Pablo Hoffman
db5cae7c03
SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330
2011-06-23 18:18:29 -03:00
Pablo Hoffman
d97a9d8731
improved errors of ItemLoader.load_item() so that it shows the field name and value of the output processor that failed
2011-06-23 12:39:51 -03:00
Pablo Hoffman
fbafb295e8
removed DEFAULT_ITEM_CLASS setting from settings in new project template
2011-06-23 11:34:28 -03:00
Pablo Hoffman
d197895d8f
removed deprecated code
2011-06-21 18:06:04 -03:00
Pablo Hoffman
d8775a7575
removed old deprecated FileExportPipeline
2011-06-21 18:01:05 -03:00
Pablo Hoffman
0305ffdd6c
sitemaps: support trailing spaces in <loc> elements
2011-06-20 21:22:16 -03:00
Pablo Hoffman
2e74ccaa7e
dropped InitSpider super class from CrawlSpider and Feed spiders, to avoid potentially confusing code, as it's also not needed
2011-06-20 13:10:13 -03:00
Pablo Hoffman
03bc218987
fixed bug in get_engine_status() function
2011-06-20 11:09:01 -03:00
Pablo Hoffman
03a92a8b03
slightly improved version of scrapyd script
2011-06-20 11:04:38 -03:00
Pablo Hoffman
5de5cac43e
added quick script to launch scrapyd
2011-06-20 10:48:34 -03:00
Pablo Hoffman
841007b5c5
added envvar SCRAPY_VERSION_FROM_HG=1 to extras/makedeb.py script
2011-06-18 03:31:47 -03:00
Pablo Hoffman
7e5e00cea5
Added public engine.download() method to use the downloader bypassing the scheduler. Changed media pipeline to use engine.download() to prevent deadlocks.
2011-06-18 02:52:21 -03:00
Pablo Hoffman
dd90e83eae
get_engine_status(): also look up open spiders in scraper component
2011-06-18 02:48:01 -03:00
Pablo Hoffman
e575e015c1
LogStats extension: fixed KeyError bug caused with spiders that don't scrape any items
2011-06-17 16:50:02 -03:00
Pablo Hoffman
cfc93ba9db
added SitemapSpider to basic spider assertion tests
2011-06-16 10:20:28 -03:00
Pablo Hoffman
25b0ca3125
minor imports sort out
2011-06-16 10:19:27 -03:00
Pablo Hoffman
59acb129e5
scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set
2011-06-15 19:35:03 -03:00
Pablo Hoffman
cd52a7c83b
removed debugging print
2011-06-15 12:35:54 -03:00
Pablo Hoffman
57c43fdce6
added SitemapSpider, with tests and doc
2011-06-15 11:54:34 -03:00
Pablo Hoffman
91dc46539f
added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)
2011-06-14 00:50:05 -03:00
Pablo Hoffman
d2a9c0fdcd
issue deprecation warning when using CLOSESPIDER_ITEMPASSED setting
2011-06-13 22:34:01 -03:00
Pablo Hoffman
841e9913db
renamed CLOSESPIDER_ITEMPASSED setting to CLOSESPIDER_ITEMCOUNT, to follow the refactoring done in r2630
2011-06-13 16:58:51 -03:00
Pablo Hoffman
5dea6be513
use log for dumping stack trace and engine status, in StackTraceDump extension
2011-06-13 14:28:03 -03:00
Pablo Hoffman
72cf5a97c3
added -e|--edit option to genspider command
2011-06-13 09:54:06 -03:00
Pablo Hoffman
80b557849a
fixed test broken in previous commit
2011-06-12 02:55:21 -03:00
Pablo Hoffman
0d5399d0bf
fixed scrapyd tests on win32. closes #295
2011-06-12 02:46:41 -03:00
Pablo Hoffman
c434d11f09
added Darian Moody to AUTHORS
2011-06-12 01:42:30 -03:00
Darian Moody
6873d5b952
Added to tests for last commit; now tests to make sure
custom primary keys are editable from the Scrapy Item.
---
scrapy/tests/test_djangoitem/__init__.py | 15 ++++++++++++++-
scrapy/tests/test_djangoitem/models.py | 7 +++++++
2 files changed, 21 insertions(+), 1 deletions(-)
2011-06-12 01:41:10 -03:00
Darian Moody
05101c7bba
Fixed DjangoItem to work properly with auto-generated
fields (such as the primary key); it will now ignore
those that have had the auto_created flag set - this
now allows us to work with custom primary keys as the
previous way ignored a custom primary key field.
---
scrapy/contrib_exp/djangoitem.py | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
2011-06-12 01:41:09 -03:00
Pablo Hoffman
37830da1f6
fixed wrong code in test
2011-06-10 18:27:39 -03:00
Pablo Hoffman
c4a607fc78
Raise ValueError if url has no scheme in Request constructor
2011-06-10 18:22:36 -03:00
Pablo Hoffman
88e33ad0ad
Simplified Request/Response __repr__ to be the same as __str__. This improves legibility and shouldn't affect any functionality, since we never use __repr__ for reconstructing a response AFAIK. Also fixes #318
2011-06-09 00:15:53 -03:00
Pablo Hoffman
07df0edf74
scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324
2011-06-08 14:17:04 -03:00
Pablo Hoffman
7643f14c88
fixed bug handling truncated gzipped responses. closes #319
2011-06-06 18:25:14 -03:00
Pablo Hoffman
48509b036a
fixed some tests accidentally broken in previous commit
2011-06-06 16:11:43 -03:00
Pablo Hoffman
f793515565
make --headers output of fetch command resemble curl format, and also show request headers
2011-06-06 15:21:50 -03:00
Pablo Hoffman
03751749a8
Scheduler refactoring which introduces the following changes:
* dropped deferred stored along with requests in scheduler queues, which will
add the ability to support persistent schedulers in the future
* moved duplicates filter into the scheduler itself, using the same
dupe filtering class as before (DUPEFILTER_CLASS setting)
* removed scheduler middleware component to simplify, as it was only used for
duplicates filtering and that is now done in the scheduler itself
* adapted media pipeline to work with new scheduler
* cleanup old docstrings
2011-06-06 03:16:56 -03:00
Pablo Hoffman
474cba512c
simplified MemoryDebugger extension to use stats for dumping memory debugging info
2011-06-06 03:13:28 -03:00
Pablo Hoffman
5fbc32c015
call stats collector engine_stopped() after the engine is closed (to make sure all data from extensions has been collected), and added that method to documented api
2011-06-06 03:12:40 -03:00
Pablo Hoffman
35b52fcdf0
removed deprecated stat 'envinfo/request_depth_limit'. we should instead support dumping settings, for these cases
2011-06-06 01:02:58 -03:00
Pablo Hoffman
9d9c8877da
added 'scrapy edit' command
2011-06-05 22:02:56 -03:00
Pablo Hoffman
ffbc9295f6
simplified DownloaderStats middleware
2011-06-05 20:03:09 -03:00
Pablo Hoffman
3d823d6f45
simplified CoreStats extension
2011-06-05 19:57:38 -03:00
Pablo Hoffman
61cc95df7c
removed crawlspider v2 tests
2011-06-03 18:26:17 -03:00