mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 16:44:22 +00:00

2683 Commits

Author SHA1 Message Date
Pablo Hoffman
949e11ee31 SitemapSpider: added support for parsing gzipped sitemaps (patch contributed by Rolando Espinoza) 2011-07-06 01:33:46 -03:00
Pablo Hoffman
5707051352 fixed httpcompression middleware tests 2011-07-04 21:31:05 -03:00
Pablo Hoffman
81fbe8c9a4 added x-gzip to supported encoding declarations in httpcompression middleware 2011-07-04 21:27:24 -03:00
Pablo Hoffman
a5223881ee removed debugging code 2011-06-30 02:28:53 -03:00
Pablo Hoffman
5275343fa1 use handle_httpstatus_all=True in scrapy shell 2011-06-28 17:27:40 -03:00
Pablo Hoffman
7cd559eca5 SitemapSpider: ignore non-xml responses. fixes #331 2011-06-27 10:02:16 -03:00
Pablo Hoffman
db5cae7c03 SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330 2011-06-23 18:18:29 -03:00
Pablo Hoffman
d97a9d8731 improved errors of ItemLoader.load_item() so that it shows the field name and value of the output processor that failed 2011-06-23 12:39:51 -03:00
Pablo Hoffman
fbafb295e8 removed DEFAULT_ITEM_CLASS setting from settings in new project template 2011-06-23 11:34:28 -03:00
Pablo Hoffman
d197895d8f removed deprecated code 2011-06-21 18:06:04 -03:00
Pablo Hoffman
d8775a7575 removed old deprecated FileExportPipeline 2011-06-21 18:01:05 -03:00
Pablo Hoffman
0305ffdd6c sitemaps: support trailing spaces in <loc> elements 2011-06-20 21:22:16 -03:00
Pablo Hoffman
2e74ccaa7e dropped InitSpider super class from CrawlSpider and Feed spiders, to avoid potentially confusing code, as it's also not needed 2011-06-20 13:10:13 -03:00
Pablo Hoffman
03bc218987 fixed bug in get_engine_status() function 2011-06-20 11:09:01 -03:00
Pablo Hoffman
03a92a8b03 slightly improved version of scrapyd script 2011-06-20 11:04:38 -03:00
Pablo Hoffman
5de5cac43e added quick script to launch scrapyd 2011-06-20 10:48:34 -03:00
Pablo Hoffman
841007b5c5 added envvar SCRAPY_VERSION_FROM_HG=1 to extras/makedeb.py script 2011-06-18 03:31:47 -03:00
Pablo Hoffman
7e5e00cea5 Added public engine.download() method to use the downloader bypassing the scheduler. Changed media pipeline to use engine.download() to prevent deadlocks. 2011-06-18 02:52:21 -03:00
Pablo Hoffman
dd90e83eae get_engine_status(): also look up open spiders in scraper component 2011-06-18 02:48:01 -03:00
Pablo Hoffman
e575e015c1 LogStats extension: fixed KeyError bug caused with spiders that don't scrape any items 2011-06-17 16:50:02 -03:00
Pablo Hoffman
cfc93ba9db added SitemapSpider to basic spider assertion tests 2011-06-16 10:20:28 -03:00
Pablo Hoffman
25b0ca3125 minor imports sort out 2011-06-16 10:19:27 -03:00
Pablo Hoffman
59acb129e5 scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set 2011-06-15 19:35:03 -03:00
Pablo Hoffman
cd52a7c83b removed debugging print 2011-06-15 12:35:54 -03:00
Pablo Hoffman
57c43fdce6 added SitemapSpider, with tests and doc 2011-06-15 11:54:34 -03:00
Pablo Hoffman
91dc46539f added LogStats extension for periodically logging basic stats (like crawled pages and scraped items) 2011-06-14 00:50:05 -03:00
Pablo Hoffman
d2a9c0fdcd issue deprecation warning when using CLOSESPIDER_ITEMPASSED setting 2011-06-13 22:34:01 -03:00
Pablo Hoffman
841e9913db renamed CLOSESPIDER_ITEMPASSED setting to CLOSESPIDER_ITEMCOUNT, to follow the refactoring done in r2630 2011-06-13 16:58:51 -03:00
Pablo Hoffman
5dea6be513 use log for dumping stack trace and engine status, in StackTraceDump extension 2011-06-13 14:28:03 -03:00
Pablo Hoffman
72cf5a97c3 added -e|--edit option to genspider command 2011-06-13 09:54:06 -03:00
Pablo Hoffman
80b557849a fixed test broken in previous commit 2011-06-12 02:55:21 -03:00
Pablo Hoffman
0d5399d0bf fixed scrapyd tests on win32. closes #295 2011-06-12 02:46:41 -03:00
Pablo Hoffman
c434d11f09 added Darian Moody to AUTHORS 2011-06-12 01:42:30 -03:00
Darian Moody
6873d5b952 Added to tests for last commit; now tests to make sure
custom primary keys are editable from the Scrapy Item.
---
 scrapy/tests/test_djangoitem/__init__.py |   15 ++++++++++++++-
 scrapy/tests/test_djangoitem/models.py   |    7 +++++++
 2 files changed, 21 insertions(+), 1 deletions(-)
2011-06-12 01:41:10 -03:00
Darian Moody
05101c7bba Fixed DjangoItem to work properly with auto-generated
fields (such as the primary key); it will now ignore
those that have had the auto_created flag set - this
now allows us to work with custom primary keys as the
previous way ignored a custom primary key field.
---
 scrapy/contrib_exp/djangoitem.py |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)
2011-06-12 01:41:09 -03:00
Pablo Hoffman
37830da1f6 fixed wrong code in test 2011-06-10 18:27:39 -03:00
Pablo Hoffman
c4a607fc78 Raise ValueError if url has no scheme in Request constructor 2011-06-10 18:22:36 -03:00
Pablo Hoffman
88e33ad0ad Simplified Request/Response __repr__ to be the same as __str__. This improves legibility and shouldn't affect any functionality, since we never use __repr__ for reconstructing a response AFAIK. Also fixes #318 2011-06-09 00:15:53 -03:00
Pablo Hoffman
07df0edf74 scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324 2011-06-08 14:17:04 -03:00
Pablo Hoffman
7643f14c88 fixed bug handling truncated gzipped responses. closes #319 2011-06-06 18:25:14 -03:00
Pablo Hoffman
48509b036a fixed some tests accidentally broken in previous commit 2011-06-06 16:11:43 -03:00
Pablo Hoffman
f793515565 make --headers output of fetch command resemble curl format, and also show request headers 2011-06-06 15:21:50 -03:00
Pablo Hoffman
03751749a8 Scheduler refactoring which introduces the following changes:
* dropped deferred stored along with requests in scheduler queues, which will
  add the ability to support persistent schedulers in the future
* moved duplicates filter into the scheduler itself, using the same
  dupe filtering class as before (DUPEFILTER_CLASS setting)
* removed scheduler middleware component to simplify, as it was only used for
  duplicates filtering and that is now done in the scheduler itself
* adapted media pipeline to work with new scheduler
* cleanup old docstrings
2011-06-06 03:16:56 -03:00
Pablo Hoffman
474cba512c simplified MemoryDebugger extension to use stats for dumping memory debugging info 2011-06-06 03:13:28 -03:00
Pablo Hoffman
5fbc32c015 call stats collector engine_stopped() after the engine is closed (to make sure all data from extensions has been collected), and added that method to documented api 2011-06-06 03:12:40 -03:00
Pablo Hoffman
35b52fcdf0 removed deprecated stat 'envinfo/request_depth_limit'. we should instead support dumping settings, for these cases 2011-06-06 01:02:58 -03:00
Pablo Hoffman
9d9c8877da added 'scrapy edit' command 2011-06-05 22:02:56 -03:00
Pablo Hoffman
ffbc9295f6 simplified DownloaderStats middleware 2011-06-05 20:03:09 -03:00
Pablo Hoffman
3d823d6f45 simplified CoreStats extension 2011-06-05 19:57:38 -03:00
Pablo Hoffman
61cc95df7c removed crawlspider v2 tests 2011-06-03 18:26:17 -03:00