scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 16:44:22 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	949e11ee31	SitemapSpider: added support for parsing gzipped sitemaps (patch contributed by Rolando Espinoza)	2011-07-06 01:33:46 -03:00
Pablo Hoffman	5707051352	fixed httpcompression middleware tests	2011-07-04 21:31:05 -03:00
Pablo Hoffman	81fbe8c9a4	added x-gzip to supported encoding declarations in httpcompression middleware	2011-07-04 21:27:24 -03:00
Pablo Hoffman	a5223881ee	removed debugging code	2011-06-30 02:28:53 -03:00
Pablo Hoffman	5275343fa1	use handle_httpstatus_all=True in scrapy shell	2011-06-28 17:27:40 -03:00
Pablo Hoffman	7cd559eca5	SitemapSpider: ignore non-xml responses. fixes #331	2011-06-27 10:02:16 -03:00
Pablo Hoffman	db5cae7c03	SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330	2011-06-23 18:18:29 -03:00
Pablo Hoffman	d97a9d8731	improved errors of ItemLoader.load_item() so that it shows the field name and value of the output processor that failed	2011-06-23 12:39:51 -03:00
Pablo Hoffman	fbafb295e8	removed DEFAULT_ITEM_CLASS setting from settings in new project template	2011-06-23 11:34:28 -03:00
Pablo Hoffman	d197895d8f	removed deprecated code	2011-06-21 18:06:04 -03:00
Pablo Hoffman	d8775a7575	removed old deprecated FileExportPipeline	2011-06-21 18:01:05 -03:00
Pablo Hoffman	0305ffdd6c	sitemaps: support trailing spaces in <loc> elements	2011-06-20 21:22:16 -03:00
Pablo Hoffman	2e74ccaa7e	dropped InitSpider super class from CrawlSpider and Feed spiders, to avoid potentially confusing code, as it's also not needed	2011-06-20 13:10:13 -03:00
Pablo Hoffman	03bc218987	fixed bug in get_engine_status() function	2011-06-20 11:09:01 -03:00
Pablo Hoffman	03a92a8b03	slightly improved version of scrapyd script	2011-06-20 11:04:38 -03:00
Pablo Hoffman	5de5cac43e	added quick script to launch scrapyd	2011-06-20 10:48:34 -03:00
Pablo Hoffman	841007b5c5	added envvar SCRAPY_VERSION_FROM_HG=1 to extras/makedeb.py script	2011-06-18 03:31:47 -03:00
Pablo Hoffman	7e5e00cea5	Added public engine.download() method to use the downloader bypassing the scheduler. Changed media pipeline to use engine.download() to prevent deadlocks.	2011-06-18 02:52:21 -03:00
Pablo Hoffman	dd90e83eae	get_engine_status(): also look up open spiders in scraper component	2011-06-18 02:48:01 -03:00
Pablo Hoffman	e575e015c1	LogStats extension: fixed KeyError bug caused with spiders that don't scrape any items	2011-06-17 16:50:02 -03:00
Pablo Hoffman	cfc93ba9db	added SitemapSpider to basic spider assertion tests	2011-06-16 10:20:28 -03:00
Pablo Hoffman	25b0ca3125	minor imports sort out	2011-06-16 10:19:27 -03:00
Pablo Hoffman	59acb129e5	scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set	2011-06-15 19:35:03 -03:00
Pablo Hoffman	cd52a7c83b	removed debugging print	2011-06-15 12:35:54 -03:00
Pablo Hoffman	57c43fdce6	added SitemapSpider, with tests and doc	2011-06-15 11:54:34 -03:00
Pablo Hoffman	91dc46539f	added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)	2011-06-14 00:50:05 -03:00
Pablo Hoffman	d2a9c0fdcd	issue deprecation warning when using CLOSESPIDER_ITEMPASSED setting	2011-06-13 22:34:01 -03:00
Pablo Hoffman	841e9913db	renamed CLOSESPIDER_ITEMPASSED setting to CLOSESPIDER_ITEMCOUNT, to follow the refactoring done in r2630	2011-06-13 16:58:51 -03:00
Pablo Hoffman	5dea6be513	use log for dumping stack trace and engine status, in StackTraceDump extension	2011-06-13 14:28:03 -03:00
Pablo Hoffman	72cf5a97c3	added -e|--edit option to genspider command	2011-06-13 09:54:06 -03:00
Pablo Hoffman	80b557849a	fixed test broken in previous commit	2011-06-12 02:55:21 -03:00
Pablo Hoffman	0d5399d0bf	fixed scrapyd tests on win32. closes #295	2011-06-12 02:46:41 -03:00
Pablo Hoffman	c434d11f09	added Darian Moody to AUTHORS	2011-06-12 01:42:30 -03:00
Darian Moody	6873d5b952	Added to tests for last commit; now tests to make sure custom primary keys are editable from the Scrapy Item. --- scrapy/tests/test_djangoitem/__init__.py | 15 ++++++++++++++- scrapy/tests/test_djangoitem/models.py | 7 +++++++ 2 files changed, 21 insertions(+), 1 deletions(-)	2011-06-12 01:41:10 -03:00
Darian Moody	05101c7bba	Fixed DjangoItem to work properly with auto-generated fields (such as the primary key); it will now ignore those that have had the auto_created flag set - this now allows us to work with custom primary keys as the previous way ignored a custom primary key field. --- scrapy/contrib_exp/djangoitem.py | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-)	2011-06-12 01:41:09 -03:00
Pablo Hoffman	37830da1f6	fixed wrong code in test	2011-06-10 18:27:39 -03:00
Pablo Hoffman	c4a607fc78	Raise ValueError if url has no scheme in Request constructor	2011-06-10 18:22:36 -03:00
Pablo Hoffman	88e33ad0ad	Simplified Request/Response __repr__ to be the same as __str__. This improves legibility and shouldn't affect any functionality, since we never use __repr__ for reconstructing a response AFAIK. Also fixes #318	2011-06-09 00:15:53 -03:00
Pablo Hoffman	07df0edf74	scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324	2011-06-08 14:17:04 -03:00
Pablo Hoffman	7643f14c88	fixed bug handling truncated gzipped responses. closes #319	2011-06-06 18:25:14 -03:00
Pablo Hoffman	48509b036a	fixed some tests accidentally broken in previous commit	2011-06-06 16:11:43 -03:00
Pablo Hoffman	f793515565	make --headers output of fetch command resemble curl format, and also show request headers	2011-06-06 15:21:50 -03:00
Pablo Hoffman	03751749a8	Scheduler refactoring which introduces the following changes: * dropped deferred stored along with requests in scheduler queues, which will add the ability to support persistent schedulers in the future * moved duplicates filter into the scheduler itself, using the same dupe filtering class as before (DUPEFILTER_CLASS setting) * removed scheduler middleware component to simplify, as it was only used for duplicates filtering and that is now done in the scheduler itself * adapted media pipeline to work with new scheduler * cleanup old docstrings	2011-06-06 03:16:56 -03:00
Pablo Hoffman	474cba512c	simplified MemoryDebugger extension to use stats for dumping memory debugging info	2011-06-06 03:13:28 -03:00
Pablo Hoffman	5fbc32c015	call stats collector engine_stopped() after the engine is closed (to make sure all data from extensions has been collected), and added that method to documented api	2011-06-06 03:12:40 -03:00
Pablo Hoffman	35b52fcdf0	removed deprecated stat 'envinfo/request_depth_limit'. we should instead support dumping settings, for these cases	2011-06-06 01:02:58 -03:00
Pablo Hoffman	9d9c8877da	added 'scrapy edit' command	2011-06-05 22:02:56 -03:00
Pablo Hoffman	ffbc9295f6	simplified DownloaderStats middleware	2011-06-05 20:03:09 -03:00
Pablo Hoffman	3d823d6f45	simplified CoreStats extension	2011-06-05 19:57:38 -03:00
Pablo Hoffman	61cc95df7c	removed crawlspider v2 tests	2011-06-03 18:26:17 -03:00


2683 Commits