scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 13:04:01 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	4bb409923c	improved encoding detection by adding support for HTML5 meta charset	2011-07-12 09:52:50 -03:00
Pablo Hoffman	67213ce673	logformatter: support non-ascii characters in custom implementations of Item.__str__()	2011-07-12 01:16:06 -03:00
Pablo Hoffman	31a375bde7	Close the scheduler after closing the scraper and downloader. This shouldn't have any real effect in practice, but it feels more appropriate to close the components in this order	2011-07-10 04:18:50 -03:00
Pablo Hoffman	90b1ae694c	get_engine_status(): preserve test order defined in code	2011-07-10 04:10:20 -03:00
Pablo Hoffman	409aaade0b	Refactored close spider behaviour so that the engine now waits for all downloading (and enqueued for download) requests to finish and their responses to be processed in the scraper/spiders, before closing the spider. This will be required in the future to avoid losing requests when we add scheduler persistence and it's also a more correct behaviour overall. The closing process has also been refactored to remove unneeded closing state from downloader and leave it only in the engine. Finally, some unused methods have been removed too, like spider_is_open() for engine and scheduler.	2011-07-08 11:40:19 -03:00
Pablo Hoffman	574b070bb4	fixed minor bug in sitemap parser	2011-07-08 09:33:56 -03:00
Pablo Hoffman	ab9b786791	Updated CAMELCASE_EXCLUDE_CHARS to also exclude digits (patch by Adam Wentz)	2011-07-06 20:11:11 -03:00
Pablo Hoffman	7abc4b4c5a	fixed typo	2011-07-06 01:35:21 -03:00
Pablo Hoffman	949e11ee31	SitemapSpider: added support for parsing gzipped sitemaps (patch contributed by Rolando Espinoza)	2011-07-06 01:33:46 -03:00
Pablo Hoffman	5707051352	fixed httpcompression middleware tests	2011-07-04 21:31:05 -03:00
Pablo Hoffman	81fbe8c9a4	added x-gzip to supported encoding declarations in httpcompression middleware	2011-07-04 21:27:24 -03:00
Pablo Hoffman	a5223881ee	removed debugging code	2011-06-30 02:28:53 -03:00
Pablo Hoffman	5275343fa1	use handle_httpstatus_all=True in scrapy shell	2011-06-28 17:27:40 -03:00
Pablo Hoffman	7cd559eca5	SitemapSpider: ignore non-xml responses. fixes #331	2011-06-27 10:02:16 -03:00
Pablo Hoffman	db5cae7c03	SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330	2011-06-23 18:18:29 -03:00
Pablo Hoffman	d97a9d8731	improved errors of ItemLoader.load_item() so that it shows the field name and value of the output processor that failed	2011-06-23 12:39:51 -03:00
Pablo Hoffman	fbafb295e8	removed DEFAULT_ITEM_CLASS setting from settings in new project template	2011-06-23 11:34:28 -03:00
Pablo Hoffman	d197895d8f	removed deprecated code	2011-06-21 18:06:04 -03:00
Pablo Hoffman	d8775a7575	removed old deprecated FileExportPipeline	2011-06-21 18:01:05 -03:00
Pablo Hoffman	0305ffdd6c	sitemaps: support trailing spaces in <loc> elements	2011-06-20 21:22:16 -03:00
Pablo Hoffman	2e74ccaa7e	dropped InitSpider super class from CrawlSpider and Feed spiders, to avoid potentially confusing code, as it's also not needed	2011-06-20 13:10:13 -03:00
Pablo Hoffman	03bc218987	fixed bug in get_engine_status() function	2011-06-20 11:09:01 -03:00
Pablo Hoffman	03a92a8b03	slightly improved version of scrapyd script	2011-06-20 11:04:38 -03:00
Pablo Hoffman	5de5cac43e	added quick script to launch scrapyd	2011-06-20 10:48:34 -03:00
Pablo Hoffman	841007b5c5	added envvar SCRAPY_VERSION_FROM_HG=1 to extras/makedeb.py script	2011-06-18 03:31:47 -03:00
Pablo Hoffman	7e5e00cea5	Added public engine.download() method to use the downloader bypassing the scheduler. Changed media pipeline to use engine.download() to prevent deadlocks.	2011-06-18 02:52:21 -03:00
Pablo Hoffman	dd90e83eae	get_engine_status(): also look up open spiders in scraper component	2011-06-18 02:48:01 -03:00
Pablo Hoffman	e575e015c1	LogStats extension: fixed KeyError bug caused with spiders that don't scrape any items	2011-06-17 16:50:02 -03:00
Pablo Hoffman	cfc93ba9db	added SitemapSpider to basic spider assertion tests	2011-06-16 10:20:28 -03:00
Pablo Hoffman	25b0ca3125	minor imports sort out	2011-06-16 10:19:27 -03:00
Pablo Hoffman	59acb129e5	scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set	2011-06-15 19:35:03 -03:00
Pablo Hoffman	cd52a7c83b	removed debugging print	2011-06-15 12:35:54 -03:00
Pablo Hoffman	57c43fdce6	added SitemapSpider, with tests and doc	2011-06-15 11:54:34 -03:00
Pablo Hoffman	91dc46539f	added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)	2011-06-14 00:50:05 -03:00
Pablo Hoffman	d2a9c0fdcd	issue deprecation warning when using CLOSESPIDER_ITEMPASSED setting	2011-06-13 22:34:01 -03:00
Pablo Hoffman	841e9913db	renamed CLOSESPIDER_ITEMPASSED setting to CLOSESPIDER_ITEMCOUNT, to follow the refactoring done in r2630	2011-06-13 16:58:51 -03:00
Pablo Hoffman	5dea6be513	use log for dumping stack trace and engine status, in StackTraceDump extension	2011-06-13 14:28:03 -03:00
Pablo Hoffman	72cf5a97c3	added -e|--edit option to genspider command	2011-06-13 09:54:06 -03:00
Pablo Hoffman	80b557849a	fixed test broken in previous commit	2011-06-12 02:55:21 -03:00
Pablo Hoffman	0d5399d0bf	fixed scrapyd tests on win32. closes #295	2011-06-12 02:46:41 -03:00
Pablo Hoffman	c434d11f09	added Darian Moody to AUTHORS	2011-06-12 01:42:30 -03:00
Darian Moody	6873d5b952	Added to tests for last commit; now tests to make sure custom primary keys are editable from the Scrapy Item. --- scrapy/tests/test_djangoitem/__init__.py | 15 ++++++++++++++- scrapy/tests/test_djangoitem/models.py | 7 +++++++ 2 files changed, 21 insertions(+), 1 deletions(-)	2011-06-12 01:41:10 -03:00
Darian Moody	05101c7bba	Fixed DjangoItem to work properly with auto-generated fields (such as the primary key); it will now ignore those that have had the auto_created flag set - this now allows us to work with custom primary keys as the previous way ignored a custom primary key field. --- scrapy/contrib_exp/djangoitem.py | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-)	2011-06-12 01:41:09 -03:00
Pablo Hoffman	37830da1f6	fixed wrong code in test	2011-06-10 18:27:39 -03:00
Pablo Hoffman	c4a607fc78	Raise ValueError if url has no scheme in Request constructor	2011-06-10 18:22:36 -03:00
Pablo Hoffman	88e33ad0ad	Simplified Request/Response __repr__ to be the same as __str__. This improves legibility and shouldn't affect any functionality, since we never use __repr__ for reconstructing a response AFAIK. Also fixes #318	2011-06-09 00:15:53 -03:00
Pablo Hoffman	07df0edf74	scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324	2011-06-08 14:17:04 -03:00
Pablo Hoffman	7643f14c88	fixed bug handling truncated gzipped responses. closes #319	2011-06-06 18:25:14 -03:00
Pablo Hoffman	48509b036a	fixed some tests accidentally broken in previous commit	2011-06-06 16:11:43 -03:00
Pablo Hoffman	f793515565	make --headers output of fetch command resemble curl format, and also show request headers	2011-06-06 15:21:50 -03:00

2741 Commits