scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 08:24:16 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	cb9c937f50	minor code rearrangement for consistency	2011-07-26 18:49:01 -03:00
Pablo Hoffman	dd020e184f	removed rather useless (and some deprecated) docstrings	2011-07-26 18:45:51 -03:00
Pablo Hoffman	70493c754d	retry middleware: added TCPTimedOutError to exceptions to retry	2011-07-25 14:52:24 -03:00
Pablo Hoffman	6f0e492390	fixed bug with scraper KeyErrors on some ConnectionLost errors. closes #334	2011-07-25 12:24:26 -03:00
Pablo Hoffman	6e50f94406	engine: make it more explicit that we don't need to return the value of nextcall.schedule()	2011-07-25 10:47:54 -03:00
Pablo Hoffman	ea3bf6d95d	more core refactoring including moving engine next request call logic to a separate class	2011-07-25 10:46:00 -03:00
Pablo Hoffman	209ecdf471	updated settings.py for djangoitem tests to new django multi-db format	2011-07-25 00:49:54 -03:00
Pablo Hoffman	2ac08a713d	downloader: renamed SpiderInfo to Slot, for consistency with engine and scraper names	2011-07-22 02:06:10 -03:00
Pablo Hoffman	d6b83fee3e	scraper: renamed SpiderInfo to Slot, for consistency with engine names	2011-07-22 02:01:05 -03:00
Pablo Hoffman	f19442425a	forked UnicodeDammit from BeautifulSoup to explicitly disable usage of chardet library	2011-07-20 17:41:53 -03:00
Pablo Hoffman	7d18fe18e2	added missing import	2011-07-20 17:05:21 -03:00
Pablo Hoffman	0e008268e1	removed SimpledbStatsCollector from scrapy code, it was moved to https://github.com/scrapinghub/scaws	2011-07-20 10:38:16 -03:00
Pablo Hoffman	b6b0a54d9f	removed FAQ entry	2011-07-20 01:31:36 -03:00
Pablo Hoffman	cc6ef3beb2	engine: renamed slot.requests to slot.start_requests	2011-07-20 01:18:34 -03:00
Pablo Hoffman	de0bf22010	speed up consumption of spider start requests, while the engine has capacity to process them	2011-07-20 01:18:00 -03:00
Pablo Hoffman	9f742fc97c	removed unused import from 'crawl' spider template	2011-07-20 01:04:16 -03:00
Pablo Hoffman	e3f640c7bf	added FAQ entry about scrapy deploy issue on Mac + Python 2.5	2011-07-19 19:53:32 -03:00
Pablo Hoffman	75e2c3eb33	moved spider queues to scrapyd --HG-- rename : scrapy/spiderqueue.py => scrapyd/spiderqueue.py rename : scrapy/tests/test_spiderqueue.py => scrapyd/tests/test_spiderqueue.py	2011-07-19 19:39:27 -03:00
Pablo Hoffman	d97d6d20c6	removed no longer used settings	2011-07-19 19:31:19 -03:00
Pablo Hoffman	442c0bdc18	removed SQSSpiderQueue from base scrapy code, it was moved to https://github.com/scrapinghub/scaws	2011-07-19 14:11:55 -03:00
Daniel Grana	bdd627fe1d	allow overriding store_uri by extending ImagePipeline --HG-- extra : rebase_source : 5c561b8282f733ab0f26607059dd96d858154426	2011-07-15 15:17:38 -03:00
Pablo Hoffman	84f518fc5e	More core changes: * removed execution queue (replaced by newer spider queues) * added real support for returning iterators in Spider.start_requests() * removed support for passing urls to 'scrapy crawl' command	2011-07-15 15:18:39 -03:00
Daniel Grana	4dadeb7ccb	fix issue with responses preventing spiders from being idle in engine counts	2011-07-15 13:57:24 -03:00
Pablo Hoffman	d207c0afe4	fixed bug in engine.download() method	2011-07-15 12:55:07 -03:00
Pablo Hoffman	830255eea3	removed deprecated commands: queue, runserver	2011-07-14 01:41:24 -03:00
Pablo Hoffman	359129adf9	fixed python pass handling in cmdline/commands tests so that it works with new w3lib library	2011-07-14 01:40:31 -03:00
Pablo Hoffman	dbad1373f1	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-07-13 18:44:54 -03:00
Pablo Hoffman	18cb4ff1d8	added natty to list of supported Ubuntu distros	2011-07-13 18:43:52 -03:00
Pablo Hoffman	39a2ea97c8	redirect mw: added REDIRECT_ENABLED setting and documented the other settings	2011-07-13 14:18:15 -03:00
Pablo Hoffman	0b6c7ce9b8	improved download errors propagation to the spiders, and removed no longer needed code to simplify	2011-07-13 14:10:05 -03:00
Pablo Hoffman	804c0279ec	setup.py: only add lxml requirement if libxml2 is not available	2011-07-13 13:04:42 -03:00
Pablo Hoffman	541ed3913b	retry middleware: added RETRY_ENABLED setting and documented the other settings more properly, also improved messages when no longer retrying requests	2011-07-13 11:55:05 -03:00
Pablo Hoffman	763f3dc628	minor update to doc	2011-07-12 19:56:39 -03:00
Pablo Hoffman	bfda9ec319	added clarification about scrapy versioning including the recently adopted odd/even versioning scheme --HG-- rename : docs/api-stability.rst => docs/versioning.rst	2011-07-12 19:53:23 -03:00
Pablo Hoffman	4fde1ef94d	added CloseSpider exception, to manually close spiders	2011-07-12 14:24:10 -03:00
Pablo Hoffman	4bb409923c	improved encoding detection by adding support for HTML5 meta charset	2011-07-12 09:52:50 -03:00
Pablo Hoffman	67213ce673	logformatter: support non-ascii characters in custom implementations of Item.__str__()	2011-07-12 01:16:06 -03:00
Pablo Hoffman	31a375bde7	Close the scheduler after closing the scraper and downloader. This shouldn't have any real effect in practice, but it feels more appropriate to close the components in this order	2011-07-10 04:18:50 -03:00
Pablo Hoffman	90b1ae694c	get_engine_status(): preserve test order defined in code	2011-07-10 04:10:20 -03:00
Pablo Hoffman	409aaade0b	Refactored close spider behaviour so that the engine now waits for all downloading (and enqueued for download) requests to finish and their responses to be processed in the scraper/spiders, before closing the spider. This will be required in the future to avoid losing requests when we add scheduler persistence, and it's also a more correct behaviour overall. The closing process has also been refactored to remove unneeded closing state from the downloader and leave it only in the engine. Finally, some unused methods have been removed too, like spider_is_open() for engine and scheduler.	2011-07-08 11:40:19 -03:00
Pablo Hoffman	574b070bb4	fixed minor bug in sitemap parser	2011-07-08 09:33:56 -03:00
Pablo Hoffman	ab9b786791	Updated CAMELCASE_EXCLUDE_CHARS to also exclude digits (patch by Adam Wentz)	2011-07-06 20:11:11 -03:00
Pablo Hoffman	7abc4b4c5a	fixed typo	2011-07-06 01:35:21 -03:00
Pablo Hoffman	949e11ee31	SitemapSpider: added support for parsing gzipped sitemaps (patch contributed by Rolando Espinoza)	2011-07-06 01:33:46 -03:00
Pablo Hoffman	5707051352	fixed httpcompression middleware tests	2011-07-04 21:31:05 -03:00
Pablo Hoffman	81fbe8c9a4	added x-gzip to supported encoding declarations in httpcompression middleware	2011-07-04 21:27:24 -03:00
Pablo Hoffman	a5223881ee	removed debugging code	2011-06-30 02:28:53 -03:00
Pablo Hoffman	5275343fa1	use handle_httpstatus_all=True in scrapy shell	2011-06-28 17:27:40 -03:00
Pablo Hoffman	7cd559eca5	SitemapSpider: ignore non-xml responses. fixes #331	2011-06-27 10:02:16 -03:00
Pablo Hoffman	db5cae7c03	SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330	2011-06-23 18:18:29 -03:00
2726 Commits