scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 15:28:29 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	e5b99a56c4	Several core changes: Execution Manager: * added control_reactor argument to delegate external twisted reactor control (for example by twistd) * now it loads spiders (if not already loaded) * now it starts the log (if not already started) * removed args from configure() method * removed *opts from runonce and start methods Execution engine: * added control_reactor argument to delegate external twisted reactor control (for example by twistd) * changed some functions and method names for clarity * improve handling of exceptions in st() method * regrouped close_domain, closed_domain, and _close_domain methods for legibility Scheduler: * replaced pending_domains_count (dict) by pending_domains (set) * simplified some doc	2009-06-15 19:44:26 -03:00
Pablo Hoffman	3c919f2562	Several core changes: Execution Manager: * added control_reactor argument to delegate external twisted reactor control (for example by twistd) * now it loads spiders (if not already loaded) * now it starts the log (if not already started) * removed args from configure() method * removed *opts from runonce and start methods Execution engine: * added control_reactor argument to delegate external twisted reactor control (for example by twistd) * changed some functions and method names for clarity * improve handling of exceptions in st() method * regrouped close_domain, closed_domain, and _close_domain methods for legibility Scheduler: * replaced pending_domains_count (dict) by pending_domains (set) * simplified some doc	2009-06-15 19:40:56 -03:00
Pablo Hoffman	5e3ef5a2fd	item pipeline: added check for domain not already closed	2009-06-15 18:59:40 -03:00
Pablo Hoffman	aeb9734a80	downloader: made log message visible only when debug_mode is on	2009-06-15 18:58:37 -03:00
Pablo Hoffman	ff76f46d5a	removed noisy comment and moved import to the top	2009-06-15 18:55:09 -03:00
Pablo Hoffman	1d8cec63d1	scrapy.log: check if twisted log started before	2009-06-15 18:50:47 -03:00
daniel	a8d430b4dd	httpcache: add domain to logging message	2009-06-15 12:35:42 -03:00
Pablo Hoffman	fd0e490157	added StatsMailer extension	2009-06-12 15:38:21 -03:00
Pablo Hoffman	7c2476bb25	fixed a couple of bugs caused by adding priority to Requests (thanks Artem for reporting)	2009-06-12 08:31:30 -03:00
Pablo Hoffman	4a1a01354b	Added 'priority' attribute to Requests and removed old 'priority' argument passed through engine, scheduler and scheduler middleware calls	2009-06-11 22:25:47 -03:00
Pablo Hoffman	962dbeba88	fixed typo in docstring	2009-06-11 08:33:01 -03:00
Pablo Hoffman	e55158ebdd	Merged olveyra's patch	2009-06-10 18:00:32 -03:00
Pablo Hoffman	635ac1ca64	Simplified domain prioritizers, so that they don't receive domains in the constructor (domain prioritizers will be refactored later anyway) and simplified Scrapy Manager code thanks to this. Added make_request_from_url method to BaseSpider, splitting out the functionality to create requests from URLs, which was previously done entirely in start_requests.	2009-06-10 14:21:36 -03:00
Pablo Hoffman	a74b0b1764	additional simplification of OffsiteMiddleware	2009-06-09 13:09:35 -03:00
Pablo Hoffman	eca05c9e12	OffsiteMiddleware: removed logging and simplified implementation	2009-06-09 12:37:15 -03:00
molveyra	6524def4b8	don't check guid in RobustScrapedItem.validate. Instead, raise NotImplemented.	2009-06-04 10:44:40 -03:00
Daniel Grana	87fbc9c58c	spidermw: add domain name to warning about missing callbacks in requests	2009-05-28 21:47:41 -03:00
Daniel Grana	727e67af5e	spidermw: ignore and warn about requests without callback returned by spiders	2009-05-28 21:41:02 -03:00
Daniel Grana	cfafa01109	spidermw: check for __iter__ instead of calling iter(), which would let a string pass as an iterable	2009-05-28 21:10:30 -03:00
Pablo Hoffman	0f690b03dc	added deprecation warning to ErrorPages downloader middleware	2009-05-28 13:57:25 -03:00
Pablo Hoffman	1aac694343	updated settings doc	2009-05-28 13:52:56 -03:00
Pablo Hoffman	04e7f8f5f6	merged with Daniel's HttpException-removal branch	2009-05-28 13:45:26 -03:00
Daniel Grana	abda5edf09	decompressionmw: don't try to decompress empty responses	2009-05-28 09:31:43 -03:00
Daniel Grana	85dbdf5789	finally remove HttpException in this changeset: * remove HttpException from engine and core exceptions * replace dwmw ErrorPages with spidermw HttpError * bugfix image pipeline media_to_download method when stat_key returns None	2009-05-28 09:30:31 -03:00
Daniel Grana	0e5bea67fd	images: adapt images pipeline to recent changes on HttpException topic	2009-05-28 00:27:42 -03:00
Daniel Grana	7eaa3ed24d	stop raising HttpException at download handlers and adapt download middlewares	2009-05-27 16:51:36 -03:00
Daniel Grana	c8827552b6	fix typo in the default value given in the WEBCONSOLE_ENABLED setting documentation. thanks dzen	2009-05-26 15:48:34 -03:00
Pablo Hoffman	89950af834	cluster: fixed KeyError when crawler process failed to start	2009-05-25 23:45:10 -03:00
Pablo Hoffman	6d1ffa7137	renamed CrawlDebug downloader middleware to DebugMiddleware	2009-05-25 20:14:50 -03:00
Pablo Hoffman	b1dad251ae	Deprecated Common Downloader Middleware and added DefaultHeaders Downloader Middleware	2009-05-25 14:41:06 -03:00
Pablo Hoffman	90d408b04f	Some changes to HTTP cache middleware: * documented * moved from scrapy.contrib.downloadermiddleware.cache.CacheMiddleware to scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware * settings prefix changed from CACHE2_ to HTTPCACHE_ --HG-- rename : scrapy/contrib/downloadermiddleware/cache.py => scrapy/contrib/downloadermiddleware/httpcache.py	2009-05-24 19:13:06 -03:00
Pablo Hoffman	19f2992b26	applied Patrick patch: test_storedb: add base class for both mysql tests	2009-05-23 18:31:54 -03:00
Daniel Grana	dae0b1973b	aws: missing import	2009-05-22 13:21:46 -03:00
Daniel Grana	4efcf78a4a	aws: take AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from environment just like boto does	2009-05-22 13:14:16 -03:00
Ismael Carnales	3955844115	Removed FieldValueError in favour of ValueError	2009-05-21 15:01:48 +00:00
Ismael Carnales	c03e246002	Added DateTimeField	2009-05-21 14:57:52 +00:00
Ismael Carnales	d5f0cae776	New implementation of Field and MultiValuedField	2009-05-21 14:55:14 +00:00
Ismael Carnales	0cc289ac84	New and simpler implementation of BooleanField	2009-05-21 14:51:50 +00:00
Ismael Carnales	55d922a4b0	Fixed BooleanField default value	2009-05-21 14:50:35 +00:00
Ismael Carnales	1ffe64dab3	Added test for newitem fields	2009-05-21 14:48:43 +00:00
Pablo Hoffman	48bfd3fe4b	renamed old setting	2009-05-20 02:15:31 -03:00
Pablo Hoffman	befd28eef4	docs/tutorial: added reminder about adding pipeline to ITEM_PIPELINES settings (thanks jamie)	2009-05-20 00:57:44 -03:00
Pablo Hoffman	04610a25dc	fixed bug in tutorial regarding csv writer pipeline, and other minor corrections	2009-05-19 03:07:08 -03:00
Daniel Grana	abfc52cd17	docs: modify install document to mercurial based installation instructions	2009-05-19 01:50:44 -03:00
Pablo Hoffman	13bb9934f9	moved htmlparser and lxml based link extractors to scrapy.contrib.linkextractors, with the rest of the link extractors	2009-05-18 23:06:27 -03:00
Pablo Hoffman	c161c29e08	simplified some scrapy.log implementation code	2009-05-18 21:32:17 -03:00
Pablo Hoffman	a8a3de17ef	removed unused line	2009-05-18 21:11:03 -03:00
Pablo Hoffman	b87734341d	fixed docstring	2009-05-18 20:59:26 -03:00
Pablo Hoffman	59e504a003	removed code from scrapy.link to avoid cyclic imports from scrapy.contrib.linkextractors.sgml	2009-05-18 19:27:51 -03:00
Pablo Hoffman	86498abdf1	Sorted out Link Extractors organization by moving all of them to scrapy.contrib.linkextractors. The most relevant being: scrapy.link.extractors.RegexLinkExtractor which was moved to: scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor The old location still works but throws a deprecation warning. It will be removed before the 0.7 release. Documentation and tests were also updated. Also, in this changeset, a new regex-based link extractor was added to scrapy.contrib.linkextractors.regex. --HG-- rename : scrapy/tests/sample_data/link_extractor/regex_linkextractor.html => scrapy/tests/sample_data/link_extractor/sgml_linkextractor.html rename : scrapy/tests/test_link.py => scrapy/tests/test_contrib_linkextractors.py	2009-05-18 19:19:37 -03:00


1273 Commits