scrapy

Mirror of https://github.com/scrapy/scrapy.git, synced 2025-02-25 15:03:56 +00:00

Author	SHA1	Message	Date
Daniel Grana	47970e91bc	core: Invert request priority meaning; a higher request.priority value means more priority	2009-06-20 19:23:26 -03:00
Daniel Grana	18b6fecc47	Remove custom redirection priority of request returned by downloadermiddleware	2009-06-20 19:19:07 -03:00
Daniel Grana	00b49752ce	Multiple changes to core scheduling and duplicate filtering: * removed starters from engine * moved schedulermiddleware to scheduler * raise RuntimeError when trying to open/close a scheduler domain twice * removed dupesfilter singleton and spidermw dupefilter middleware --HG-- extra : rebase_source : e4c3ad4b970cbc8f532bc751ba9d8a944ca16be5	2009-06-20 18:15:02 -03:00
Pablo Hoffman	728ec7c5c9	minor adjustment to FifoDomainScheduler and improved documentation of domain scheduler API (remove_pending_domain method removes all occurrences)	2009-06-19 19:41:56 -03:00
Pablo Hoffman	161335fe78	Added domain schedulers (whose functionality was previously mixed with the Scrapy Scheduler) and removed domain prioritizers whose functionality became duplicated by the new domain schedulers.	2009-06-19 17:55:54 -03:00
Pablo Hoffman	3cb12e5dd3	Moved init_domain functionality out of the engine (refs #88) and into the spider level. A new spider (InitSpider, not yet documented) was added to provide initialization facilities. Also renamed make_request_from_url to make_requests_from_url and allowed it to return iterables.	2009-06-18 14:43:56 -03:00
Pablo Hoffman	b040e6a3a7	removed unused GenericSpider	2009-06-18 14:33:05 -03:00
Pablo Hoffman	e716fad03d	engine: log error when reactor.listenTCP fails instead of failing	2009-06-18 09:44:02 -03:00
Pablo Hoffman	d23dcbb390	engine: minor code simplification and fixed potential KeyError bug	2009-06-17 09:17:50 -03:00
daniel	9022f84c05	decompressionmw: use a temporary filename because None is failing in some cases	2009-06-17 05:01:04 -03:00
daniel	2c93bc40c6	decompressionmw: relative hardcoded filename raises OSError when the process has no write access to the CWD	2009-06-17 03:41:42 -03:00
daniel	d175ced4a1	Automated merge with ssh://hg.scrapy.org/scrapy	2009-06-17 02:45:46 -03:00
Pablo Hoffman	74f6bd3a4d	removed python2.5 from rpm-install.sh script	2009-06-16 13:14:40 -03:00
pablo	584a844e71	Fixed distutils packaging for all setup.py bdist formats --HG-- rename : scrapy/bin/scrapy-admin.py => bin/scrapy-admin.py	2009-06-16 17:10:43 +01:00
Pablo Hoffman	ead5466998	changed format of --set SETTING:VALUE to --set SETTING=VALUE	2009-06-16 11:06:42 -03:00
Pablo Hoffman	e5b99a56c4	Several core changes: Execution Manager: * added control_reactor argument to delegate external twisted reactor control (for example by twistd) * now it loads spiders (if not already loaded) * now it starts the log (if not already started) * removed args from configure() method * removed *opts from runonce and start methods Execution engine: * added control_reactor argument to delegate external twisted reactor control (for example by twistd) * changed some functions and method names for clarity * improve handling of exceptions in st() method * regrouped close_domain, closed_domain, and _close_domain method for legibility Scheduler: * replaced pending_domains_count (dict) by pending_domains (set) * simplified some doc	2009-06-15 19:44:26 -03:00
Pablo Hoffman	3c919f2562	Several core changes: Execution Manager: * added control_reactor argument to delegate external twisted reactor control (for example by twistd) * now it loads spiders (if not already loaded) * now it starts the log (if not already started) * removed args from configure() method * removed *opts from runonce and start methods Execution engine: * added control_reactor argument to delegate external twisted reactor control (for example by twistd) * changed some functions and method names for clarity * improve handling of exceptions in st() method * regrouped close_domain, closed_domain, and _close_domain method for legibility Scheduler: * replaced pending_domains_count (dict) by pending_domains (set) * simplified some doc	2009-06-15 19:40:56 -03:00
Pablo Hoffman	5e3ef5a2fd	item pipeline: added check for domain not already closed	2009-06-15 18:59:40 -03:00
Pablo Hoffman	aeb9734a80	downloader: made log message visible only when debug_mode is on	2009-06-15 18:58:37 -03:00
Pablo Hoffman	ff76f46d5a	removed noisy comment and moved import to the top	2009-06-15 18:55:09 -03:00
Pablo Hoffman	1d8cec63d1	scrapy.log: check if twisted log started before	2009-06-15 18:50:47 -03:00
daniel	a8d430b4dd	httpcache: add domain to logging message	2009-06-15 12:35:42 -03:00
Pablo Hoffman	fd0e490157	added StatsMailer extension	2009-06-12 15:38:21 -03:00
Pablo Hoffman	7c2476bb25	fixed a couple of bugs caused by adding priority to Requests (thanks Artem for reporting)	2009-06-12 08:31:30 -03:00
Pablo Hoffman	4a1a01354b	Added 'priority' attribute to Requests and removed old 'priority' argument passed through engine, scheduler and scheduler middleware calls	2009-06-11 22:25:47 -03:00
Pablo Hoffman	962dbeba88	fixed typo in docstring	2009-06-11 08:33:01 -03:00
Pablo Hoffman	e55158ebdd	Merged olveyra's patch	2009-06-10 18:00:32 -03:00
Pablo Hoffman	635ac1ca64	Simplified domain prioritizers, so that they don't receive domains in the constructor (domain prioritizers will be refactored later anyway) and simplified Scrapy Manager code thanks to this. Added make_request_from_url method to BaseSpider, splitting the functionality to create requests from URLs which was previously done all in start_requests.	2009-06-10 14:21:36 -03:00
Pablo Hoffman	a74b0b1764	additional simplification of OffsiteMiddleware	2009-06-09 13:09:35 -03:00
Pablo Hoffman	eca05c9e12	OffsiteMiddleware: removed logging and simplified implementation	2009-06-09 12:37:15 -03:00
molveyra	6524def4b8	don't check guid in RobustScrapedItem.validate. Instead, raise NotImplemented.	2009-06-04 10:44:40 -03:00
Daniel Grana	87fbc9c58c	spidermw: add domain name to warning about missing callbacks in requests	2009-05-28 21:47:41 -03:00
Daniel Grana	727e67af5e	spidermw: ignore and warn about requests without callback returned by spiders	2009-05-28 21:41:02 -03:00
Daniel Grana	cfafa01109	spidermw: check for __iter__ instead of calling iter(), which can let a string pass as an iterable	2009-05-28 21:10:30 -03:00
Pablo Hoffman	0f690b03dc	added deprecation warning to ErrorPages downloader middleware	2009-05-28 13:57:25 -03:00
Pablo Hoffman	1aac694343	updated settings doc	2009-05-28 13:52:56 -03:00
Pablo Hoffman	04e7f8f5f6	merged with Daniel's HttpException-removal branch	2009-05-28 13:45:26 -03:00
Daniel Grana	abda5edf09	decompressionmw: don't try to decompress empty responses	2009-05-28 09:31:43 -03:00
Daniel Grana	85dbdf5789	finally remove HttpException in this changeset: * remove HttpException from engine and core exceptions * replace dwmw ErrorPages with spidermw HttpError * bugfix image pipeline media_to_download method when stat_key returns None	2009-05-28 09:30:31 -03:00
Daniel Grana	0e5bea67fd	images: adapt images pipeline to recent changes on HttpException topic	2009-05-28 00:27:42 -03:00
Daniel Grana	7eaa3ed24d	stop raising HttpException at download handlers and adapt download middlewares	2009-05-27 16:51:36 -03:00
Daniel Grana	c8827552b6	fix typo in the documented default value of the WEBCONSOLE_ENABLED setting. Thanks dzen	2009-05-26 15:48:34 -03:00
Pablo Hoffman	89950af834	cluster: fixed KeyError when crawler process failed to start	2009-05-25 23:45:10 -03:00
Pablo Hoffman	6d1ffa7137	renamed CrawlDebug downloader middleware to DebugMiddleware	2009-05-25 20:14:50 -03:00
Pablo Hoffman	b1dad251ae	Deprecated Common Downloader Middleware and added DefaultHeaders Downloader Middleware	2009-05-25 14:41:06 -03:00
Pablo Hoffman	90d408b04f	Some changes to HTTP cache middleware: * documented * moved from scrapy.contrib.downloadermiddleware.cache.CacheMiddleware to scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware * settings prefix changed from CACHE2_ to HTTPCACHE_ --HG-- rename : scrapy/contrib/downloadermiddleware/cache.py => scrapy/contrib/downloadermiddleware/httpcache.py	2009-05-24 19:13:06 -03:00
Pablo Hoffman	19f2992b26	applied Patrick patch: test_storedb: add base class for both mysql tests	2009-05-23 18:31:54 -03:00
Daniel Grana	dae0b1973b	aws: missing import	2009-05-22 13:21:46 -03:00
Daniel Grana	4efcf78a4a	aws: take AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from environment just like boto does	2009-05-22 13:14:16 -03:00
Ismael Carnales	3955844115	Removed FieldValueError in favour of ValueError	2009-05-21 15:01:48 +00:00
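
Commit 47970e91bc inverts the meaning of request priority so that a higher request.priority value is scheduled sooner. The sketch below illustrates only that ordering rule, assuming a plain in-memory heap; PriorityRequestQueue is a hypothetical class and not the scheduler code from this period of Scrapy.

```python
import heapq
import itertools


class PriorityRequestQueue:
    """Hypothetical queue in which a higher priority value is dequeued first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker: equal priorities keep FIFO order.
        self._counter = itertools.count()

    def push(self, request, priority=0):
        # heapq is a min-heap, so the priority is negated to pop the largest value first.
        heapq.heappush(self._heap, (-priority, next(self._counter), request))

    def pop(self):
        _, _, request = heapq.heappop(self._heap)
        return request

    def __len__(self):
        return len(self._heap)


q = PriorityRequestQueue()
q.push("http://example.com/low", priority=-10)
q.push("http://example.com/default")
q.push("http://example.com/high", priority=10)
print([q.pop() for _ in range(len(q))])
# ['http://example.com/high', 'http://example.com/default', 'http://example.com/low']
```

The insertion counter matters because without it two requests with equal priority would be compared directly, which fails for objects that do not define ordering.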

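Commit cfafa01109 makes the spider middleware test for __iter__ so that a string returned by a spider callback is not iterated character by character. Under the Python 2 of that era, str did not define __iter__, so a hasattr check excluded strings; in Python 3, str and bytes do define __iter__, so a present-day sketch has to exclude them explicitly. normalize_callback_output is a hypothetical helper, and mapping None to "no results" is an added assumption.

```python
from collections.abc import Iterable


def normalize_callback_output(result):
    """Hypothetical helper: turn a callback result into an iterator of results."""
    if result is None:
        # Assumption: a callback returning None produced nothing.
        return iter(())
    # str and bytes define __iter__ in Python 3, so exclude them explicitly;
    # otherwise "item" would become ['i', 't', 'e', 'm'].
    if isinstance(result, Iterable) and not isinstance(result, (str, bytes)):
        return iter(result)
    # Any single non-iterable object (or a string) is wrapped as one result.
    return iter((result,))


print(list(normalize_callback_output(["item1", "item2"])))  # ['item1', 'item2']
print(list(normalize_callback_output("item")))  # ['item']
```
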

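Commit ead5466998 changes the command-line option from --set SETTING:VALUE to --set SETTING=VALUE. A minimal sketch of parsing one such argument follows, assuming the value is kept as a plain string; parse_set_option is a hypothetical function, not Scrapy's command-line code.

```python
def parse_set_option(option):
    """Hypothetical parser for a single --set SETTING=VALUE argument."""
    # partition() splits on the first '=' only, so the value may itself contain '='.
    name, sep, value = option.partition("=")
    if not sep or not name:
        raise ValueError("expected SETTING=VALUE, got %r" % option)
    return name, value


print(parse_set_option("LOG_LEVEL=DEBUG"))  # ('LOG_LEVEL', 'DEBUG')
```

With this sketch, the old colon form such as LOG_LEVEL:DEBUG is rejected with a ValueError rather than silently misparsed.
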
1188 Commits