scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 10:24:14 +00:00

Author	SHA1	Message	Date
Rolando Espinoza La fuente	35a7059636	cleanup and refactor of parse & fetch commands * removed scrapy.utils.fetch * each command schedule requests and start scrapy engine * fetch command instance BaseSpider if given url does not match any spider or match more than one * parse command schedule url if one spider matches * parse and fetch doesn't support multiple urls as parameter * force spider behavior --spider moved from BaseCommand to only commands: fetch, parse, crawl	2010-04-01 17:16:38 -03:00
Rolando Espinoza La fuente	dd477914db	spidermanager refactoring * Implements find/create method in Spider Manager API, removed fromdomain and fromurl This method is now in charge of spider resolution, it must return spider object from its argument or raise KeyError if no spider is found. This method obsoletes from_domain and from_url methods. The default implementation of resolve only searches against spider.name, it won't use spider.allowed_domains like the old fromdomain. This is the reason of why you must supply a spider if you want to crawl an url. Find methods returns only available spider names. Not spider instances. If no spider found returns empty list. Affected modules: * command.models (force_domain) * removed spiders.force_domain * each command pass spider to crawl_* commands * command.commands.* * crawl * set spider from opts.spider if arg is url * group urls by spider to instance spider just once * genspider * use spiders.create() to check spider id * parse * log error if more than one spider found * core.manager * on crawl_* log message if multiple spiders found for url or request * shell * prints "Multiple found" if more than one spider found for url or request * populate_vars(): added spider keyword parameter * contrib.spidermanager: * removed fromdomain() & fromurl() * new create(spider_id) -> Spider. Raises KeyError if spider not found * new find_by_request(request) -> list(spiders)	2010-04-01 17:16:38 -03:00
Rolando Espinoza La fuente	8db67b17a3	scrapy manager refactor * ExecutionManager * deprecated runonce(args) changed start() to start(keep_alive=Bool) * changed crawl(args) to crawl(requests, spider=None) if no spider given, tries to resolve spider for each request * added crawl_url(url, spider=None) * added crawl_request(request, spider=None) * added crawl_domain(domain) * added crawl_spider(spider) * updated commands: crawl, runspider, start * updated webconsole * updated crawler * updated tests.test_engine * updated utils.fetch	2010-04-01 17:16:38 -03:00
Pablo Hoffman	32f9c5fe68	removed old untested (and probably broken) code	2010-04-01 04:05:53 -03:00
Pablo Hoffman	4dc886e319	Improved comment	2010-03-31 18:26:35 -03:00
Pablo Hoffman	83d5eff0b7	More refactoring to encoding handling in TextResponse and subclasses	2010-03-31 18:21:41 -03:00
Pablo Hoffman	de896fa62d	Refactored implementation of Request.replace() and Response.replace()	2010-03-31 16:29:53 -03:00
Pablo Hoffman	2ed8a5bfb5	Automated merge with http://hg.scrapy.org/scrapy-0.8	2010-03-27 13:25:06 -03:00
Pablo Hoffman	2f75839e7a	Ignore noisy Twisted deprecation warnings	2010-03-27 13:23:13 -03:00
Pablo Hoffman	2299deda66	updated wrong link in doc	2010-03-26 14:02:33 -03:00
Pablo Hoffman	7cf2f87e27	Automated merge with http://hg.scrapy.org/scrapy-0.8	2010-03-26 08:29:34 -03:00
Pablo Hoffman	f19c939925	fixed doc typo	2010-03-26 08:28:32 -03:00
Daniel Grana	996a1b3574	fix handling of relative base urls in get_base_url util --HG-- extra : rebase_source : eb552219e6bf40bc0d2e35968c367105233b6ecc	2010-03-25 15:50:34 -03:00
Pablo Hoffman	1330697c3d	Some improvements to Response encoding support: * added encoding aliases, configurable through a new ENCODING_ALIASES setting * Response.encoding now returns the real encoding detected for the body * simplified TextResponse API by removing body_encoding() and headers_encoding() methods * Response.encoding now tries to infer the encoding from the body always (it was done before only on HtmlResponse and TextResponse) * removed scrapy.utils.encoding.add_encoding_alias() function * updated implementation of scrapy.utils.response function to reflect these API changes * updated documentation to reflect API changes	2010-03-25 15:47:10 -03:00
Daniel Grana	173e94386b	Support relative url used in base tag. closes #148 --HG-- extra : rebase_source : 1bff87c127a7e9d8d12c772b3068feb11eb5d97f	2010-03-25 12:38:37 -03:00
Pablo Hoffman	9ddcd1095d	sort setting alphabetically	2010-03-25 11:45:06 -03:00
Pablo Hoffman	cb49567ca6	Removed wrong line added in previous commit	2010-03-24 12:15:18 -03:00
Pablo Hoffman	45411926b5	Improved encoding support by explicitly passing encoding to all str_to_unicode() and unicode_to_str() calls	2010-03-24 12:14:07 -03:00
Pablo Hoffman	4fa833c849	Added LOG_ENCODING setting	2010-03-24 12:13:38 -03:00
Pablo Hoffman	87e68e7438	Made MailSender non IO-blocking, and improved MailSender documentation	2010-03-22 13:37:37 -03:00
Pablo Hoffman	1dfc79b5d0	Automated merge with http://hg.scrapy.org/scrapy-0.8	2010-03-20 20:48:11 -03:00
Pablo Hoffman	99a876754c	Improved "What else?" section of "Scrapy at a glance" overview	2010-03-20 20:24:18 -03:00
Pablo Hoffman	264cd2e035	Automated merge with http://hg.scrapy.org/scrapy-0.8	2010-03-19 10:32:42 -03:00
Pablo Hoffman	234fd709ad	fixed doc typo (thanks Victor)	2010-03-19 10:32:17 -03:00
Daniel Grana	184cf6684f	Remove HttpException references from docs. Since 0.7, scrapy returns non-200 as Response objects and does not raise HttpException anymore	2010-03-18 10:05:33 -03:00
Pablo Hoffman	403a21ec74	removed obsolete scrapy.crawler module	2010-03-12 17:28:33 -02:00
Daniel Grana	17091902f3	Explicitly say where to save item class in "Defining our item" section of tutorial	2010-03-12 14:12:49 -02:00
Pablo Hoffman	54ae2c36d0	better implementation of open_in_browser() tests	2010-03-12 10:19:50 -02:00
Pablo Hoffman	38a296aa2c	Added tests to open_in_browser() function	2010-03-12 09:52:39 -02:00
Pablo Hoffman	2ab94d75e2	Automated merge with http://hg.scrapy.org/scrapy-0.8	2010-03-12 09:32:35 -02:00
Pablo Hoffman	c5cd8b9d3d	Fixed bug in open_in_browser() function with Python 2.5 (closes #145).	2010-03-12 09:31:05 -02:00
Pablo Hoffman	39e4df0cff	removed unmaintained (and untested) contrib_exp ShoveItemPipeline	2010-03-10 00:10:36 -02:00
Pablo Hoffman	a505a9d490	minor code refactoring on scrapy.command.cmdline module	2010-03-04 11:09:16 -02:00
Pablo Hoffman	4c1ec0c97e	replaced hacky command_executed dict by standard signal	2010-03-04 10:58:18 -02:00
Pablo Hoffman	861f9691c7	removed partly-obsolete module scrapy.contrib.groupsettings	2010-03-04 10:40:41 -02:00
Pablo Hoffman	d12cd22d5e	switched default scheduler order to DFO, which consumes less memory by default	2010-03-04 10:15:58 -02:00
Daniel Grana	700be3202b	Automated merge with ssh://hg.scrapy.org/scrapy-0.8	2010-02-24 15:44:41 -02:00
Daniel Grana	2322322ee6	Add missing priority and errback arguments to Request.replace method signature	2010-02-24 15:43:09 -02:00
Pablo Hoffman	180c091fb2	Fixed encoding issue (reported in #135) when the encoding declared in the HTTP header is unknown. This is the patch proposed by Rolando, with an update to the Request/Response documentation.	2010-02-24 14:01:29 -02:00
Pablo Hoffman	bbef0fe870	Automated merge with http://hg.scrapy.org/users/rolando/scrapy/	2010-02-20 11:12:37 -02:00
Rolando Espinoza La fuente	7b1ad321e3	examples/experimental: added imdb top movies spider	2010-02-19 21:31:17 -04:00
Pablo Hoffman	cb99edd153	simplified and improved AUTHORS file	2010-02-19 23:16:55 -02:00
Pablo Hoffman	a3d22c7240	Automated merge with http://hg.scrapy.org/scrapy-0.8/	2010-02-19 23:11:24 -02:00
Pablo Hoffman	60961e5499	minor documentation fix (refs #135)	2010-02-19 23:09:48 -02:00
Pablo Hoffman	c1f8198639	Added RANDOMIZE_DOWNLOAD_DELAY setting	2010-02-19 21:53:18 -02:00
Rolando Espinoza La fuente	4a053a762f	examples/experimental: added gooledir crawler	2010-02-19 18:28:16 -04:00
Rolando Espinoza La fuente	a6a3f085a7	docs: added crawlspider v2 outline documentation Sign-Off: Rolando Espinoza La fuente	2010-02-19 18:22:38 -04:00
Rolando Espinoza La fuente	17d1543929	contrib_exp: added crawlspider v2 package + tests Sign-Off: Rolando Espinoza La fuente	2010-02-19 18:19:01 -04:00
Rolando Espinoza La fuente	7ddd4441e3	utils.python: added equal_attributes() to compare two objects arbitrary attributes Sign-Off: Rolando Espinoza La fuente	2010-02-19 17:57:48 -04:00
Rolando Espinoza La fuente	7235040936	merged upstream	2010-02-19 17:41:45 -04:00

1975 Commits