scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-03-04 03:18:05 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	fd0e490157	added StatsMailer extension	2009-06-12 15:38:21 -03:00
Pablo Hoffman	7c2476bb25	fixed a couple of bugs caused by adding priority to Requests (thanks Artem for reporting)	2009-06-12 08:31:30 -03:00
Pablo Hoffman	4a1a01354b	Added 'priority' attribute to Requests and removed old 'priority' argument passed through engine, scheduler and scheduler middleware calls	2009-06-11 22:25:47 -03:00
Pablo Hoffman	962dbeba88	fixed typo in docstring	2009-06-11 08:33:01 -03:00
Pablo Hoffman	e55158ebdd	Merged olveyra's patch	2009-06-10 18:00:32 -03:00
Pablo Hoffman	635ac1ca64	Simplified domain prioritizers, so that they don't receive domains in the constructor (domain prioritizers will be refactored later anyway) and simplified Scrapy Manager code thanks to this. Added make_request_from_url method to BaseSpider, splitting functionality to create requests from URLs which was previously done all in start_requests.	2009-06-10 14:21:36 -03:00
Pablo Hoffman	a74b0b1764	additional simplification of OffsiteMiddleware	2009-06-09 13:09:35 -03:00
Pablo Hoffman	eca05c9e12	OffsiteMiddleware: removed logging and simplified implementation	2009-06-09 12:37:15 -03:00
molveyra	6524def4b8	don't check guid in RobustScrapedItem.validate. Instead, raise NotImplemented.	2009-06-04 10:44:40 -03:00
Daniel Grana	87fbc9c58c	spidermw: add domain name to warning about missing callbacks in requests	2009-05-28 21:47:41 -03:00
Daniel Grana	727e67af5e	spidermw: ignore and warn about requests without callback returned by spiders	2009-05-28 21:41:02 -03:00
Daniel Grana	cfafa01109	spidermw: check for __iter__ instead of trying iter(), which may let a string pass as an iterable	2009-05-28 21:10:30 -03:00
Pablo Hoffman	0f690b03dc	added deprecation warning to ErrorPages downloader middleware	2009-05-28 13:57:25 -03:00
Pablo Hoffman	1aac694343	updated settings doc	2009-05-28 13:52:56 -03:00
Pablo Hoffman	04e7f8f5f6	merged with Daniel's HttpException-removal branch	2009-05-28 13:45:26 -03:00
Daniel Grana	abda5edf09	decompressionmw: don't try to decompress empty responses	2009-05-28 09:31:43 -03:00
Daniel Grana	85dbdf5789	finally remove HttpException in this changeset: * remove HttpException from engine and core exceptions * replace dwmw ErrorPages with spidermw HttpError * bugfix image pipeline media_to_download method when stat_key returns None	2009-05-28 09:30:31 -03:00
Daniel Grana	0e5bea67fd	images: adapt images pipeline to recent changes on HttpException topic	2009-05-28 00:27:42 -03:00
Daniel Grana	7eaa3ed24d	stop raising HttpException at download handlers and adapt download middlewares	2009-05-27 16:51:36 -03:00
Daniel Grana	c8827552b6	fix typo in the default value of the WEBCONSOLE_ENABLED setting documentation. thanks dzen	2009-05-26 15:48:34 -03:00
Pablo Hoffman	89950af834	cluster: fixed KeyError when crawler process failed to start	2009-05-25 23:45:10 -03:00
Pablo Hoffman	6d1ffa7137	renamed CrawlDebug downloader middleware to DebugMiddleware	2009-05-25 20:14:50 -03:00
Pablo Hoffman	b1dad251ae	Deprecated Common Downloader Middleware and added DefaultHeaders Downloader Middleware	2009-05-25 14:41:06 -03:00
Pablo Hoffman	90d408b04f	Some changes to HTTP cache middleware: * documented * moved from scrapy.contrib.downloadermiddleware.cache.CacheMiddleware to scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware * settings prefix changed from CACHE2_ to HTTPCACHE_ --HG-- rename : scrapy/contrib/downloadermiddleware/cache.py => scrapy/contrib/downloadermiddleware/httpcache.py	2009-05-24 19:13:06 -03:00
Pablo Hoffman	19f2992b26	applied Patrick's patch: test_storedb: add base class for both mysql tests	2009-05-23 18:31:54 -03:00
Daniel Grana	dae0b1973b	aws: missing import	2009-05-22 13:21:46 -03:00
Daniel Grana	4efcf78a4a	aws: take AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from environment just like boto does	2009-05-22 13:14:16 -03:00
Ismael Carnales	3955844115	Removed FieldValueError in favour of ValueError	2009-05-21 15:01:48 +00:00
Ismael Carnales	c03e246002	Added DateTimeField	2009-05-21 14:57:52 +00:00
Ismael Carnales	d5f0cae776	New implementation of Field and MultiValuedField	2009-05-21 14:55:14 +00:00
Ismael Carnales	0cc289ac84	New and simpler implementation of BooleanField	2009-05-21 14:51:50 +00:00
Ismael Carnales	55d922a4b0	Fixed BooleanField default value	2009-05-21 14:50:35 +00:00
Ismael Carnales	1ffe64dab3	Added test for newitem fields	2009-05-21 14:48:43 +00:00
Pablo Hoffman	48bfd3fe4b	renamed old setting	2009-05-20 02:15:31 -03:00
Pablo Hoffman	befd28eef4	docs/tutorial: added reminder about adding pipeline to ITEM_PIPELINES settings (thanks jamie)	2009-05-20 00:57:44 -03:00
Pablo Hoffman	04610a25dc	fixed bug in tutorial regarding csv writer pipeline, and other minor corrections	2009-05-19 03:07:08 -03:00
Daniel Grana	abfc52cd17	docs: modify install document to mercurial based installation instructions	2009-05-19 01:50:44 -03:00
Pablo Hoffman	13bb9934f9	moved htmlparser and lxml based link extractors to scrapy.contrib.linkextractors, with the rest of the link extractors	2009-05-18 23:06:27 -03:00
Pablo Hoffman	c161c29e08	simplified some scrapy.log implementation code	2009-05-18 21:32:17 -03:00
Pablo Hoffman	a8a3de17ef	removed unused line	2009-05-18 21:11:03 -03:00
Pablo Hoffman	b87734341d	fixed docstring	2009-05-18 20:59:26 -03:00
Pablo Hoffman	59e504a003	removed code from scrapy.link to avoid cyclic imports from scrapy.contrib.linkextractors.sgml	2009-05-18 19:27:51 -03:00
Pablo Hoffman	86498abdf1	Sorted out Link Extractors organization by moving all of them to scrapy.contrib.linkextractors. The most relevant being: scrapy.link.extractors.RegexLinkExtractor which was moved to: scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor. The old location still works but throws a deprecation warning. It will be removed before the 0.7 release. Documentation and tests were also updated. Also, in this changeset, a new regex-based link extractor was added to scrapy.contrib.linkextractors.regex. --HG-- rename : scrapy/tests/sample_data/link_extractor/regex_linkextractor.html => scrapy/tests/sample_data/link_extractor/sgml_linkextractor.html rename : scrapy/tests/test_link.py => scrapy/tests/test_contrib_linkextractors.py	2009-05-18 19:19:37 -03:00
pablo	7b34e08392	sorted out running of unittests: 1. removed scrapy.tests.run module which didn't work well because of a problem with Twisted trial 2. added runtests.bat script for running tests in windows 3. added additional lookup path for trial in unix systems	2009-05-16 20:11:23 -03:00
Pablo Hoffman	eb649d661d	fixed bug with unittests data in win32	2009-05-15 19:19:05 -03:00
Pablo Hoffman	ba48a24bb7	sorted out some tests sample data paths and fixed bug with test in windows	2009-05-15 15:03:42 -03:00
Pablo Hoffman	85cd7ea140	fixed encoding bug in xmliter (thanks Atamert!), added unittests and updated utils.iterator unittest names for consistency	2009-05-14 20:21:02 -03:00
Daniel Grana	7166f9f6f9	url2guid: allow returning None and single values --HG-- extra : rebase_source : 1a24237cb6d90d30fe8f086dbb210858f1627620	2009-05-14 14:10:09 -03:00
Pablo Hoffman	766c8a4ea8	fixed some doc typos reported by phaithful	2009-05-14 08:32:48 -03:00
Pablo Hoffman	314a1c2bc2	improved configuration of middlewares using dicts and orders (closes #85)	2009-05-11 01:40:40 -03:00


2616 Commits