scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-28 17:03:13 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	630db4fecf	Simplified file:// download handler, adding support for reading binary files	2010-09-05 05:59:40 -03:00
Pablo Hoffman	14e985b076	Updated Command line tool documentation	2010-09-05 05:29:58 -03:00
Pablo Hoffman	1190f97944	Updated settings documentation	2010-09-05 04:58:14 -03:00
Pablo Hoffman	ebdb733e95	Updated some old messages in Scrapy shell doc	2010-09-05 04:45:43 -03:00
Pablo Hoffman	2f12618890	Post reference to Scrapyd in FAQ	2010-09-05 04:35:27 -03:00
Pablo Hoffman	a66bef7925	Make execution queue poll interval configurable through a new QUEUE_POLL_INTERVAL setting	2010-09-05 02:23:08 -03:00
Pablo Hoffman	b800fdcb4d	SqliteSpiderQueue: fall back to in-memory SQLite if database cannot be opened (typically due to missing write permissions)	2010-09-04 03:49:46 -03:00
Pablo Hoffman	bf34094e5a	Added versionadded:: notice to new documentation topics	2010-09-04 03:30:45 -03:00
Pablo Hoffman	e3921ab016	Don't set allowed_domains attribute in BaseSpider constructor	2010-09-04 03:20:05 -03:00
Daniel Grana	9f4b1e47a4	damn, really fix httpcache docs	2010-09-04 03:26:41 -03:00
Daniel Grana	7ad901640b	fix httpcache docs	2010-09-04 03:23:08 -03:00
Daniel Grana	1abaa79469	Make ignored schemes configurable in HttpCacheMiddleware. closes #224 --HG-- extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75	2010-09-04 02:58:43 -03:00
Pablo Hoffman	5a6284ceb3	Added TODO:	2010-09-04 02:56:50 -03:00
Daniel Grana	58feb15528	httpcache must restore responses using response.url instead of request.url --HG-- extra : rebase_source : 08fa2c3862bb35db2234e0f9bb9cb9ce4a8f4d8d	2010-09-04 02:53:09 -03:00
Pablo Hoffman	7b9fa7fbaa	Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225	2010-09-04 02:23:04 -03:00
Daniel Grana	1b11563383	monkeypatch urlparse if s3 netloc parsing fails (python issue7904). closes #223	2010-09-04 01:02:52 -03:00
Pablo Hoffman	9cfa8edd14	Automatic merge	2010-09-03 17:46:36 -03:00
Daniel Grana	9b68c3c1b1	Add S3 scheme request handler. closes #222	2010-09-03 16:19:47 -03:00
Daniel Grana	30d94b5bf5	Convert request handlers to classes and support NotConfigured. closes #221	2010-09-03 16:18:46 -03:00
Pablo Hoffman	37e9c5d78e	Added new Scrapy service with support for: * multiple projects * uploading scrapy projects as Python eggs * scheduling spiders using a JSON API Documentation is added along with the code. Closes #218. --HG-- rename : debian/scrapy-service.default => debian/scrapyd.default rename : debian/scrapy-service.dirs => debian/scrapyd.dirs rename : debian/scrapy-service.install => debian/scrapyd.install rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides rename : debian/scrapy-service.postinst => debian/scrapyd.postinst rename : debian/scrapy-service.postrm => debian/scrapyd.postrm rename : debian/scrapy-service.upstart => debian/scrapyd.upstart rename : extras/scrapy.tac => extras/scrapyd.tac	2010-09-03 15:54:42 -03:00
Pablo Hoffman	1b766877f1	Added ISpiderManager interface and a test to verify the default SpiderManager conforms to it	2010-09-03 14:29:27 -03:00
Pablo Hoffman	7cfc379230	Execution Queue refactoring by taking out the queue backend to a new Spider Queue API. Also ported SQS Execution Queue to Spider Queue API, and make the scrapy queue command use the Spider Queue directly, with deferreds support. Closes #220.	2010-09-03 14:29:27 -03:00
Pablo Hoffman	37776618a3	Changed format of scrapy.cfg file to contain a [settings] section and a 'default' key inside it, instead of the other way around	2010-09-02 21:22:17 -03:00
Pablo Hoffman	fb69655a9f	Removed unused imports	2010-08-31 21:40:05 -03:00
Pablo Hoffman	758d21b2f9	Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217	2010-08-31 16:03:08 -03:00
Pablo Hoffman	59f09c50e4	Yet another scrapy.cmdline code refactoring by removing --settings and --version options, adding a version command and adding a UsageError exception for signaling usage errors. Updated all commands accordingly	2010-08-28 18:06:51 -03:00
Pablo Hoffman	7394aa926d	Fixed typo	2010-08-28 14:47:19 -03:00
Pablo Hoffman	616ecc5a1a	Support passing spider arguments in crawl command with -a option. Closes #216	2010-08-28 14:43:28 -03:00
Pablo Hoffman	d4941a0619	Minor change to --pidfile argument	2010-08-28 14:07:03 -03:00
Pablo Hoffman	0ae4c3aa82	call Spider.closed() method (if it exists) on SpiderManager.close_spider()	2010-08-27 18:36:00 -03:00
Pablo Hoffman	35fd1a2660	Fixed typo	2010-08-27 17:21:30 -03:00
Pablo Hoffman	3234d76b8d	Restored SpiderManager.close_spider() method but using signals instead of calling it from the engine	2010-08-27 16:19:51 -03:00
Pablo Hoffman	783887457a	Moved tests to reflect new module location --HG-- rename : scrapy/tests/test_core_queue.py => scrapy/tests/test_queue.py	2010-08-27 15:50:12 -03:00
Pablo Hoffman	88c99e0b83	Check that arguments and keyword arguments are not passed simultaneously in jsonrpc_client_call()	2010-08-27 13:45:52 -03:00
Pablo Hoffman	ffad8e08e7	Support passing all keyword arguments to ExecutionQueue append_spider_name and append_url	2010-08-27 13:45:14 -03:00
Pablo Hoffman	4eb0383dd2	Added Scrapy to scrapy --version	2010-08-27 11:15:37 -03:00
Pablo Hoffman	e7b3247a18	Updated some missing references to scrapy-ws script	2010-08-27 01:05:59 -03:00
Pablo Hoffman	aab38be498	Print Scrapy on first log line	2010-08-27 00:53:26 -03:00
Pablo Hoffman	e14cc2c12a	Moved scrapy-ws script to extras/ and fixed broken methods due to changes in web service API --HG-- rename : bin/scrapy-ws.py => extras/scrapy-ws.py	2010-08-27 00:33:08 -03:00
Pablo Hoffman	648f700ed1	Fixed log formatter tests --HG-- rename : scrapy/tests/test_contrib_logformatter.py => scrapy/tests/test_logformatter.py	2010-08-26 23:23:58 -03:00
Pablo Hoffman	ad18d4a70e	Added pluggable log formatter	2010-08-26 23:19:35 -03:00
Pablo Hoffman	d1e260a8d4	Simplified engine by removing the configure() and kill() methods. Also simplified the Spider Manager by removing the close_spider() method	2010-08-26 22:20:04 -03:00
Pablo Hoffman	f6c11af4c2	Moved module: scrapy.core.queue to scrapy.queue --HG-- rename : scrapy/core/queue.py => scrapy/queue.py	2010-08-26 21:15:32 -03:00
Pablo Hoffman	747f090f94	Improved Twisted version detection (wasn't working for Twisted 10.0.0)	2010-08-26 20:32:26 -03:00
Pablo Hoffman	e95f7f63f9	Fixed typo	2010-08-25 21:06:10 -03:00
Pablo Hoffman	a82a4be3ab	Added docstring to test_engine.py	2010-08-25 21:04:47 -03:00
Pablo Hoffman	59d18cf99e	Fixed crawler reference	2010-08-25 19:59:51 -03:00
Pablo Hoffman	40b590cad3	Moved scrapy.cfg auto-discovery to scrapy.conf.EnvironmentSettings class	2010-08-25 19:59:30 -03:00
Pablo Hoffman	ef7a097272	Replaced old manager references with crawler	2010-08-25 19:31:04 -03:00
Pablo Hoffman	8fc78c4d0a	Refactoring of Crawler, Commands, Execution Queue and Spider Manager: Commands changes: * removed (somewhat hacky) --init argument from settings command * added set_crawler method to Commands, and a ``crawler`` property that returns a configured crawler. This way, commands that don't require a crawler (such as startproject) won't need to configure one. Execution Queue changes: * changed SERVICE_QUEUE_FILE setting to SQLITE_DB * removed SERVICE_QUEUE setting * added QUEUE_CLASS setting for defining the class to use for the execution queue * added SERVER_QUEUE_CLASS setting for defining the class to use for the execution queue in server mode (runserver command) Spider Manager changes: * simplified SpiderManager API by removing the load() method * added from_settings classmethod to SpiderManager * added spider_modules constructor argument to SpiderManager Crawler changes: * added install() method to Crawler (to install it in scrapy.project) and uninstall() to remove it * use CrawlerProcess.install() in scrapy.cmdline * use crawler.install() and crawler.uninstall() in tests that need a crawler in scrapy.project * make telnet console and webservice play nicer with twisted by stopping listening when the engine goes down * refactored Scrapy engine tests - it no longer uses the crawler singleton. Closes #215.	2010-08-25 19:24:36 -03:00


2616 Commits