scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 20:23:56 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	91b9d89ffd	moved scrapy.utils.sqlite to scrapyd.sqlite --HG-- rename : scrapy/utils/sqlite.py => scrapyd/sqlite.py rename : scrapy/tests/test_utils_sqlite.py => scrapyd/tests/test_sqlite.py	2011-08-27 01:20:57 -03:00
Pablo Hoffman	075a2d62d3	scrapyd: added support for passing custom settings to schedule.json	2011-08-27 01:02:14 -03:00
Pablo Hoffman	75e2c3eb33	moved spider queues to scrapyd --HG-- rename : scrapy/spiderqueue.py => scrapyd/spiderqueue.py rename : scrapy/tests/test_spiderqueue.py => scrapyd/tests/test_spiderqueue.py	2011-07-19 19:39:27 -03:00
Pablo Hoffman	59acb129e5	scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set	2011-06-15 19:35:03 -03:00
Pablo Hoffman	80b557849a	fixed test broken in previous commit	2011-06-12 02:55:21 -03:00
Pablo Hoffman	0d5399d0bf	fixed scrapyd tests on win32. closes #295	2011-06-12 02:46:41 -03:00
Pablo Hoffman	07df0edf74	scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324	2011-06-08 14:17:04 -03:00
Jochen Maes	47a7f154ab	Add listjobs.json to Scrapyd API. You can use listjobs.json with project=<projectname> to get a list of projects that are running currently. It returns a list of jobs with spidername and job-id. Signed-off-by: Jochen Maes <jochen.maes@sejo.be> --- scrapyd/webservice.py | 9 +++++++++ scrapyd/website.py | 1 + 2 files changed, 10 insertions(+), 0 deletions(-)	2011-03-09 14:22:10 -02:00
Pablo Hoffman	65fc2fbd1f	Set CONCURRENT_SPIDERS=1 in Scrapyd to force one spider per process	2011-02-04 13:30:01 -02:00
Pablo Hoffman	048044c1f8	A couple of changes to fix #303: * improved detection of inside-project environments * make list command faster (by only instantiating the spider manager) * print a warning when extensions (middlewares, etc) are disabled with a message on NotConfigured exception * assert that scrapy configuration hasn't been loaded in scrapyd.runner * simplified IgnoreRequest exception, to avoid loading settings when importing scrapy.exceptions * added test to make sure certain modules don't cause scrapy.conf module to be loaded, to ensure the scrapyd runner bootstrapping performs properly	2011-01-05 15:59:43 -02:00
Pablo Hoffman	3d8b368fc6	scrapyd: use runner from config (if not specified) on get_spider_list()	2010-12-28 11:16:58 -02:00
Pablo Hoffman	fa644f7a5e	Some simplifications to Scrapyd architecture and internals: - launcher no longer knows about egg storage - removed get_spider_list_from_eggfile() file and replaced by simpler get_spider_list() which doesn't receive an egg file as argument - changed "egg runner" name to just "runner" to reflect the fact that it doesn't necessarily run eggs (though it does in the default case) --HG-- rename : scrapyd/eggrunner.py => scrapyd/runner.py	2010-12-27 16:22:32 -02:00
Pablo Hoffman	9cd649b3a0	scrapyd: populate SCRAPY_SPIDER and SCRAPY_JOB environment variables	2010-12-26 19:32:56 -02:00
Pablo Hoffman	1c8d74eb5b	scrapyd: populate SCRAPY_SLOT environment variable with the scrapyd slot number	2010-12-24 12:47:59 -02:00
Pablo Hoffman	b19ff21acd	scrapyd: added support for deferred spider queues	2010-12-10 15:55:40 -02:00
Pablo Hoffman	831dc818d6	scrapyd: added more information to webui homepage	2010-11-30 18:43:59 -02:00
Pablo Hoffman	a3d30c35fe	scrapyd: log url where web console can be accessed	2010-11-30 17:58:34 -02:00
Pablo Hoffman	823fd9822c	scrapyd: fixed bug discovering the current project scrapy.cfg file	2010-11-30 16:19:17 -02:00
Pablo Hoffman	7b84591ea9	added command for starting a scrapyd server for the current project	2010-11-30 15:52:15 -02:00
Pablo Hoffman	5a46ce47ee	scrapyd: add extra_sources constructor argument, and also read scrapyd configuration from current project's scrapy.cfg file	2010-11-30 15:47:05 -02:00
Pablo Hoffman	c02d6db6a3	scrapyd: force application to receive config as argument	2010-11-30 15:46:24 -02:00
Pablo Hoffman	85890a5092	scrapyd: log process logfile when process starts/finishes	2010-11-30 15:45:42 -02:00
Pablo Hoffman	5c4f562ec4	scrapyd: changed keys used in poller message to _project, _spider, _job, and added link to log file in web ui	2010-11-30 13:03:20 -02:00
Pablo Hoffman	df54ed0041	Some Scrapyd enhancements: * added minimal web ui * return unique id per job (spider scheduled) * store one log per spider run (job) and rotate them, keeping the last N logs (where N is configurable through settings)	2010-11-30 02:26:31 -02:00
Pablo Hoffman	46e5d694e6	Scrapyd: return project and version in addversion.json	2010-11-29 17:22:28 -02:00
Pablo Hoffman	bbffa59497	Some changes to Scrapyd: * Always start one process per spider * Added max_proc_per_cpu option (defaults to 4) * Return the number of spiders (instead of a list of them) in schedule.json	2010-11-29 17:19:05 -02:00
Pablo Hoffman	5bdffadbe3	Simplified get_spider_list_from_eggfile() function now that it doesn't need to chdir to a custom directory (Scrapy now works when it's unable to create the SQLite database)	2010-11-05 11:48:12 -02:00
Pablo Hoffman	31bbcc9476	Raise error when egg is corrupt in activate_egg(). Use a more descriptive name for temporary dirs in get_spider_list_from_eggfile(). Make scrapyd webservice pass egg_runner to get_spider_list_from_eggfile()	2010-11-05 11:24:33 -02:00
Pablo Hoffman	de4909faca	get_spider_list_from_eggfile(): more improvements to error messages, and support passing eggrunner module as argument	2010-11-04 18:56:11 -02:00
Pablo Hoffman	7ba972d8cf	get_spider_list_from_eggfile(): fail if unable to extract spider list	2010-11-04 16:27:47 -02:00
Pablo Hoffman	a8be54a8ea	scrapyd: make Environment tests independent of the current OS environment --HG-- rename : scrapyd/tests/test_envion.py => scrapyd/tests/test_environ.py	2010-10-27 06:49:15 -02:00
Pablo Hoffman	a3a108dc71	fixed some compatibility issues with python 2.5 in scrapyd	2010-10-26 17:21:43 -02:00
Pablo Hoffman	a4639ffb06	Removed hacky SCRAPY_SETTINGS_DISABLED environment variable	2010-09-22 16:08:18 -03:00
Pablo Hoffman	f3769651af	Refactored Scrapyd code to fix a couple of bugs that occurred when running projects without eggs	2010-09-22 01:04:15 -03:00
Pablo Hoffman	4c61df7abb	get_spider_list_from_eggfile(): fixed bug when SCRAPY_SETTINGS_DISABLED is set	2010-09-20 08:47:55 -03:00
Pablo Hoffman	400c4134af	Make scrapyd.eggutils compatible with Python 2.5 and added tests for get_spider_list_from_eggfile() function (closes #242 )	2010-09-19 21:08:27 -03:00
Pablo Hoffman	4cecbcdc5b	Fixed bug in Scrapyd launcher when running projects without eggs. Refs #238	2010-09-15 21:03:43 -03:00
Pablo Hoffman	c559b06a85	Removed unused import	2010-09-14 01:53:05 -03:00
Pablo Hoffman	833baa6041	Support running projects without eggs in Scrapyd. Closes #238	2010-09-14 01:44:25 -03:00
Pablo Hoffman	b76cd42690	Added tests for Scrapyd components. Closes #237	2010-09-14 01:44:10 -03:00
Pablo Hoffman	9158e9d682	Some changes to Scrapyd to support multiple configuration files, to make it easier to deploy Scrapyd applications. Also documented 'egg_runner' and 'application' options --HG-- rename : debian/scrapyd.cfg => debian/000-default rename : scrapyd/default_scrapyd.cfg => scrapyd/default_scrapyd.conf	2010-09-07 09:17:25 -03:00
Pablo Hoffman	37e9c5d78e	Added new Scrapy service with support for: * multiple projects * uploading scrapy projects as Python eggs * scheduling spiders using a JSON API Documentation is added along with the code. Closes #218. --HG-- rename : debian/scrapy-service.default => debian/scrapyd.default rename : debian/scrapy-service.dirs => debian/scrapyd.dirs rename : debian/scrapy-service.install => debian/scrapyd.install rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides rename : debian/scrapy-service.postinst => debian/scrapyd.postinst rename : debian/scrapy-service.postrm => debian/scrapyd.postrm rename : debian/scrapy-service.upstart => debian/scrapyd.upstart rename : extras/scrapy.tac => extras/scrapyd.tac	2010-09-03 15:54:42 -03:00

42 Commits
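
Several of the commits above touch the Scrapyd JSON API (schedule.json with custom settings, listjobs.json with project=<projectname>). As a rough illustration only, the sketch below builds the requests a client might send to those two endpoints, without performing any network I/O. The base URL, project name, spider name and helper function names are hypothetical, and the `setting=NAME=VALUE` form field is an assumption based on Scrapyd's documented API rather than on the code in these commits.

```python
from urllib.parse import urlencode, urljoin

# Assumption: a Scrapyd instance listening on its usual default port.
SCRAPYD_URL = "http://localhost:6800/"


def build_schedule_request(project, spider, settings=None):
    """Return (url, form_body) for a POST to schedule.json.

    Each custom setting is sent as a repeated "setting" field in
    NAME=VALUE form (assumed from Scrapyd's documented API).
    """
    fields = [("project", project), ("spider", spider)]
    for name, value in sorted((settings or {}).items()):
        fields.append(("setting", f"{name}={value}"))
    return urljoin(SCRAPYD_URL, "schedule.json"), urlencode(fields)


def build_listjobs_url(project):
    """Return the GET URL for listjobs.json, filtered by project."""
    query = urlencode({"project": project})
    return urljoin(SCRAPYD_URL, "listjobs.json") + "?" + query


if __name__ == "__main__":
    # Hypothetical project and spider names, for illustration only.
    url, body = build_schedule_request(
        "myproject", "somespider", {"DOWNLOAD_DELAY": 2}
    )
    print(url)
    print(body)
    print(build_listjobs_url("myproject"))
```

The helpers only build the URL and form body, so they can be passed to any HTTP client and checked without a running Scrapyd server.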