Pablo Hoffman
531fa95f98
scrapyd: removed redundant .scrapy component from paths when using scrapyd in 'scrapy server' mode
2012-01-03 23:13:56 -02:00
Pablo Hoffman
dbda33efa6
scrapyd: added support for storing items by default
...
Items are stored the same way as logs, in jsonlines format.
Also renamed logs_to_keep setting to jobs_to_keep.
2012-01-03 23:08:54 -02:00
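The commit above says items are stored in jsonlines format, that is, one JSON object per line. A minimal sketch of that layout follows; the helper names `write_items_jsonlines` and `read_items_jsonlines` are hypothetical and are not taken from the scrapyd code base.

```python
import io
import json


def write_items_jsonlines(items, stream):
    """Write each item as one JSON object per line (hypothetical helper)."""
    for item in items:
        # sort_keys only makes the output deterministic for this illustration
        stream.write(json.dumps(item, sort_keys=True) + "\n")


def read_items_jsonlines(stream):
    """Parse a JSON Lines stream back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


if __name__ == "__main__":
    buf = io.StringIO()
    write_items_jsonlines([{"title": "first"}, {"title": "second"}], buf)
    buf.seek(0)
    print(read_items_jsonlines(buf))
```

Because every line is an independent JSON document, a file in this layout can be appended to while a job runs and read back line by line without loading it whole.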
Pablo Hoffman
0693694bcf
scrapyd: fixed documentation link
2012-01-03 23:02:25 -02:00
Pablo Hoffman
485bc180df
scrapyd: improved web interface to also show pending and finished jobs
2012-01-03 23:02:25 -02:00
Pablo Hoffman
f07e968a93
scrapyd: added new cancel.json api to cancel pending/running jobs
2012-01-03 23:02:19 -02:00
Pablo Hoffman
9064188035
removed unused import
2011-12-28 15:21:10 -02:00
Pablo Hoffman
150f82e600
some changes to scrapyd listjobs.json api:
...
* the api is now a GET instead of POST (for consistency)
* the api also returns pending and finished jobs, in addition to running
ones
* only the last 100 finished jobs are kept (can be changed through the
finished_to_keep setting)
2011-12-28 15:17:52 -02:00
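The commit above describes a job listing that covers pending, running and finished jobs while keeping only the most recent finished ones. One way such bookkeeping could look is sketched below. The `JobRegistry` class and its method names are hypothetical, and only the default of 100 and the `finished_to_keep` name come from the commit message; this is not scrapyd's actual implementation.

```python
from collections import deque


class JobRegistry:
    """Hypothetical in-memory registry of pending, running and finished jobs."""

    def __init__(self, finished_to_keep=100):
        self.pending = []
        self.running = []
        # A bounded deque silently drops the oldest entry once it is full.
        self.finished = deque(maxlen=finished_to_keep)

    def schedule(self, job_id):
        self.pending.append(job_id)

    def start(self, job_id):
        self.pending.remove(job_id)
        self.running.append(job_id)

    def finish(self, job_id):
        self.running.remove(job_id)
        self.finished.append(job_id)

    def listjobs(self):
        """Return all three job states, in the spirit of listjobs.json."""
        return {
            "pending": list(self.pending),
            "running": list(self.running),
            "finished": list(self.finished),
        }
```

Using `deque(maxlen=...)` keeps the trimming implicit, so no separate cleanup step is needed when a job finishes.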
Pablo Hoffman
1dfbe5d7a8
scrapyd.webservice: relocate ListJobs resource for better consistency
2011-12-28 14:36:24 -02:00
Simon Ratner
7232c31f78
Delete old logs based on file mtime.
2011-11-11 11:53:00 -08:00
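The commit above deletes old logs based on file modification time. A minimal sketch of that idea, assuming a flat directory of log files and a fixed number of files to keep, is shown here. The function name `prune_old_logs` and its parameters are hypothetical and do not reflect the actual scrapyd code.

```python
import os


def prune_old_logs(log_dir, keep):
    """Delete all but the `keep` most recently modified files in log_dir.

    Returns the list of removed paths (hypothetical helper).
    """
    paths = [os.path.join(log_dir, name) for name in os.listdir(log_dir)]
    files = [p for p in paths if os.path.isfile(p)]
    # Newest first, so everything after the first `keep` entries is old.
    files.sort(key=os.path.getmtime, reverse=True)
    removed = files[keep:]
    for path in removed:
        os.remove(path)
    return removed
```

Sorting by mtime rather than by file name means the result does not depend on how log files are named.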
Pablo Hoffman
bff3d31469
scrapyd: updated schedule.json response format
2011-09-04 09:29:24 -03:00
Pablo Hoffman
a1dbc62b45
removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead)
2011-09-02 18:27:39 -03:00
Pablo Hoffman
76af0cdd44
updated documentation and code to use -s instead of --set option
2011-09-01 14:35:37 -03:00
Pablo Hoffman
91b9d89ffd
moved scrapy.utils.sqlite to scrapyd.sqlite
...
--HG--
rename : scrapy/utils/sqlite.py => scrapyd/sqlite.py
rename : scrapy/tests/test_utils_sqlite.py => scrapyd/tests/test_sqlite.py
2011-08-27 01:20:57 -03:00
Pablo Hoffman
075a2d62d3
scrapyd: added support for passing custom settings to schedule.json
2011-08-27 01:02:14 -03:00
Pablo Hoffman
75e2c3eb33
moved spider queues to scrapyd
...
--HG--
rename : scrapy/spiderqueue.py => scrapyd/spiderqueue.py
rename : scrapy/tests/test_spiderqueue.py => scrapyd/tests/test_spiderqueue.py
2011-07-19 19:39:27 -03:00
Pablo Hoffman
59acb129e5
scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set
2011-06-15 19:35:03 -03:00
Pablo Hoffman
80b557849a
fixed test broken in previous commit
2011-06-12 02:55:21 -03:00
Pablo Hoffman
0d5399d0bf
fixed scrapyd tests on win32. closes #295
2011-06-12 02:46:41 -03:00
Pablo Hoffman
07df0edf74
scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324
2011-06-08 14:17:04 -03:00
Jochen Maes
47a7f154ab
Add listjobs.json to Scrapyd API
...
You can use listjobs.json with project=<projectname> to get a list of jobs that are currently running.
It returns a list of jobs with spidername and job-id.
Signed-off-by: Jochen Maes <jochen.maes@sejo.be>
---
scrapyd/webservice.py | 9 +++++++++
scrapyd/website.py | 1 +
2 files changed, 10 insertions(+), 0 deletions(-)
2011-03-09 14:22:10 -02:00
Pablo Hoffman
65fc2fbd1f
Set CONCURRENT_SPIDERS=1 in Scrapyd to force one spider per process
2011-02-04 13:30:01 -02:00
Pablo Hoffman
048044c1f8
A couple of changes to fix #303:
...
* improved detection of inside-project environments
* make list command faster (by only instantiating the spider manager)
* print a warning when extensions (middlewares, etc) are disabled with a message on NotConfigured exception
* assert that scrapy configuration hasn't been loaded in scrapyd.runner
* simplified IgnoreRequest exception, to avoid loading settings when importing scrapy.exceptions
* added test to make sure certain modules don't cause scrapy.conf module to be
loaded, to ensure the scrapyd runner bootstrapping works properly
2011-01-05 15:59:43 -02:00
Pablo Hoffman
3d8b368fc6
scrapyd: use runner from config (if not specified) on get_spider_list()
2010-12-28 11:16:58 -02:00
Pablo Hoffman
fa644f7a5e
Some simplifications to Scrapyd architecture and internals:
...
- launcher no longer knows about egg storage
- removed get_spider_list_from_eggfile() function and replaced it with a simpler
get_spider_list() which doesn't receive an egg file as argument
- changed "egg runner" name to just "runner" to reflect the fact that it
doesn't necessarily run eggs (though it does in the default case)
--HG--
rename : scrapyd/eggrunner.py => scrapyd/runner.py
2010-12-27 16:22:32 -02:00
Pablo Hoffman
9cd649b3a0
scrapyd: populate SCRAPY_SPIDER and SCRAPY_JOB environment variables
2010-12-26 19:32:56 -02:00
Pablo Hoffman
1c8d74eb5b
scrapyd: populate SCRAPY_SLOT environment variable with the scrapyd slot number
2010-12-24 12:47:59 -02:00
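The two commits above populate the SCRAPY_SPIDER, SCRAPY_JOB and SCRAPY_SLOT environment variables for the spawned process. A process started this way could read them as sketched below; the `job_context` helper is hypothetical, and only the three variable names come from the commit messages.

```python
import os


def job_context(environ=None):
    """Collect the scrapyd-provided variables into a dict (hypothetical helper).

    Missing variables map to None, e.g. when running outside scrapyd.
    """
    env = os.environ if environ is None else environ
    return {
        "spider": env.get("SCRAPY_SPIDER"),
        "job": env.get("SCRAPY_JOB"),
        "slot": env.get("SCRAPY_SLOT"),
    }


if __name__ == "__main__":
    print(job_context())
```

Accepting an optional mapping instead of always reading `os.environ` keeps the helper easy to exercise in tests.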
Pablo Hoffman
b19ff21acd
scrapyd: added support for deferred spider queues
2010-12-10 15:55:40 -02:00
Pablo Hoffman
831dc818d6
scrapyd: added more information to webui homepage
2010-11-30 18:43:59 -02:00
Pablo Hoffman
a3d30c35fe
scrapyd: log url where web console can be accessed
2010-11-30 17:58:34 -02:00
Pablo Hoffman
823fd9822c
scrapyd: fixed bug discovering the current project scrapy.cfg file
2010-11-30 16:19:17 -02:00
Pablo Hoffman
7b84591ea9
added command for starting a scrapyd server for the current project
2010-11-30 15:52:15 -02:00
Pablo Hoffman
5a46ce47ee
scrapyd: add extra_sources constructor argument, and also read scrapyd configuration from current project's scrapy.cfg file
2010-11-30 15:47:05 -02:00
Pablo Hoffman
c02d6db6a3
scrapyd: force application to receive config as argument
2010-11-30 15:46:24 -02:00
Pablo Hoffman
85890a5092
scrapyd: log process logfile when process starts/finishes
2010-11-30 15:45:42 -02:00
Pablo Hoffman
5c4f562ec4
scrapyd: changed keys used in poller message to _project, _spider, _job, and added link to log file in web ui
2010-11-30 13:03:20 -02:00
Pablo Hoffman
df54ed0041
Some Scrapyd enhancements:
...
* added minimal web ui
* return unique id per job (spider scheduled)
* store one log per spider run (job) and rotate them, keeping the last N logs (where N is configurable through settings)
2010-11-30 02:26:31 -02:00
Pablo Hoffman
46e5d694e6
Scrapyd: return project and version in addversion.json
2010-11-29 17:22:28 -02:00
Pablo Hoffman
bbffa59497
Some changes to Scrapyd:
...
* Always start one process per spider
* Added max_proc_per_cpu option (defaults to 4)
* Return the number of spiders (instead of a list of them) in schedule.json
2010-11-29 17:19:05 -02:00
Pablo Hoffman
5bdffadbe3
Simplified get_spider_list_from_eggfile() function now that it doesn't need to chdir to a custom directory (Scrapy now works when it's unable to create the SQLite database)
2010-11-05 11:48:12 -02:00
Pablo Hoffman
31bbcc9476
Raise error when egg is corrupt in activate_egg(). Use a more descriptive name for temporary dirs in get_spider_list_from_eggfile(). Make scrapyd webservice pass egg_runner to get_spider_list_from_eggfile()
2010-11-05 11:24:33 -02:00
Pablo Hoffman
de4909faca
get_spider_list_from_eggfile(): more improvements to error messages, and support passing eggrunner module as argument
2010-11-04 18:56:11 -02:00
Pablo Hoffman
7ba972d8cf
get_spider_list_from_eggfile(): fail if unable to extract spider list
2010-11-04 16:27:47 -02:00
Pablo Hoffman
a8be54a8ea
scrapyd: make Environment tests independent of the current OS environment
...
--HG--
rename : scrapyd/tests/test_envion.py => scrapyd/tests/test_environ.py
2010-10-27 06:49:15 -02:00
Pablo Hoffman
a3a108dc71
fixed some compatibility issues with python 2.5 in scrapyd
2010-10-26 17:21:43 -02:00
Pablo Hoffman
a4639ffb06
Removed hacky SCRAPY_SETTINGS_DISABLED environment variable
2010-09-22 16:08:18 -03:00
Pablo Hoffman
f3769651af
Refactored Scrapyd code to fix a couple of bugs that occurred when running projects without eggs
2010-09-22 01:04:15 -03:00
Pablo Hoffman
4c61df7abb
get_spider_list_from_eggfile(): fixed bug when SCRAPY_SETTINGS_DISABLED is set
2010-09-20 08:47:55 -03:00
Pablo Hoffman
400c4134af
Make scrapyd.eggutils compatible with Python 2.5 and added tests for get_spider_list_from_eggfile() function (closes #242)
2010-09-19 21:08:27 -03:00
Pablo Hoffman
4cecbcdc5b
Fixed bug in Scrapyd launcher when running projects without eggs. Refs #238
2010-09-15 21:03:43 -03:00
Pablo Hoffman
c559b06a85
Removed unused import
2010-09-14 01:53:05 -03:00