mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 02:43:41 +00:00

69 Commits

Author SHA1 Message Date
Artem Bogomyagkov
5dde26d3c8 simplified code for finished jobs times, updated docs for scrapyd 2012-09-12 11:10:49 +03:00
Artem Bogomyagkov
a42a8cafef added start and stop times for scrapyd list jobs web service 2012-09-11 21:06:36 +03:00
Pablo Hoffman
241abcb8aa scrapyd: log errors in API calls to the scrapyd log 2012-09-04 14:22:21 -03:00
Pablo Hoffman
c6435c5aa7 restored scrapyd log message 2012-08-31 18:49:20 -03:00
Daniel Graña
dcef7b03c1 format log lines lazily in case they are dropped by loglevels 2012-08-31 17:14:35 -03:00
Pablo Hoffman
94b40162a9 fixed tests to work on windows 2012-08-30 11:24:29 -03:00
Alexandr N Zamaraev (aka tonal)
cf968c328f Configure json services members 2012-06-25 12:50:48 +07:00
Pablo Hoffman
179e3810dc fixed links to doc. closes #150 2012-06-24 01:00:33 -03:00
Alexandr N Zamaraev (aka tonal)
3bddedfc6d Add launcher class to config 2012-06-19 22:01:53 +07:00
Pablo Hoffman
b33303779a scrapyd.launcher: make SCRAPY_LOG_FILE and SCRAPY_FEED_URI optional 2012-05-21 14:29:15 -03:00
Pablo Hoffman
58e88ed246 scrapyd: do not set SCRAPY_FEED_URI/SCRAPY_LOG_FILE if items_dir/logs_dir settings are not set 2012-05-08 17:43:00 -03:00
Pablo Hoffman
35fb01156e removed some obsolete remaining code related to sqlite support in scrapy 2012-03-16 11:55:55 -03:00
Pablo Hoffman
e521da2e2f Dropped support for Python 2.5. See: http://blog.scrapy.org/scrapy-dropping-support-for-python-25 2012-03-01 08:18:12 -02:00
lostsnow
5afe4f50c1 scrapyd: support bind to a specific ip address 2012-02-29 13:47:40 +08:00
Pablo Hoffman
2ee523b14a scrapyd: added Items link to completed jobs table 2012-01-12 17:43:44 -02:00
Pablo Hoffman
531fa95f98 scrapyd: removed redundant .scrapy component from paths when using scrapyd in 'scrapy server' mode 2012-01-03 23:13:56 -02:00
Pablo Hoffman
dbda33efa6 scrapyd: added support for storing items by default
Items are stored the same way as logs, in jsonlines format.

Also renamed logs_to_keep setting to jobs_to_keep.
2012-01-03 23:08:54 -02:00
Pablo Hoffman
0693694bcf scrapyd: fixed documentation link 2012-01-03 23:02:25 -02:00
Pablo Hoffman
485bc180df scrapyd: improved web interface to also show pending and finished jobs 2012-01-03 23:02:25 -02:00
Pablo Hoffman
f07e968a93 scrapyd: added new cancel.json api to cancel pending/running jobs 2012-01-03 23:02:19 -02:00
Pablo Hoffman
9064188035 removed unused import 2011-12-28 15:21:10 -02:00
Pablo Hoffman
150f82e600 some changes to scrapyd listjobs.json api:
* the api is now a GET instead of POST (for consistency)
* the api also returns pending and finished jobs, in addition to running
  ones
* only the last 100 finished jobs are kept (can be changed through the
  finished_to_keep setting)
2011-12-28 15:17:52 -02:00
Pablo Hoffman
1dfbe5d7a8 scrapyd.webservice: relocate ListJobs resource for better consistency 2011-12-28 14:36:24 -02:00
Simon Ratner
7232c31f78 Delete old logs based on file mtime. 2011-11-11 11:53:00 -08:00
Pablo Hoffman
bff3d31469 scrapyd: updated schedule.json response format 2011-09-04 09:29:24 -03:00
Pablo Hoffman
a1dbc62b45 removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead) 2011-09-02 18:27:39 -03:00
Pablo Hoffman
76af0cdd44 updated documentation and code to use -s instead of --set option 2011-09-01 14:35:37 -03:00
Pablo Hoffman
91b9d89ffd moved scrapy.utils.sqlite to scrapyd.sqlite
--HG--
rename : scrapy/utils/sqlite.py => scrapyd/sqlite.py
rename : scrapy/tests/test_utils_sqlite.py => scrapyd/tests/test_sqlite.py
2011-08-27 01:20:57 -03:00
Pablo Hoffman
075a2d62d3 scrapyd: added support for passing custom settings to schedule.json 2011-08-27 01:02:14 -03:00
Pablo Hoffman
75e2c3eb33 moved spider queues to scrapyd
--HG--
rename : scrapy/spiderqueue.py => scrapyd/spiderqueue.py
rename : scrapy/tests/test_spiderqueue.py => scrapyd/tests/test_spiderqueue.py
2011-07-19 19:39:27 -03:00
Pablo Hoffman
59acb129e5 scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set 2011-06-15 19:35:03 -03:00
Pablo Hoffman
80b557849a fixed test broken in previous commit 2011-06-12 02:55:21 -03:00
Pablo Hoffman
0d5399d0bf fixed scrapyd tests on win32. closes #295 2011-06-12 02:46:41 -03:00
Pablo Hoffman
07df0edf74 scrapyd.webservice: use twisted.web multipart data parsing, to simplify code. closes #324 2011-06-08 14:17:04 -03:00
Jochen Maes
47a7f154ab Add listjobs.json to Scrapyd API
You can use listjobs.json with project=<projectname> to get a list of jobs that are currently running.
It returns a list of jobs with spidername and job-id.

Signed-off-by: Jochen Maes <jochen.maes@sejo.be>
---
 scrapyd/webservice.py |    9 +++++++++
 scrapyd/website.py    |    1 +
 2 files changed, 10 insertions(+), 0 deletions(-)
2011-03-09 14:22:10 -02:00
Pablo Hoffman
65fc2fbd1f Set CONCURRENT_SPIDERS=1 in Scrapyd to force one spider per process 2011-02-04 13:30:01 -02:00
Pablo Hoffman
048044c1f8 A couple of changes to fix #303:
* improved detection of inside-project environments
* make list command faster (by only instantiating the spider manager)
* print a warning when extensions (middlewares, etc) are disabled with a message on NotConfigured exception
* assert that scrapy configuration hasn't been loaded in scrapyd.runner
* simplified IgnoreRequest exception, to avoid loading settings when importing scrapy.exceptions
* added test to make sure certain modules don't cause scrapy.conf module to be
  loaded, to ensure the scrapyd runner bootstrapping performs properly
2011-01-05 15:59:43 -02:00
Pablo Hoffman
3d8b368fc6 scrapyd: use runner from config (if not specified) on get_spider_list() 2010-12-28 11:16:58 -02:00
Pablo Hoffman
fa644f7a5e Some simplifications to Scrapyd architecture and internals:
- launcher no longer knows about egg storage
- removed get_spider_list_from_eggifile() file and replaced by simpler
  get_spider_list() which doesn't receive an egg file as argument
- changed "egg runner" name to just "runner" to reflect the fact that it
  doesn't necessarily run eggs (though it does in the default case)

--HG--
rename : scrapyd/eggrunner.py => scrapyd/runner.py
2010-12-27 16:22:32 -02:00
Pablo Hoffman
9cd649b3a0 scrapyd: populate SCRAPY_SPIDER and SCRAPY_JOB environment variables 2010-12-26 19:32:56 -02:00
Pablo Hoffman
1c8d74eb5b scrapyd: populate SCRAPY_SLOT environment variable with the scrapyd slot number 2010-12-24 12:47:59 -02:00
Pablo Hoffman
b19ff21acd scrapyd: added support for deferred spider queues 2010-12-10 15:55:40 -02:00
Pablo Hoffman
831dc818d6 scrapyd: added more information to webui homepage 2010-11-30 18:43:59 -02:00
Pablo Hoffman
a3d30c35fe scrapyd: log url where web console can be accessed 2010-11-30 17:58:34 -02:00
Pablo Hoffman
823fd9822c scrapyd: fixed bug discovering the current project scrapy.cfg file 2010-11-30 16:19:17 -02:00
Pablo Hoffman
7b84591ea9 added command for starting a scrapyd server for the current project 2010-11-30 15:52:15 -02:00
Pablo Hoffman
5a46ce47ee scrapyd: add extra_sources constructor argument, and also read scrapyd configuration from current project's scrapy.cfg file 2010-11-30 15:47:05 -02:00
Pablo Hoffman
c02d6db6a3 scrapyd: force application to receive config as argument 2010-11-30 15:46:24 -02:00
Pablo Hoffman
85890a5092 scrapyd: log process logfile when process starts/finishes 2010-11-30 15:45:42 -02:00
Pablo Hoffman
5c4f562ec4 scrapyd: changed keys used in poller message to _project, _spider, _job, and added link to log file in web ui 2010-11-30 13:03:20 -02:00