Pablo Hoffman
495152bd50
disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it
2011-05-18 11:04:48 -03:00
Pablo Hoffman
accb6ed830
dump stats to log by default (i.e. change default value of STATS_DUMP to True)
2011-05-17 22:42:05 -03:00
Pablo Hoffman
315457c2ef
added support for -a option to runspider command (like it works with crawl command)
2011-05-17 22:07:49 -03:00
Pablo Hoffman
ab6a4d053f
minor code improvement
2011-05-16 09:56:32 -03:00
Pablo Hoffman
d29eccba56
AutoThrottle: added missing line to connect spider_closed handler
2011-05-16 09:42:44 -03:00
Pablo Hoffman
403dc536e2
improved documentation of AutoThrottle extension
2011-05-15 06:07:26 -03:00
Pablo Hoffman
2b933a4a8c
added AutoThrottle extension (still under testing, not yet enabled by default)
2011-05-15 05:39:58 -03:00
Pablo Hoffman
bd8d7f5cf4
collect download latencies in 'download_latency' request/response meta key
2011-05-15 05:24:01 -03:00
Pablo Hoffman
668dfcabf3
send the response_received signal from the engine, after tying it with the corresponding request
2011-05-15 05:20:14 -03:00
Pablo Hoffman
f9aa819b06
scraper: minor performance improvement by using collections.deque() as in downloader (see previous commit)
2011-05-14 21:50:14 -03:00
Pablo Hoffman
079de67719
downloader: minor performance improvement by using collections.deque() to avoid the list.pop(0) call which is O(n)
2011-05-14 21:47:25 -03:00
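
The deque change described in the commit above can be illustrated with a minimal sketch. This is not the downloader's actual code; the function names `drain_list` and `drain_deque` are hypothetical and only contrast the two ways of consuming a FIFO queue.

```python
from collections import deque


def drain_list(items):
    """Consume a FIFO queue backed by a list (hypothetical helper).

    list.pop(0) removes the first element and shifts every remaining
    element one slot to the left, so each pop is O(n).
    """
    queue = list(items)
    out = []
    while queue:
        out.append(queue.pop(0))
    return out


def drain_deque(items):
    """Consume a FIFO queue backed by collections.deque (hypothetical helper).

    deque.popleft() removes the first element in O(1), with no shifting.
    """
    queue = deque(items)
    out = []
    while queue:
        out.append(queue.popleft())
    return out
```

Both helpers return the items in the same order; only the cost per removal differs, which matters when the queue grows large.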
Pablo Hoffman
7e62a0a1a1
Downloader: Added support for dynamically adjusting download delay and maximum concurrent requests
2011-05-14 21:35:46 -03:00
Pablo Hoffman
bac46ba438
make sure Request.method is always str
2011-05-02 01:11:19 -03:00
Pablo Hoffman
afa23688c6
fixed bug in scrapy.http.Headers: values weren't being encoded to str when passed as lists
2011-05-01 19:39:13 -03:00
Pablo Hoffman
7f97259ba7
added w3lib to requirements, in installation guide
2011-05-01 11:14:57 -03:00
Pablo Hoffman
718428c0ab
debian/control: added python-setuptools to Recommends, because it's needed by 'scrapy deploy' command
2011-05-01 11:00:02 -03:00
Pablo Hoffman
d08281a44f
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-04-30 01:35:43 -03:00
Pablo Hoffman
4a83167698
fixed small doc typo
2011-04-30 01:35:30 -03:00
Pablo Hoffman
cf572bb642
removed experimental examples
2011-04-28 18:07:23 -03:00
Pablo Hoffman
bb2b67c862
updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project
2011-04-28 09:31:57 -03:00
Pablo Hoffman
bf73002428
removed googledir example, replaced by dirbot project on github. updated docs accordingly
2011-04-28 02:28:39 -03:00
Pablo Hoffman
b12dd76bb8
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-04-25 09:31:18 -03:00
Pablo Hoffman
678f08bc1b
added warning about using 'parse' as callback in crawl spider rules
2011-04-25 09:30:42 -03:00
Pablo Hoffman
18d303b5f1
ported internal scrapy.utils imports to w3lib
2011-04-19 01:33:52 -03:00
Pablo Hoffman
fcc8d73840
Removed scrapy.contrib.ibl module (and submodules). They have been moved to a new library "scrapely". See https://github.com/scrapy/scrapely
2011-04-19 01:04:22 -03:00
Pablo Hoffman
ebcbb9f453
debian: added python-w3lib package to dependencies
2011-04-19 00:55:08 -03:00
Pablo Hoffman
b10f4fae35
Moved several functions from scrapy.utils.{http,markup,multipart,response,url} (and their tests) to a new library called 'w3lib'. Scrapy will now depend on w3lib.
2011-04-18 22:37:19 -03:00
Pablo Hoffman
ad496eb3b6
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-04-14 12:36:27 -03:00
Pablo Hoffman
ecb4f44cbc
Added clarification on how to work with local settings and scrapy deploy
2011-04-14 12:36:09 -03:00
Pablo Hoffman
6f262a198c
Added IOError to the list of exceptions to retry in the RetryMiddleware
2011-04-12 18:12:36 -03:00
Pablo Hoffman
7c49e8679c
fixed typo
2011-04-07 02:04:42 -03:00
Pablo Hoffman
3ee2c94e93
Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it
2011-04-06 14:54:48 -03:00
Pablo Hoffman
8a5c08a6bc
added join_multivalued parameter to CsvItemExporter
2011-03-24 13:15:52 -03:00
Pablo Hoffman
84dee1f77f
removed unused function
2011-03-24 09:03:57 -03:00
Pablo Hoffman
3954e600ca
added DBM storage backend for HTTP cache
2011-03-23 21:32:02 -03:00
Pablo Hoffman
60f6a9b054
moved scrapyd python module to scrapy debian package. left scrapyd package only for installing service (upstart script) and scrapy user
2011-03-23 15:45:40 -03:00
Shane Evans
407f7f2d65
fix minor error in IBL tests and post-processing nested annotations
2011-03-11 21:58:47 +00:00
Shane Evans
bc6eee71e3
refactor IBL extraction to allow processing parsed data
--HG--
extra : rebase_source : 1a0ec4322702288f6e996d384d45a36deede3868
2011-03-11 20:02:29 +00:00
Pablo Hoffman
9591413d9d
added Jochen Maes to AUTHORS
2011-03-09 14:24:17 -02:00
Jochen Maes
47a7f154ab
Add listjobs.json to Scrapyd API
You can use listjobs.json with project=<projectname> to get a list of the jobs that are currently running for that project.
It returns a list of jobs with spidername and job-id.
Signed-off-by: Jochen Maes <jochen.maes@sejo.be>
---
scrapyd/webservice.py | 9 +++++++++
scrapyd/website.py | 1 +
2 files changed, 10 insertions(+), 0 deletions(-)
2011-03-09 14:22:10 -02:00
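
As a rough sketch of how the listjobs.json endpoint from the commit above might be addressed, the snippet below only builds the request URL. The helper name `listjobs_url`, the base URL `http://localhost:6800`, and the project name `myproject` are assumptions for illustration, not taken from the commit.

```python
from urllib.parse import urlencode


def listjobs_url(base_url, project):
    """Build a listjobs.json URL for the given project (hypothetical helper).

    base_url is assumed to point at a running Scrapyd instance; the
    'project' query parameter is the one named in the commit message.
    """
    query = urlencode({"project": project})
    return f"{base_url.rstrip('/')}/listjobs.json?{query}"


# Assumed host, port, and project name, for illustration only.
url = listjobs_url("http://localhost:6800/", "myproject")
```

The resulting URL can then be fetched with any HTTP client; the response format is not shown here because the commit message only says it contains jobs with spidername and job-id.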
Pablo Hoffman
99033d91c3
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-03-09 12:41:26 -02:00
Pablo Hoffman
36431a1439
Silenced confusing sqlite3.ProgrammingError exception. For more info see: http://twistedmatrix.com/trac/ticket/4040
2011-03-09 12:39:24 -02:00
Shane Evans
5cae22b665
add nofollow to Link object
2011-02-25 12:55:51 -02:00
Pablo Hoffman
cfd11df539
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-02-24 15:28:57 -02:00
Pablo Hoffman
8f7e163b04
Fixed wrong method name in downloader middleware documentation
2011-02-24 15:26:32 -02:00
Shane Evans
32fa2add75
style fix to ibl contrib
2011-02-24 14:21:23 -02:00
Pablo Hoffman
fe9febe2b1
added --build-only option to deploy command, to build the egg without deploying it
2011-02-23 18:10:16 -02:00
Shane Evans
af4db2767c
Automated merge with ssh://hg.scrapy.org/scrapy-0.12
2011-02-16 18:31:49 -02:00
Shane Evans
74413ff989
prevent incorrect assertion error possible when spiders are closing
2011-02-16 18:30:50 -02:00
Daniel Grana
c55355642c
fix FAQ typos reported by marlun_ at #scrapy IRC channel
2011-02-16 08:57:42 -02:00