Daniel Graña
eb8e98461d
Add some comments and references to github issues. closes #82
2012-01-25 19:15:59 -02:00
Daniel Graña
2840865746
Allow overriding ClientContextFactory and enable SSL bug workarounds by default. refs #82
2012-01-25 18:29:54 -02:00
Pablo Hoffman
a0f41f100c
Merge pull request #80 from kalessin/master
...
autothrottle code improvements (download delay + style)
2012-01-16 12:24:04 -08:00
Martin Olveyra
1c6a5a9374
some minor improvements in autothrottle code style
2012-01-16 18:19:31 -02:00
Martin Olveyra
59cf9d9b1a
allow setting a minimum download delay for the autothrottle extension. also
...
limit download delay to a minimum of spider.download_delay if given
2012-01-16 18:16:24 -02:00
Pablo Hoffman
fc52d8d5cf
Merge pull request #79 from seriyps/master
...
~10x speed-up for libxml2 XPathSelector
2012-01-15 19:32:59 -08:00
Сергей Прохоров
a6a2120715
Speed-up libxml2 XPathSelector
2012-01-15 03:09:00 +04:00
Pablo Hoffman
85e2b493b4
make scrapyd debian package dependent on the same (or higher) version of scrapy package
2012-01-13 10:55:20 -02:00
Pablo Hoffman
2ee523b14a
scrapyd: added Items link to completed jobs table
2012-01-12 17:43:44 -02:00
Pablo Hoffman
8a45dd121b
scrapyd: fixed issue with ubuntu package: /var/lib/scrapyd/items dir not being created by default
2012-01-12 17:17:50 -02:00
Pablo Hoffman
ea77342b55
updated versioning doc according to recent changes
2012-01-05 11:50:28 -02:00
Pablo Hoffman
0b0bce7f3c
scrapyd: added cancel.json and listjobs.json api methods to documentation
2012-01-05 11:23:25 -02:00
Pablo Hoffman
8f42633a94
scrapyd: added clarification about how to disable items feeds generation
2012-01-05 11:20:50 -02:00
Pablo Hoffman
531fa95f98
scrapyd: removed redundant .scrapy component from paths when using scrapyd in 'scrapy server' mode
2012-01-03 23:13:56 -02:00
Pablo Hoffman
dbda33efa6
scrapyd: added support for storing items by default
...
Items are stored the same way as logs, in jsonlines format.
Also renamed logs_to_keep setting to jobs_to_keep.
2012-01-03 23:08:54 -02:00
Pablo Hoffman
0693694bcf
scrapyd: fixed documentation link
2012-01-03 23:02:25 -02:00
Pablo Hoffman
485bc180df
scrapyd: improved web interface to also show pending and finished jobs
2012-01-03 23:02:25 -02:00
Pablo Hoffman
f07e968a93
scrapyd: added new cancel.json api to cancel pending/running jobs
2012-01-03 23:02:19 -02:00
Pablo Hoffman
10ed28b9d0
SitemapSpider: added support for sitemap urls ending in .xml and .xml.gz, even if they have a wrong content type
2012-01-03 12:17:17 -02:00
Pablo Hoffman
fb44f303a9
extras/makedeb.py: no longer obtaining version from git
2012-01-02 13:28:51 -02:00
Pablo Hoffman
db92bc8c40
bumped version to 0.15.1, mainly to avoid package upgrade issues with new versioning based on git describe
0.15.1
2012-01-02 13:07:15 -02:00
Pablo Hoffman
b6220b8e95
use git describe for building version from git, and removed support for building version from hg
2012-01-02 13:05:26 -02:00
Pablo Hoffman
1f87d7ff4b
Merge pull request #75 from darkrho/httpcache-stats
...
tests: fixed httpcache testcase.
2012-01-01 09:30:31 -08:00
Rolando Espinoza La fuente
93eb5b32dd
tests: fixed httpcache testcase.
2011-12-30 23:11:16 -04:00
Pablo Hoffman
a36c8691d7
Merge pull request #74 from darkrho/httpcache-stats
...
httpcache: keep stats of cache hit/miss/store and don't store already cached response
2011-12-30 18:15:43 -08:00
Rolando Espinoza La fuente
503fdf39fe
httpcache: don't store already cached response.
2011-12-30 14:21:07 -04:00
Rolando Espinoza La fuente
f2966eebc7
httpcache: keep stats of cache hit/miss/store.
2011-12-30 14:10:18 -04:00
Pablo Hoffman
9064188035
removed unused import
2011-12-28 15:21:10 -02:00
Pablo Hoffman
150f82e600
some changes to scrapyd listjobs.json api:
...
* the api is now a GET instead of POST (for consistency)
* the api also returns pending and finished jobs, in addition to running ones
* only the last 100 finished jobs are kept (can be changed through the finished_to_keep setting)
2011-12-28 15:17:52 -02:00
Pablo Hoffman
1dfbe5d7a8
scrapyd.webservice: relocate ListJobs resource for better consistency
2011-12-28 14:36:24 -02:00
Pablo Hoffman
f214c94912
CrawlSpider: don't follow links from non-HTML responses
2011-12-27 21:22:38 -02:00
Pablo Hoffman
bda9c97c78
Merge pull request #48 from simonratner/delete-logs-by-mtime
...
Delete old logs based on file mtime.
2011-12-23 13:17:00 -08:00
Pablo Hoffman
0be421fbf0
fixed reference to tutorial directory
2011-12-23 18:57:11 -02:00
Pablo Hoffman
41fd3c4f6c
doc: removed duplicated callback argument from Request.replace()
2011-12-23 15:55:46 -02:00
Pablo Hoffman
0eeff76227
fixed formatting of scrapyd doc
2011-12-20 03:18:37 -02:00
Daniel Graña
64ba6e7982
Dump stacks for all running threads and fix engine status dumped by StackTraceDump extension
2011-12-15 17:05:45 -02:00
Pablo Hoffman
023232f7d4
added comment about why we disable ssl on boto images upload
2011-12-15 14:23:45 -02:00
Daniel Graña
aea060e144
Merge branch '0.14'
2011-12-14 13:09:00 -02:00
Daniel Graña
63d583d9be
SSL handshaking hangs when doing too many parallel connections to S3
2011-12-14 13:06:12 -02:00
Daniel Graña
bcb31988f2
change tutorial to follow changes on dmoz site
2011-12-14 13:03:31 -02:00
Rolando Espinoza La fuente
98f3f87530
Avoid _disconnectedDeferred AttributeError exception in Twisted>=11.1.0
2011-12-12 18:11:43 +00:00
Pablo Hoffman
9b84914736
Merge pull request #62 from darkrho/issue-58
...
Avoid _disconnectedDeferred AttributeError exception in Twisted>=11.1.0
2011-12-12 10:10:08 -08:00
Rolando Espinoza La fuente
9c04945785
Avoid _disconnectedDeferred AttributeError exception in Twisted>=11.1.0
2011-12-04 23:42:41 -04:00
Martin Olveyra
175a4b5957
allow spider to set autothrottle max concurrency
2011-12-02 15:50:53 -02:00
Pablo Hoffman
4fe42dc6fe
Merge pull request #61 from kalessin/master
...
allow spider to set autothrottle max concurrency
2011-12-01 12:47:14 -08:00
Martin Olveyra
7b0184eb59
allow spider to set autothrottle max concurrency
2011-12-01 18:44:26 -02:00
Pablo Hoffman
e29f9e5b24
bumped version to 0.15
0.15.0
2011-11-17 14:44:34 -02:00
Pablo Hoffman
0f649f0e30
bumped version to 0.14
0.14.0
2011-11-17 14:43:40 -02:00
Pablo Hoffman
6d13de4366
fixed "No free spider slots" bug when calling fetch() from scrapy shell
2011-11-14 20:03:43 -02:00
Pablo Hoffman
d37a788d22
improve handling of KeyError exception when creating spiders in spider manager. closes issue 49
2011-11-14 17:00:25 -02:00