Rolando Espinoza La fuente
9c04945785
Avoid _disconnectedDeferred AttributeError exception in Twisted>=11.1.0
2011-12-04 23:42:41 -04:00
Martin Olveyra
175a4b5957
allow spider to set autothrottle max concurrency
2011-12-02 15:50:53 -02:00
Pablo Hoffman
4fe42dc6fe
Merge pull request #61 from kalessin/master
allow spider to set autothrottle max concurrency
2011-12-01 12:47:14 -08:00
Martin Olveyra
7b0184eb59
allow spider to set autothrottle max concurrency
2011-12-01 18:44:26 -02:00
Pablo Hoffman
e29f9e5b24
bumped version to 0.15
0.15.0
2011-11-17 14:44:34 -02:00
Pablo Hoffman
0f649f0e30
bumped version to 0.14
0.14.0
2011-11-17 14:43:40 -02:00
Pablo Hoffman
6d13de4366
fixed "No free spider slots" bug when calling fetch() from scrapy shell
2011-11-14 20:03:43 -02:00
Pablo Hoffman
d37a788d22
improve handling of KeyError exception when creating spiders in spider manager. closes issue 49
2011-11-14 17:00:25 -02:00
Pablo Hoffman
36df87b4de
ignore meta-refresh redirects embedded in <script> tags. related to issue 18
2011-11-14 16:54:13 -02:00
Pablo Hoffman
ec1ef0235f
ignore meta-refresh redirect when embedded inside <noscript> tag. closes issue 18
2011-11-14 16:25:22 -02:00
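The two commits above make meta-refresh detection skip redirects embedded in <script> and <noscript> tags. The following is a minimal Python 3 sketch of that idea, assuming a regex-based detector: remove those elements first, then search what remains. The function get_meta_refresh and both patterns are hypothetical and are not Scrapy's code. The pattern is also deliberately simplified and only handles http-equiv appearing before content.

```python
import re

# Hypothetical patterns, for illustration only.
# Remove whole <script>...</script> and <noscript>...</noscript> elements.
_IGNORED_TAGS = re.compile(r"<(script|noscript)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)

# Simplified: assumes http-equiv comes before content inside the <meta> tag.
_META_REFRESH = re.compile(
    r"<meta\s[^>]*http-equiv\s*=\s*[\"']?refresh[\"']?[^>]*"
    r"content\s*=\s*[\"']?\s*(\d+(?:\.\d+)?)\s*;\s*url\s*=\s*([^\"'>\s]+)",
    re.IGNORECASE,
)


def get_meta_refresh(html):
    """Return (delay, url) for a meta refresh outside <script>/<noscript>, else None."""
    visible = _IGNORED_TAGS.sub("", html)
    match = _META_REFRESH.search(visible)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)
```

A redirect that is only reachable through JavaScript, or only meant for clients without JavaScript, is therefore not followed by the crawler.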
Simon Ratner
7232c31f78
Delete old logs based on file mtime.
2011-11-11 11:53:00 -08:00
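The commit above deletes old logs based on file modification time. Here is a minimal Python 3 sketch of that policy. It assumes the goal is to keep only the keep most recently modified files in a log directory. delete_old_logs and its parameters are hypothetical, because the commit message does not name the component or setting involved.

```python
import os


def delete_old_logs(log_dir, keep):
    """Delete all but the `keep` most recently modified files in log_dir.

    Hypothetical helper. Returns the deleted paths, oldest first.
    """
    paths = [
        os.path.join(log_dir, name)
        for name in os.listdir(log_dir)
        if os.path.isfile(os.path.join(log_dir, name))
    ]
    # Order by mtime (oldest first), so file names such as job ids do not affect the order.
    paths.sort(key=os.path.getmtime)
    # paths[:-0] would be empty, so keep == 0 needs special handling.
    to_delete = paths[:-keep] if keep > 0 else paths
    for path in to_delete:
        os.remove(path)
    return to_delete
```

Ordering by mtime rather than by name means the oldest run is removed first even when log file names do not sort chronologically.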
Pablo Hoffman
6cc40dc062
fixed bug in MEMUSAGE_NOTIFY_MAIL setting
2011-11-08 11:51:26 -02:00
Pablo Hoffman
37ad4f8791
added support for AJAX crawlable URLs
2011-10-28 16:33:12 -02:00
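"AJAX crawlable URLs" refers to Google's (since deprecated) AJAX crawling scheme. In that scheme, a "hashbang" URL such as page.html#!key=value is fetched as page.html?_escaped_fragment_=key%3Dvalue. Below is a minimal Python 3 sketch of that rewrite. The name escape_ajax is used here for illustration and is not meant to describe the code in this commit. Percent-encoding through urlencode is also a simplification of the scheme's exact escaping rules.

```python
from urllib.parse import parse_qsl, urldefrag, urlencode, urlsplit, urlunsplit


def escape_ajax(url):
    """Rewrite a '#!' URL into its '_escaped_fragment_' form; leave other URLs unchanged."""
    base, fragment = urldefrag(url)
    if not fragment.startswith("!"):
        return url  # not an AJAX crawlable URL
    parts = urlsplit(base)
    # Keep any existing query parameters and append the escaped fragment last.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_escaped_fragment_", fragment[1:]))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, escape_ajax("http://www.example.com/ajax.html#!key=value") returns "http://www.example.com/ajax.html?_escaped_fragment_=key%3Dvalue".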
Pablo Hoffman
992af8d38f
ubuntu repos: added support for oneiric release
2011-10-25 14:26:38 -02:00
Pablo Hoffman
f4821a123d
Do not raise PartialDownloadError if Content-Length doesn't match the body size. This fixes the error reported in: https://groups.google.com/d/topic/scrapy-users/FQ25O3KPQuU/discussion
2011-10-25 13:04:58 -02:00
Pablo Hoffman
c085f81641
removed deprecation warning for spider.download_timeout attribute
2011-10-25 04:26:37 -02:00
Pablo Hoffman
c38c49d56a
fixed PickleItemExporter bug, added unittest, and added pickle to supported feed export formats
2011-10-25 02:36:51 -02:00
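The commit above adds pickle as a feed export format. One straightforward way to lay out such a feed is one pickle per item, written back to back, which a consumer reads until end of file. The following is a minimal Python 3 sketch of that layout. PickleExporterSketch and read_items are hypothetical names, and the actual exporter's internals may differ.

```python
import pickle


class PickleExporterSketch:
    """Hypothetical sketch: append one pickled dict per item to a binary stream."""

    def __init__(self, stream, protocol=2):
        self.stream = stream
        self.protocol = protocol

    def export_item(self, item):
        # dict() turns item-like mappings into plain dicts before pickling.
        pickle.dump(dict(item), self.stream, self.protocol)


def read_items(stream):
    """Load consecutive pickles from a stream until it is exhausted."""
    items = []
    while True:
        try:
            items.append(pickle.load(stream))
        except EOFError:
            return items
```

Because each item is a separate pickle, the feed can be written incrementally and does not require the whole item list to be held in memory.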
Pablo Hoffman
8bdf288428
made scrapyd doc more version agnostic
2011-10-23 05:29:54 -02:00
Pablo Hoffman
64b8e2648e
added support for using '-' in scrapy crawl -o, to dump items to standard output
2011-10-23 03:06:59 -02:00
Pablo Hoffman
028bf3386d
feed exports: removed dependency on file.tell() method, so that stdout output works
2011-10-23 03:05:06 -02:00
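The commits above let scrapy crawl -o - write items to standard output. This also required removing the feed exporters' dependency on file.tell(), because stdout is not seekable. Below is a minimal Python 3 sketch of both ideas, assuming a JSON array output. open_feed maps '-' to sys.stdout and otherwise opens the file directly. The exporter records whether it has already written an item with a flag instead of checking the stream position. open_feed and JsonArrayExporter are hypothetical names and are not Scrapy's feed export API.

```python
import json
import sys


def open_feed(uri):
    """Return a writable text stream: stdout for '-', otherwise a file opened directly."""
    if uri == "-":
        return sys.stdout
    return open(uri, "w", encoding="utf-8")


class JsonArrayExporter:
    """Write items as a JSON array without calling tell(), so it also works on stdout."""

    def __init__(self, stream):
        self.stream = stream
        self.first_item = True  # a flag instead of a stream-position check

    def start_exporting(self):
        self.stream.write("[")

    def export_item(self, item):
        if not self.first_item:
            self.stream.write(",")
        self.first_item = False
        self.stream.write(json.dumps(item))

    def finish_exporting(self):
        self.stream.write("]")
```

Usage: exporter = JsonArrayExporter(open_feed("-")) writes the array straight to the terminal without creating a temporary file.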
Pablo Hoffman
10ced29e18
changed feed exports storage API so that file and stdout storages write output directly, without using a temporary file
2011-10-23 02:49:17 -02:00
Pablo Hoffman
ade5efdc61
added -o option to scrapy crawl, a convenient shortcut for using feed exports
2011-10-22 20:53:49 -02:00
Pablo Hoffman
13cd9a1b0f
remove deprecation warning for spider.user_agent attribute
2011-10-22 19:28:12 -02:00
Pablo Hoffman
43b79afc9c
remove usage of assertLess(), which is only available on Python 2.7+
2011-09-26 12:21:52 -03:00
Pablo Hoffman
431441cb52
updated documentation to remove references to the old issue tracker and Mercurial repos
2011-09-25 13:06:24 -03:00
Pablo Hoffman
ce03ccd4ec
updated documentation about DEPTH_PRIORITY and DFO/BFO crawls
2011-09-23 13:22:25 -03:00
Pablo Hoffman
f850a44784
Some changes to the persistent scheduler after initial usage feedback:
* added LIFO queues, in addition to the original FIFO queues
* use LIFO queues (instead of FIFO queues) by default, since they resemble DFO
better, which is a more convenient crawling order for most cases
* do not adjust the priority based on depth by default (DEPTH_PRIORITY = 0)
If strict BFO order is needed, it can be achieved by setting:
DEPTH_PRIORITY = 1
SCHEDULER_DISK_QUEUE = 'scrapy.squeue.PickleFifoDiskQueue'
SCHEDULER_MEMORY_QUEUE = 'scrapy.squeue.FifoMemoryQueue'
2011-09-23 13:03:07 -03:00
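To illustrate why LIFO queues resemble depth-first order (DFO) and FIFO queues resemble breadth-first order (BFO), here is a minimal Python 3 sketch. It walks a small hypothetical link graph using a single deque. crawl_order and the graph are illustrative only. The sketch ignores request priorities, so DEPTH_PRIORITY has no effect here.

```python
from collections import deque


def crawl_order(links, start, lifo):
    """Return the order in which URLs are visited with a LIFO or FIFO queue."""
    queue = deque([start])
    order = []
    while queue:
        # LIFO pops the newest request (goes deep); FIFO pops the oldest (goes wide).
        url = queue.pop() if lifo else queue.popleft()
        order.append(url)
        queue.extend(links.get(url, []))
    return order
```

With links = {"/": ["/a", "/b"], "/a": ["/a/1", "/a/2"], "/b": ["/b/1"]}, the FIFO order is "/", "/a", "/b", "/a/1", "/a/2", "/b/1" (level by level). The LIFO order is "/", "/b", "/b/1", "/a", "/a/2", "/a/1", which finishes each branch before moving to the next.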
Pablo Hoffman
cfddc314ce
make SpiderState extension always available, regardless of whether there is a job dir, and make sure it always sets the spider.state attribute, for consistency between in-memory and on-disk runs
2011-09-23 12:56:44 -03:00
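The commit above makes spider.state always available. It is persisted only when a job dir is configured. Here is a minimal Python 3 sketch of that behavior. SpiderStateSketch, its hook names, and the "spider.state" file name are assumptions for illustration and do not describe the extension's real code.

```python
import os
import pickle


class SpiderStateSketch:
    """Hypothetical sketch: always set spider.state; persist it only if a job dir exists."""

    def __init__(self, jobdir=None):
        self.jobdir = jobdir

    @property
    def statefn(self):
        return os.path.join(self.jobdir, "spider.state")  # assumed file name

    def spider_opened(self, spider):
        if self.jobdir and os.path.exists(self.statefn):
            with open(self.statefn, "rb") as f:
                spider.state = pickle.load(f)  # resume a previous on-disk run
        else:
            spider.state = {}  # in-memory run, or first run with a job dir

    def spider_closed(self, spider):
        if self.jobdir:
            with open(self.statefn, "wb") as f:
                pickle.dump(spider.state, f)
```

Spider code can then read and write self.state the same way in both cases. The only difference is whether the dict survives a restart.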
Pablo Hoffman
2559f211a8
Merge pull request #42 from noplay/deploy-git-version
scrapy deploy: support git version
2011-09-22 14:42:41 -07:00
Julien Duponchelle
b7c436343a
scrapy deploy: support git version
2011-09-21 22:17:08 +02:00
Pablo Hoffman
0f0783e525
Merge pull request #39 from kalessin/master
autothrottle fix
2011-09-21 13:11:28 -07:00
Martin Olveyra
dcc50201e3
fix autothrottle extension for working with new downloader
2011-09-21 17:07:21 -03:00
Pablo Hoffman
b003687389
moved rpm-install.sh to extras/
2011-09-18 06:08:28 -03:00
Daniel Graña
a5c15004f9
Merge branch '0.12'
2011-09-18 01:08:04 -03:00
Daniel Graña
fac5e5ea21
migrate hgignore to gitignore
2011-09-18 01:05:31 -03:00
Pablo Hoffman
d788ba8a44
new mechanism to override settings in scrapy commands before the Crawler object is available
2011-09-15 13:27:01 -03:00
Pablo Hoffman
77ffaa50e5
Merge pull request #38 from kalessin/master
_extract_links requires extra parameter base_url
2011-09-14 07:58:43 -07:00
Martin Olveyra
509b05db57
_extract_links requires extra parameter base_url in order to avoid
an exception when called from a superclass method
2011-09-14 10:58:09 -03:00
Pablo Hoffman
2fcb7097bd
removed documentation header notifying about other documentation versions, as that's provided by readthedocs already
2011-09-14 02:41:01 -03:00
Pablo Hoffman
ab1c9cfc56
removed documentation header notifying about other documentation versions, as that's provided by readthedocs already
2011-09-14 02:39:32 -03:00
Pablo Hoffman
3b00b9cb12
added support for generating version from git revision
2011-09-11 11:24:12 -03:00
Pablo Hoffman
43ae7bdd89
added tests for SpiderState extension
2011-09-11 08:27:05 -03:00
Pablo Hoffman
1e43afeaea
added support for generating version from git revision, and use it in extras/makedeb.py
2011-09-09 03:03:46 -03:00
Daniel Grana
5f1b1c05f8
Do not filter requests with dont_filter attribute set in OffsiteMiddleware
2011-09-08 15:18:10 -03:00
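The commit above makes OffsiteMiddleware let through requests that have dont_filter set. Below is a minimal Python 3 sketch of the resulting check. It assumes offsite filtering matches the request host against the allowed domains and their subdomains. get_host_regex and should_follow are hypothetical helpers.

```python
import re
from urllib.parse import urlparse


def get_host_regex(allowed_domains):
    """Match any of the allowed domains or their subdomains."""
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(pattern, re.IGNORECASE)


def should_follow(url, dont_filter, host_regex):
    """Hypothetical offsite check: requests marked dont_filter are never filtered."""
    if dont_filter:
        return True
    host = urlparse(url).hostname or ""
    return bool(host_regex.match(host))
```

This makes dont_filter behave consistently: a request that explicitly opts out of filtering is not dropped by the duplicate filter and is not dropped as offsite either.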
Pablo Hoffman
bff3d31469
scrapyd: updated schedule.json response format
2011-09-04 09:29:24 -03:00
Pablo Hoffman
17cc90e3fe
added unittest for SpiderState extension
2011-09-04 08:58:23 -03:00
Pablo Hoffman
e0ec239930
restored support for spider.DOWNLOAD_DELAY attribute, with deprecation warning
2011-09-04 08:39:57 -03:00
Pablo Hoffman
c8d30c6ffa
replaced use of deprecated w3lib.url.urljoin_rfc by stdlib urlparse.urljoin
2011-09-02 19:09:21 -03:00
Pablo Hoffman
a1dbc62b45
removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead)
2011-09-02 18:27:39 -03:00
Pablo Hoffman
40f7075f11
added initial documentation about suspend and resume crawls
2011-09-02 13:12:27 -03:00