olveyra
b39cb22d83
don't discard slot when empty; just save it in another dict in order to recycle it if needed again.
...
This fix avoids continuously creating new slots in certain cases, a bug that prevents download_delay and max_concurrent_requests from working properly.
The problem arises when the slot for a given domain becomes empty but the spider has not yet created further requests for that domain. This is typical when a spider creates requests one by one, or when it makes requests to multiple domains and one or more of them are created at a rate slow enough that the slot is empty each time a response is fetched.
The effect is that a new slot is created for each request under such conditions, so download_delay and max_concurrent_requests do not take effect (because applying them depends on an already existing slot for that domain).
2012-04-02 20:34:57 +00:00
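The recycling idea described in the commit above can be sketched roughly as follows. This is a minimal illustration, not Scrapy's actual downloader code: the `Slot` and `SlotPool` classes, their attributes, and the method names are all hypothetical.

```python
class Slot:
    """Hypothetical per-domain download slot (not Scrapy's real class)."""

    def __init__(self, domain):
        self.domain = domain
        self.active = set()    # requests currently being downloaded
        self.last_seen = 0.0   # time the last download started


class SlotPool:
    """Keeps empty slots in a second dict instead of discarding them."""

    def __init__(self):
        self.slots = {}      # domains with requests in flight
        self.inactive = {}   # empty slots saved for reuse

    def get_slot(self, domain):
        if domain in self.slots:
            return self.slots[domain]
        # Recycle a previously emptied slot if one exists, so state such
        # as last_seen survives and a delay can still be enforced.
        slot = self.inactive.pop(domain, None)
        if slot is None:
            slot = Slot(domain)
        self.slots[domain] = slot
        return slot

    def release_if_empty(self, domain):
        slot = self.slots.get(domain)
        if slot is not None and not slot.active:
            # Move the slot aside rather than dropping it.
            self.inactive[domain] = self.slots.pop(domain)
```

Because the same `Slot` object comes back on the next request for that domain, any per-domain bookkeeping (here, `last_seen`) is still available when deciding whether to delay the request.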
Pablo Hoffman
e9184def35
make selector re() method use re.UNICODE flag to compile regexes
2012-04-01 00:41:03 -03:00
Pablo Hoffman
27018fced7
changed default user agent to Scrapy/0.15 (+http://scrapy.org) and removed no longer needed BOT_VERSION setting
2012-03-23 13:45:21 -03:00
Pablo Hoffman
731c569b5c
fixed test-scrapyd.sh script after changes on insophia website
2012-03-22 16:38:28 -03:00
Pablo Hoffman
8933e2f2be
added REFERER_ENABLED setting, to control referer middleware
2012-03-22 16:35:14 -03:00
Pablo Hoffman
eed34e88cd
Merge pull request #103 from jsyeo/patch-1
...
fixed minor mistake in Request objects documentation
2012-03-20 19:49:31 -07:00
Jason Yeo
da826aa13d
fixed minor mistake in Request objects documentation
2012-03-21 10:25:41 +08:00
Pablo Hoffman
175c70ad44
fixed minor defect in link extractors documentation
2012-03-20 22:56:45 -03:00
Pablo Hoffman
056a7c53d0
added artwork files properly now
2012-03-20 10:46:45 -03:00
Pablo Hoffman
aef70e8394
removed wrongly added artwork files
2012-03-20 10:45:48 -03:00
Pablo Hoffman
bcd8520f8d
added sep directory with Scrapy Enhancement Proposal imported from old Trac site
2012-03-20 10:15:00 -03:00
Pablo Hoffman
c0141d154e
added artwork directory (data taken from old Trac)
2012-03-20 10:14:11 -03:00
Pablo Hoffman
35fb01156e
removed some obsolete remaining code related to sqlite support in scrapy
2012-03-16 11:55:55 -03:00
Pablo Hoffman
838e1dcce9
updated FormRequest tests to use HtmlResponse instead of Response, as it makes more sense
2012-03-15 11:47:02 -03:00
Pablo Hoffman
b6ae266546
Removed (very old and possibly broken) backwards compatibility support for Twisted 2.5
2012-03-15 00:28:24 -03:00
Pablo Hoffman
9fddc73ed8
removed backwards compatibility code for old scrapy versions
2012-03-06 05:42:09 -02:00
Pablo Hoffman
9a508d4638
Removed deprecated setting: CLOSESPIDER_ITEMPASSED
2012-03-06 05:26:57 -02:00
Pablo Hoffman
8b83177655
Added CLOSESPIDER_ERRORCOUNT to scrapy/default_settings.py
2012-03-06 05:26:57 -02:00
Pablo Hoffman
9006227358
bumped required python-w3lib version in debian/control
2012-03-05 20:25:38 -02:00
Daniel Graña
2909a60e95
test that default start_requests return value type is a generator. refs #98
2012-03-05 17:53:20 -02:00
Pablo Hoffman
45685ea6cd
Restored scrapy.utils.py26 module for backwards compatibility, with a deprecation message. This is needed because the module was used a lot by users and the change causes too much trouble
2012-03-05 17:15:49 -02:00
Daniel Graña
cc6e297062
Merge pull request #98 from kalessin/start_requests
...
This will break any spider that extends `start_requests` and expects a `list` as return value.
On the other hand:
* [Docs](http://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spider.BaseSpider.start_requests) say that the return value is an **iterable**, not a list
* Scrapy core already supports consuming the start_requests generator on demand, so we can avoid problems like #47
* it allows extensions to change starting requests on the `spider_opened` signal
2012-03-05 08:51:22 -08:00
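The difference between returning a list and returning a generator, as discussed in the merge above, can be illustrated with plain Python. This is a sketch under stated assumptions: the two spider classes below are hypothetical stand-ins that yield bare URLs rather than Scrapy `Request` objects, and the `append` call stands in for whatever a `spider_opened` handler might do.

```python
class ListSpider:
    """Hypothetical spider whose start_requests builds a list eagerly."""

    def __init__(self):
        self.start_urls = ["http://example.com/a"]

    def start_requests(self):
        return [url for url in self.start_urls]


class GeneratorSpider:
    """Hypothetical spider whose start_requests is a lazy generator."""

    def __init__(self):
        self.start_urls = ["http://example.com/a"]

    def start_requests(self):
        # Nothing in this body runs until the first item is requested.
        for url in self.start_urls:
            yield url


def collect_after_late_change(spider):
    requests = spider.start_requests()
    # Simulates a handler that adds a URL after start_requests() was
    # called but before its result is consumed.
    spider.start_urls.append("http://example.com/b")
    return list(requests)


eager = collect_after_late_change(ListSpider())
lazy = collect_after_late_change(GeneratorSpider())
```

Here `eager` contains only the original URL, while `lazy` also picks up the URL added later. The same laziness is why code that indexes or takes `len()` of the return value breaks once a generator is returned.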
Martin Olveyra
f6179a927e
replace list by generator also in start_requests method of Sitemap
...
spider
2012-03-05 14:25:12 -02:00
Martin Olveyra
cc7fc33833
change start_requests to return a generator instead of a list, in order
...
to allow start_urls to be modified when triggered by the spider_opened signal
2012-03-05 12:49:17 -02:00
Pablo Hoffman
e521da2e2f
Dropped support for Python 2.5. See: http://blog.scrapy.org/scrapy-dropping-support-for-python-25
2012-03-01 08:18:12 -02:00
Pablo Hoffman
8eb0b11f8a
removed unused import
2012-02-29 17:40:30 -02:00
Pablo Hoffman
5c329b6514
Merge pull request #97 from scrapy/w3lib_encoding
...
Ported scrapy to use w3lib.encoding
2012-02-29 01:45:59 -08:00
Pablo Hoffman
de3a3b68dc
bumped required w3lib version to 1.1, after refactoring encoding detection to use the new w3lib.encoding module
2012-02-29 07:44:22 -02:00
Pablo Hoffman
2b16ebdc11
added minor clarification on cookiejar request meta key usage
2012-02-29 07:19:01 -02:00
Pablo Hoffman
61df6b4691
Merge pull request #51 from lostsnow/master
...
scrapyd: support binding to a specific IP address
2012-02-28 23:49:56 -08:00
lostsnow
5afe4f50c1
scrapyd: support binding to a specific IP address
2012-02-29 13:47:40 +08:00
Daniel Graña
798169805a
Adapt response encoding detection to pass test cases
2012-02-28 14:32:55 -02:00
Pablo Hoffman
81abb45000
fixed bug in new cookiejar documentation
2012-02-28 11:08:25 -02:00
Pablo Hoffman
26c8004125
added documentation for the new cookiejar Request.meta key
2012-02-27 19:58:58 -02:00
Pablo Hoffman
44d6da82fd
Merge pull request #96 from kalessin/cookiesmultijar
...
allow working with multiple cookie jars on the same spider
2012-02-27 13:48:43 -08:00
olveyra
c093ac5ec6
allow working with multiple cookie jars on the same spider
2012-02-27 18:03:48 +00:00
Pablo Hoffman
4ed1a03521
Merge pull request #95 from scrapy/openmobilealliance-mimetype
...
Handle the standard mimetype defined by Open Mobile Alliance as HTML
2012-02-24 10:28:32 -08:00
Daniel Graña
049f315ff4
Handle the standard mimetype defined by Open Mobile Alliance as HTML
2012-02-24 16:16:35 -02:00
Pablo Hoffman
b1f011d740
use netloc instead of hostname in url_is_from_any_domain(). closes #50
2012-02-24 02:09:02 -02:00
Daniel Graña
08d2c2b9ee
Merge branch 'GH92-image-buf-threading'
2012-02-23 19:25:28 -02:00
Daniel Graña
2dbf2a38a2
move buffer pointing to start of file before computing checksum. refs #92
2012-02-23 19:23:37 -02:00
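Why the buffer has to be rewound before computing a checksum can be shown with the standard library alone. This is a minimal sketch, not the image pipeline's code: the `checksum` helper is hypothetical, and MD5 is used here only for illustration.

```python
import hashlib
import io


def checksum(buf):
    """Return the MD5 hex digest of the whole buffer (hypothetical helper)."""
    buf.seek(0)  # rewind: read() only returns bytes from the current position on
    return hashlib.md5(buf.read()).hexdigest()


buf = io.BytesIO()
buf.write(b"image bytes")  # the position is now at the end of the buffer

# Reading without rewinding yields no bytes, so this is the digest of b"".
stale = hashlib.md5(buf.read()).hexdigest()

# Rewinding first gives the digest of the actual contents.
fresh = checksum(buf)
```

Without the `seek(0)`, every buffer that has just been written would hash to the same value, the digest of an empty byte string.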
Pablo Hoffman
e0de5f3eab
Merge pull request #93 from dangra/GH92-image-buf-threading
...
compute image checksum before persisting images
2012-02-23 13:21:10 -08:00
Daniel Graña
3286ce4f42
Compute image checksum before persisting images. closes #92
...
Avoids threading issue when accessing the buffer
2012-02-23 19:17:21 -02:00
Pablo Hoffman
52483c55cd
Merge pull request #94 from dangra/mediapipeline-cache-failures
...
remove as much information as possible from cached failure
2012-02-23 13:11:29 -08:00
Daniel Graña
5c73a0b1c1
remove leaking references in cached failures
2012-02-23 19:08:36 -02:00
Pablo Hoffman
e312a88582
MemoryUsage: use resident memory size (instead of virtual) for tracking memory usage
2012-02-23 17:42:02 -02:00
Pablo Hoffman
7fe7c3f3b1
MemoryUsage extension: close the spiders (instead of stopping the engine) when the limit is exceeded, providing a descriptive reason for the close. Also fixed default value of MEMUSAGE_ENABLED setting to match the documentation.
2012-02-23 17:05:06 -02:00
Pablo Hoffman
c476681c06
ported code to use w3lib.encoding (work in progress, many tests still failing)
2012-02-21 21:31:19 -02:00
Pablo Hoffman
6769b92493
Merge branch 'master' of github.com:scrapy/scrapy
2012-02-19 05:59:30 -02:00
Pablo Hoffman
0939106872
fixed bug in MemoryUsage extension: get_engine_status() takes exactly 1 argument (0 given)
2012-02-19 05:59:21 -02:00