mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 02:43:41 +00:00

2939 Commits

Author SHA1 Message Date
Pablo Hoffman
056a7c53d0 added artwork files properly now 2012-03-20 10:46:45 -03:00
Pablo Hoffman
aef70e8394 removed wrongly added artwork files 2012-03-20 10:45:48 -03:00
Pablo Hoffman
bcd8520f8d added sep directory with Scrapy Enhancement Proposal imported from old Trac site 2012-03-20 10:15:00 -03:00
Pablo Hoffman
c0141d154e added artwork directory (data taken from old Trac) 2012-03-20 10:14:11 -03:00
Pablo Hoffman
35fb01156e removed some obsolete remaining code related to sqlite support in scrapy 2012-03-16 11:55:55 -03:00
Pablo Hoffman
838e1dcce9 updated FormRequest tests to use HtmlResponse instead of Response, as it makes more sense 2012-03-15 11:47:02 -03:00
Pablo Hoffman
b6ae266546 Removed (very old and possibly broken) backwards compatibility support for Twisted 2.5 2012-03-15 00:28:24 -03:00
Pablo Hoffman
9fddc73ed8 removed backwards compatibility code for old scrapy versions 2012-03-06 05:42:09 -02:00
Pablo Hoffman
9a508d4638 Removed deprecated setting: CLOSESPIDER_ITEMPASSED 2012-03-06 05:26:57 -02:00
Pablo Hoffman
8b83177655 Added CLOSESPIDER_ERRORCOUNT to scrapy/default_settings.py 2012-03-06 05:26:57 -02:00
Pablo Hoffman
9006227358 bumped required python-w3lib version in debian/control 2012-03-05 20:25:38 -02:00
Daniel Graña
2909a60e95 test that the default start_requests return value type is a generator. refs #98 2012-03-05 17:53:20 -02:00
Pablo Hoffman
45685ea6cd Restored scrapy.utils.py26 module for backwards compatibility, with a deprecation message. This is needed because the module was used a lot by users and the change causes too much trouble 2012-03-05 17:15:49 -02:00
Daniel Graña
cc6e297062 Merge pull request #98 from kalessin/start_requests
This will break any spider that extends `start_requests` and expects a `list` as the return value.

On the other hand:

* [Docs](http://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spider.BaseSpider.start_requests) say that the return value is an **iterable**, not a list
* Scrapy core already supports consuming the start_requests generator on demand, so we can avoid problems like #47
* It allows extensions to change the starting requests on the `spider_opened` signal
2012-03-05 08:51:22 -08:00
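The third point in the pull request above can be illustrated with a minimal sketch. It uses plain Python rather than Scrapy itself: `SketchSpider` and the example URLs are hypothetical stand-ins, and the late `append` only imitates what a `spider_opened` handler might do.

```python
# Hypothetical stand-in for a spider; this is not Scrapy's BaseSpider.
class SketchSpider:
    def __init__(self, start_urls):
        self.start_urls = list(start_urls)

    def start_requests(self):
        # Generator: the loop body does not run until iteration starts,
        # so start_urls is read lazily.
        for url in self.start_urls:
            yield url


spider = SketchSpider(["http://example.com/a"])
pending = spider.start_requests()  # nothing has been consumed yet

# Imitates an extension changing the start URLs after the spider is created
# (an assumption standing in for a spider_opened signal handler).
spider.start_urls.append("http://example.com/b")

collected = list(pending)  # both URLs are seen
```

Had `start_requests` built and returned a list eagerly, the URL appended afterwards would not be part of the result.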
Martin Olveyra
f6179a927e replace list with generator in the start_requests method of the Sitemap spider as well
2012-03-05 14:25:12 -02:00
Martin Olveyra
cc7fc33833 change start_requests to return a generator instead of a list, so that start_urls can be modified by a spider_opened signal handler
2012-03-05 12:49:17 -02:00
Pablo Hoffman
e521da2e2f Dropped support for Python 2.5. See: http://blog.scrapy.org/scrapy-dropping-support-for-python-25 2012-03-01 08:18:12 -02:00
Pablo Hoffman
8eb0b11f8a removed unused import 2012-02-29 17:40:30 -02:00
Pablo Hoffman
5c329b6514 Merge pull request #97 from scrapy/w3lib_encoding
Ported scrapy to use w3lib.encoding
2012-02-29 01:45:59 -08:00
Pablo Hoffman
de3a3b68dc bumped required w3lib version to 1.1, after refactoring encoding detection to use the new w3lib.encoding module 2012-02-29 07:44:22 -02:00
Pablo Hoffman
2b16ebdc11 added minor clarification on cookiejar request meta key usage 2012-02-29 07:19:01 -02:00
Pablo Hoffman
61df6b4691 Merge pull request #51 from lostsnow/master
scrapyd: support bind to a specific ip address
2012-02-28 23:49:56 -08:00
lostsnow
5afe4f50c1 scrapyd: support bind to a specific ip address 2012-02-29 13:47:40 +08:00
Daniel Graña
798169805a Adapt response encoding detection to pass test cases 2012-02-28 14:32:55 -02:00
Pablo Hoffman
81abb45000 fixed bug in new cookiejar documentation 2012-02-28 11:08:25 -02:00
Pablo Hoffman
26c8004125 added documentation for the new cookiejar Request.meta key 2012-02-27 19:58:58 -02:00
Pablo Hoffman
44d6da82fd Merge pull request #96 from kalessin/cookiesmultijar
allow working with multiple cookie jars in the same spider
2012-02-27 13:48:43 -08:00
olveyra
c093ac5ec6 allow working with multiple cookie jars in the same spider 2012-02-27 18:03:48 +00:00
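As a rough illustration of the idea of several cookie jars in one spider, selected by a `cookiejar` key in the request meta, here is a minimal standard-library sketch. It is written for Python 3 and is not the code from this commit; `jars` and `jar_for` are hypothetical names, and using one shared jar for requests without the key is an assumption.

```python
from collections import defaultdict
from http.cookiejar import CookieJar

# One jar per key, created on first use.
jars = defaultdict(CookieJar)


def jar_for(meta):
    # Requests without a "cookiejar" key share the jar stored under None
    # (an assumption made for this sketch).
    return jars[meta.get("cookiejar")]


session_a = jar_for({"cookiejar": 1})
session_b = jar_for({"cookiejar": 2})
session_a_again = jar_for({"cookiejar": 1})
default_jar = jar_for({})
```

Requests carrying the same key get the same jar object, so each key behaves like a separate cookie session.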
Pablo Hoffman
4ed1a03521 Merge pull request #95 from scrapy/openmobilealliance-mimetype
Handle the standard mimetype defined by the Open Mobile Alliance as HTML
2012-02-24 10:28:32 -08:00
Daniel Graña
049f315ff4 Handle the standard mimetype defined by the Open Mobile Alliance as HTML 2012-02-24 16:16:35 -02:00
Pablo Hoffman
b1f011d740 use netloc instead of hostname in url_is_from_any_domain(). closes #50 2012-02-24 02:09:02 -02:00
Daniel Graña
08d2c2b9ee Merge branch 'GH92-image-buf-threading' 2012-02-23 19:25:28 -02:00
Daniel Graña
2dbf2a38a2 move buffer pointing to start of file before computing checksum. refs #92 2012-02-23 19:23:37 -02:00
Pablo Hoffman
e0de5f3eab Merge pull request #93 from dangra/GH92-image-buf-threading
compute image checksum before persisting images
2012-02-23 13:21:10 -08:00
Daniel Graña
3286ce4f42 Compute image checksum before persisting images. closes #92
Avoids a threading issue when accessing the buffer
2012-02-23 19:17:21 -02:00
Pablo Hoffman
52483c55cd Merge pull request #94 from dangra/mediapipeline-cache-failures
remove as much information as possible from cached failure
2012-02-23 13:11:29 -08:00
Daniel Graña
5c73a0b1c1 remove leaking references in cached failures 2012-02-23 19:08:36 -02:00
Pablo Hoffman
e312a88582 MemoryUsage: use resident memory size (instead of virtual) for tracking memory usage 2012-02-23 17:42:02 -02:00
Pablo Hoffman
7fe7c3f3b1 MemoryUsage extension: close the spiders (instead of stopping the engine) when the limit is exceeded, providing a descriptive reason for the close. Also fixed default value of MEMUSAGE_ENABLED setting to match the documentation. 2012-02-23 17:05:06 -02:00
Pablo Hoffman
c476681c06 ported code to use w3lib.encoding (work in progress, many tests still failing) 2012-02-21 21:31:19 -02:00
Pablo Hoffman
6769b92493 Merge branch 'master' of github.com:scrapy/scrapy 2012-02-19 05:59:30 -02:00
Pablo Hoffman
0939106872 fixed bug in MemoryUsage extension: get_engine_status() takes exactly 1 argument (0 given) 2012-02-19 05:59:21 -02:00
Daniel Graña
5b0f10e767 Merge pull request #90 from kalessin/master
identify autothrottle debug mode stats with the slot key, to allow tracking concurrency/delay issues with spiders that crawl more than one site.
2012-02-17 11:12:41 -08:00
Martin Olveyra
b094cd4c57 identify autothrottle debug mode stats with the slot key, to allow tracking concurrency/delay issues with spiders that crawl more than one site.
2012-02-17 16:53:48 -02:00
Pablo Hoffman
7b8942a648 updated StackTraceDump extension doc 2012-02-16 15:14:17 -02:00
Pablo Hoffman
fe2ce938ee also dump (scrapy.utils.trackref) live references in StackTraceDump extension 2012-02-16 14:58:11 -02:00
Pablo Hoffman
900bf08fb6 fixed struct.error on http compression middleware. closes #87 2012-02-11 20:28:09 -02:00
Daniel Graña
acf69dac44 ajax crawling wasn't expanding for unicode urls 2012-02-07 14:44:06 -02:00
Daniel Graña
2e18f0db33 Catch start_requests iterator errors. refs #83 2012-01-27 17:41:38 -02:00
Daniel Graña
7201d074c1 Merge branch 'issue82' 2012-01-25 19:17:26 -02:00