Pablo Hoffman
0f690b03dc
added deprecation warning to ErrorPages downloader middleware
2009-05-28 13:57:25 -03:00
Pablo Hoffman
1aac694343
updated settings doc
2009-05-28 13:52:56 -03:00
Pablo Hoffman
04e7f8f5f6
merged with Daniel's HttpException-removal branch
2009-05-28 13:45:26 -03:00
Daniel Grana
abda5edf09
decompressionmw: don't try to decompress empty responses
2009-05-28 09:31:43 -03:00
Daniel Grana
85dbdf5789
finally remove HttpException
in this changeset:
* remove HttpException from engine and core exceptions
* replace dwmw ErrorPages with spidermw HttpError
* bugfix image pipeline media_to_download method when stat_key returns None
2009-05-28 09:30:31 -03:00
Daniel Grana
0e5bea67fd
images: adapt images pipeline to recent changes on HttpException topic
2009-05-28 00:27:42 -03:00
Daniel Grana
7eaa3ed24d
stop raising HttpException at download handlers and adapt download middlewares
2009-05-27 16:51:36 -03:00
Daniel Grana
c8827552b6
fix typo in WEBCONSOLE_ENABLED setting documentation of default value. thanks dzen
2009-05-26 15:48:34 -03:00
Pablo Hoffman
89950af834
cluster: fixed KeyError when crawler process failed to start
2009-05-25 23:45:10 -03:00
Pablo Hoffman
6d1ffa7137
renamed CrawlDebug downloader middleware to DebugMiddleware
2009-05-25 20:14:50 -03:00
Pablo Hoffman
b1dad251ae
Deprecated Common Downloader Middleware and added DefaultHeaders Downloader Middleware
2009-05-25 14:41:06 -03:00
Pablo Hoffman
90d408b04f
Some changes to HTTP cache middleware:
* documented
* moved from scrapy.contrib.downloadermiddleware.cache.CacheMiddleware to
scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware
* settings prefix changed from CACHE2_ to HTTPCACHE_
--HG--
rename : scrapy/contrib/downloadermiddleware/cache.py => scrapy/contrib/downloadermiddleware/httpcache.py
2009-05-24 19:13:06 -03:00
Pablo Hoffman
19f2992b26
applied Patrick patch: test_storedb: add base class for both mysql tests
2009-05-23 18:31:54 -03:00
Daniel Grana
dae0b1973b
aws: missing import
2009-05-22 13:21:46 -03:00
Daniel Grana
4efcf78a4a
aws: take AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from environment just like boto does
2009-05-22 13:14:16 -03:00
Ismael Carnales
3955844115
Removed FieldValueError in favour of ValueError
2009-05-21 15:01:48 +00:00
Ismael Carnales
c03e246002
Added DateTimeField
2009-05-21 14:57:52 +00:00
Ismael Carnales
d5f0cae776
New implementation of Field and MultiValuedField
2009-05-21 14:55:14 +00:00
Ismael Carnales
0cc289ac84
New and simpler implementation of BooleanField
2009-05-21 14:51:50 +00:00
Ismael Carnales
55d922a4b0
Fixed BooleanField default value
2009-05-21 14:50:35 +00:00
Ismael Carnales
1ffe64dab3
Added test for newitem fields
2009-05-21 14:48:43 +00:00
Pablo Hoffman
48bfd3fe4b
renamed old setting
2009-05-20 02:15:31 -03:00
Pablo Hoffman
befd28eef4
docs/tutorial: added reminder about adding pipeline to ITEM_PIPELINES settings (thanks jamie)
2009-05-20 00:57:44 -03:00
Pablo Hoffman
04610a25dc
fixed bug in tutorial regarding csv writer pipeline, and other minor corrections
2009-05-19 03:07:08 -03:00
Daniel Grana
abfc52cd17
docs: modify install document to mercurial based installation instructions
2009-05-19 01:50:44 -03:00
Pablo Hoffman
13bb9934f9
moved htmlparser and lxml based link extractors to scrapy.contrib.linkextractors, with the rest of the link extractors
2009-05-18 23:06:27 -03:00
Pablo Hoffman
c161c29e08
simplified some scrapy.log implementation code
2009-05-18 21:32:17 -03:00
Pablo Hoffman
a8a3de17ef
removed unused line
2009-05-18 21:11:03 -03:00
Pablo Hoffman
b87734341d
fixed docstring
2009-05-18 20:59:26 -03:00
Pablo Hoffman
59e504a003
removed code from scrapy.link to avoid cyclic imports from scrapy.contrib.linkextractors.sgml
2009-05-18 19:27:51 -03:00
Pablo Hoffman
86498abdf1
Sorted out Link Extractors organization by moving all of them to scrapy.contrib.linkextractors.
The most relevant being:
scrapy.link.extractors.RegexLinkExtractor
which was moved to:
scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor
The old location still works but throws a deprecation warning. It will be
removed before the 0.7 release.
Documentation and tests were also updated.
Also, in this changeset, a new regex-based link extractor was added to
scrapy.contrib.linkextractors.regex.
--HG--
rename : scrapy/tests/sample_data/link_extractor/regex_linkextractor.html => scrapy/tests/sample_data/link_extractor/sgml_linkextractor.html
rename : scrapy/tests/test_link.py => scrapy/tests/test_contrib_linkextractors.py
2009-05-18 19:19:37 -03:00
pablo
7b34e08392
sorted out running of unittests:
1. removed scrapy.tests.run module which didn't work well because of a problem
with Twisted trial
2. added runtests.bat script for running tests in windows
3. added additional lookup path for trial in unix systems
2009-05-16 20:11:23 -03:00
Pablo Hoffman
eb649d661d
fixed bug with unittests data in win32
2009-05-15 19:19:05 -03:00
Pablo Hoffman
ba48a24bb7
sorted out some tests sample data paths and fixed bug with test in windows
2009-05-15 15:03:42 -03:00
Pablo Hoffman
85cd7ea140
fixed encoding bug in xmliter (thanks Atamert!), added unittests and updated utils.iterator unittest names for consistency
2009-05-14 20:21:02 -03:00
Daniel Grana
7166f9f6f9
url2guid: allow returning None and single values
--HG--
extra : rebase_source : 1a24237cb6d90d30fe8f086dbb210858f1627620
2009-05-14 14:10:09 -03:00
Pablo Hoffman
766c8a4ea8
fixed some doc typos reported by phaithful
2009-05-14 08:32:48 -03:00
Pablo Hoffman
314a1c2bc2
improved configuration of middlewares using dicts and orders ( closes #85 )
2009-05-11 01:40:40 -03:00
Daniel Grana
3dee4b6728
redirect: 3xx status code based redirection requires Location header to be set
2009-05-08 12:01:02 -03:00
Daniel Grana
05f1e0c12d
adding .svn to hgignore to help with hg2svn autocommits
2009-05-07 17:29:30 -03:00
Pablo Hoffman
9be900706e
merge
2009-05-07 16:35:13 -03:00
Pablo Hoffman
edf5b6723a
renamed remove_escape_chars to replace_escape_chars (adaptor and function), added more tests to replace_escape_chars, keeping backwards compatibility
2009-05-07 16:33:06 -03:00
Pablo Hoffman
91657c0d12
Sorted exceptions reference alphabetically
2009-05-07 14:52:32 -03:00
Pablo Hoffman
c1c7b2d6c6
Sorted exceptions reference alphabetically
--HG--
extra : rebase_source : 1c6a192a76fcc90103ea324f6baf4387ba65e14a
2009-05-07 14:52:32 -03:00
Ismael Carnales
da7b9358a4
updated docstrings for remove_escape_chars and its adaptor factory
2009-05-07 16:03:41 +00:00
Ismael Carnales
1b3d40e639
force replace_by to be unicode in remove_escape_chars, added tests
2009-05-07 15:44:38 +00:00
Ismael Carnales
7eb79488aa
added a replace_str param to remove_escape_chars and added remove_escape adaptor using it
2009-05-07 15:24:40 +00:00
Ismael Carnales
77b25d3036
Added serialization functions to newitem
2009-05-07 12:06:18 -03:00
Pablo Hoffman
426282bbc9
fixed bug with FormRequest class which wasn't setting method=POST by default
2009-05-07 00:36:39 -03:00
Daniel Grana
33909e05b4
ignore pycs and twisted temp trial dir and dropin.cache
2009-05-06 15:59:50 -03:00