mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 06:14:56 +00:00

1904 Commits

Author SHA1 Message Date
Pablo Hoffman
0f690b03dc added deprecation warning to ErrorPages downloader middleware 2009-05-28 13:57:25 -03:00
Pablo Hoffman
1aac694343 updated settings doc 2009-05-28 13:52:56 -03:00
Pablo Hoffman
04e7f8f5f6 merged with Daniel's HttpException-removal branch 2009-05-28 13:45:26 -03:00
Daniel Grana
abda5edf09 decompressionmw: don't try to decompress empty responses 2009-05-28 09:31:43 -03:00
Daniel Grana
85dbdf5789 finally remove HttpException
in this changeset:
* remove HttpException from engine and core exceptions
* replace dwmw ErrorPages with spidermw HttpError
* bugfix image pipeline media_to_download method when stat_key returns None
2009-05-28 09:30:31 -03:00
Daniel Grana
0e5bea67fd images: adapt images pipeline to recent changes on HttpException topic 2009-05-28 00:27:42 -03:00
Daniel Grana
7eaa3ed24d stop raising HttpException at download handlers and adapt download middlewares 2009-05-27 16:51:36 -03:00
Daniel Grana
c8827552b6 fix typo at WEBCONSOLE_ENABLED setting documentation of default value. thanks dzen 2009-05-26 15:48:34 -03:00
Pablo Hoffman
89950af834 cluster: fixed KeyError when crawler process failed to start 2009-05-25 23:45:10 -03:00
Pablo Hoffman
6d1ffa7137 renamed CrawlDebug downloader middleware to DebugMiddleware 2009-05-25 20:14:50 -03:00
Pablo Hoffman
b1dad251ae Deprecated Common Downloader Middleware and added DefaultHeaders Downloader
Middleware
2009-05-25 14:41:06 -03:00
Pablo Hoffman
90d408b04f Some changes to HTTP cache middleware:
* documented
* moved from scrapy.contrib.downloadermiddleware.cache.CacheMiddleware to
  scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware
* settings prefix changed from CACHE2_ to HTTPCACHE_

--HG--
rename : scrapy/contrib/downloadermiddleware/cache.py => scrapy/contrib/downloadermiddleware/httpcache.py
2009-05-24 19:13:06 -03:00
Pablo Hoffman
19f2992b26 applied Patrick patch: test_storedb: add base class for both mysql tests 2009-05-23 18:31:54 -03:00
Daniel Grana
dae0b1973b aws: missing import 2009-05-22 13:21:46 -03:00
Daniel Grana
4efcf78a4a aws: take AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from environment just like boto does 2009-05-22 13:14:16 -03:00
Ismael Carnales
3955844115 Removed FieldValueError in favour of ValueError 2009-05-21 15:01:48 +00:00
Ismael Carnales
c03e246002 Added DateTimeField 2009-05-21 14:57:52 +00:00
Ismael Carnales
d5f0cae776 New implementation of Field and MultiValuedField 2009-05-21 14:55:14 +00:00
Ismael Carnales
0cc289ac84 New and simpler implementation of BooleanField 2009-05-21 14:51:50 +00:00
Ismael Carnales
55d922a4b0 Fixed BooleanField default value 2009-05-21 14:50:35 +00:00
Ismael Carnales
1ffe64dab3 Added test for newitem fields 2009-05-21 14:48:43 +00:00
Pablo Hoffman
48bfd3fe4b renamed old setting 2009-05-20 02:15:31 -03:00
Pablo Hoffman
befd28eef4 docs/tutorial: added reminder about adding pipeline to ITEM_PIPELINES settings (thanks jamie) 2009-05-20 00:57:44 -03:00
Pablo Hoffman
04610a25dc fixed bug in tutorial regarding csv writer pipeline, and other minor corrections 2009-05-19 03:07:08 -03:00
Daniel Grana
abfc52cd17 docs: modify install document to mercurial based installation instructions 2009-05-19 01:50:44 -03:00
Pablo Hoffman
13bb9934f9 moved htmlparser and lxml based link extractors to scrapy.contrib.linkextractors, with the rest of the link extractors 2009-05-18 23:06:27 -03:00
Pablo Hoffman
c161c29e08 simplified some scrapy.log implementation code 2009-05-18 21:32:17 -03:00
Pablo Hoffman
a8a3de17ef removed unused line 2009-05-18 21:11:03 -03:00
Pablo Hoffman
b87734341d fixed docstring 2009-05-18 20:59:26 -03:00
Pablo Hoffman
59e504a003 removed code from scrapy.link to avoid cyclic imports from scrapy.contrib.linkextractors.sgml 2009-05-18 19:27:51 -03:00
Pablo Hoffman
86498abdf1 Sorted out Link Extractors organization by moving all of them to
scrapy.contrib.linkextractors.

The most relevant being:
    scrapy.link.extractors.RegexLinkExtractor

which was moved to:
    scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor

The old location still works but throws a deprecation warning. It will be
removed before the 0.7 release.

Documentation and tests were also updated.

Also, in this changeset, a new regex-based link extractor was added to
scrapy.contrib.linkextractors.regex.

--HG--
rename : scrapy/tests/sample_data/link_extractor/regex_linkextractor.html => scrapy/tests/sample_data/link_extractor/sgml_linkextractor.html
rename : scrapy/tests/test_link.py => scrapy/tests/test_contrib_linkextractors.py
2009-05-18 19:19:37 -03:00
pablo
7b34e08392 sorted out running of unittests:
1. removed scrapy.tests.run module which didn't work well because of a problem
   with Twisted trial
2. added runtests.bat script for running tests in windows
3. added additional lookup path for trial in unix systems
2009-05-16 20:11:23 -03:00
Pablo Hoffman
eb649d661d fixed bug with unittests data in win32 2009-05-15 19:19:05 -03:00
Pablo Hoffman
ba48a24bb7 sorted out some tests sample data paths and fixed bug with test in windows 2009-05-15 15:03:42 -03:00
Pablo Hoffman
85cd7ea140 fixed encoding bug in xmliter (thanks Atamert!), added unittests and updated utils.iterator unittest names for consistency 2009-05-14 20:21:02 -03:00
Daniel Grana
7166f9f6f9 url2guid: allow returning None and single values
--HG--
extra : rebase_source : 1a24237cb6d90d30fe8f086dbb210858f1627620
2009-05-14 14:10:09 -03:00
Pablo Hoffman
766c8a4ea8 fixed some doc typos reported by phaithful 2009-05-14 08:32:48 -03:00
Pablo Hoffman
314a1c2bc2 improved configuration of middlewares using dicts and orders (closes #85) 2009-05-11 01:40:40 -03:00
Daniel Grana
3dee4b6728 redirect: 3xx status code based redirection requires Location header to be set 2009-05-08 12:01:02 -03:00
Daniel Grana
05f1e0c12d adding .svn to hgignore to help with hg2svn autocommits 2009-05-07 17:29:30 -03:00
Pablo Hoffman
9be900706e merge 2009-05-07 16:35:13 -03:00
Pablo Hoffman
edf5b6723a renamed remove_escape_chars to replace_escape_chars (adaptor and function), added more tests to replace_escape_chars, keeping backwards compatibility 2009-05-07 16:33:06 -03:00
Pablo Hoffman
91657c0d12 Sorted exceptions reference alphabetically 2009-05-07 14:52:32 -03:00
Pablo Hoffman
c1c7b2d6c6 Sorted exceptions reference alphabetically
--HG--
extra : rebase_source : 1c6a192a76fcc90103ea324f6baf4387ba65e14a
2009-05-07 14:52:32 -03:00
Ismael Carnales
da7b9358a4 updated docstrings for remove_escape_chars and its adaptor factory 2009-05-07 16:03:41 +00:00
Ismael Carnales
1b3d40e639 force replace_by to be unicode in remove_escape_chars, added tests 2009-05-07 15:44:38 +00:00
Ismael Carnales
7eb79488aa added a replace_str param to remove_escape_chars and added remove_escape adaptor using it 2009-05-07 15:24:40 +00:00
Ismael Carnales
77b25d3036 Added serialization functions to newitem 2009-05-07 12:06:18 -03:00
Pablo Hoffman
426282bbc9 fixed bug with FormRequest class which wasn't setting method=POST by default 2009-05-07 00:36:39 -03:00
Daniel Grana
33909e05b4 ignore pycs and twisted temp trial dir and dropin.cache 2009-05-06 15:59:50 -03:00