Pablo Hoffman
fd0e490157
added StatsMailer extension
2009-06-12 15:38:21 -03:00
Pablo Hoffman
7c2476bb25
fixed a couple of bugs caused by adding priority to Requests (thanks Artem for reporting)
2009-06-12 08:31:30 -03:00
Pablo Hoffman
4a1a01354b
Added 'priority' attribute to Requests and removed old 'priority' argument passed through engine, scheduler and scheduler middleware calls
2009-06-11 22:25:47 -03:00
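A minimal sketch of a scheduler that reads the new priority attribute from each request, instead of receiving priority as a separate argument through engine and scheduler calls. Request and PriorityScheduler are hypothetical stand-ins, not Scrapy's classes. The rule that higher values are dequeued first is also an assumption.

```python
import heapq
import itertools


class Request:
    """Hypothetical stand-in for a Request carrying a 'priority' attribute."""
    def __init__(self, url, priority=0):
        self.url = url
        self.priority = priority


class PriorityScheduler:
    """Sketch: priority comes from the request itself, not from a call argument."""
    def __init__(self):
        self._heap = []
        # monotonically increasing tie-breaker: equal priorities stay FIFO and
        # Request objects never need to be compared with each other
        self._counter = itertools.count()

    def enqueue_request(self, request):
        # heapq is a min-heap, so negate priority to pop higher values first (assumption)
        heapq.heappush(self._heap, (-request.priority, next(self._counter), request))

    def next_request(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Enqueuing requests with priorities 0, 10 and 0 yields the priority-10 request first, then the two priority-0 requests in insertion order.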
Pablo Hoffman
962dbeba88
fixed typo in docstring
2009-06-11 08:33:01 -03:00
Pablo Hoffman
e55158ebdd
Merged olveyra's patch
2009-06-10 18:00:32 -03:00
Pablo Hoffman
635ac1ca64
Simplified domain prioritizers, so that they don't receive domains in the
constructor (domain prioritizers will be refactored later anyway) and
simplified Scrapy Manager code thanks to this.
Added make_request_from_url method to BaseSpider, splitting out the
functionality for creating requests from URLs, which was previously all done
in start_requests.
2009-06-10 14:21:36 -03:00
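A rough sketch of the split described above. start_requests only iterates over start_urls and delegates construction of each request to make_request_from_url, so a subclass can change per-URL behaviour without rewriting start_requests. The Request class and its dont_filter parameter are hypothetical stand-ins used only for illustration.

```python
class Request:
    """Hypothetical minimal request object used only for this sketch."""
    def __init__(self, url, callback=None, dont_filter=False):
        self.url = url
        self.callback = callback
        self.dont_filter = dont_filter


class BaseSpider:
    start_urls = []

    def start_requests(self):
        # start_requests only walks the URLs; building each request is delegated
        return [self.make_request_from_url(url) for url in self.start_urls]

    def make_request_from_url(self, url):
        # default: one request per URL, handled by the spider's parse callback
        return Request(url, callback=self.parse)

    def parse(self, response):
        raise NotImplementedError


class MySpider(BaseSpider):
    start_urls = ["http://example.com/a", "http://example.com/b"]

    def make_request_from_url(self, url):
        # only per-URL construction changes; start_requests is inherited unchanged
        return Request(url, callback=self.parse, dont_filter=True)
```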
Pablo Hoffman
a74b0b1764
additional simplification of OffsiteMiddleware
2009-06-09 13:09:35 -03:00
Pablo Hoffman
eca05c9e12
OffsiteMiddleware: removed logging and simplified implementation
2009-06-09 12:37:15 -03:00
molveyra
6524def4b8
don't check guid in RobustScrapedItem.validate. Instead, raise
NotImplemented.
2009-06-04 10:44:40 -03:00
Daniel Grana
87fbc9c58c
spidermw: add domain name to warning about missing callbacks in requests
2009-05-28 21:47:41 -03:00
Daniel Grana
727e67af5e
spidermw: ignore and warn about requests without callback returned by spiders
2009-05-28 21:41:02 -03:00
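The two spider middleware entries above describe two changes: requests without a callback are ignored with a warning, and the warning names the spider's domain. The code below is a rough sketch of that behaviour. Request and filter_spider_output are hypothetical, and Python's warnings module stands in for whatever logging the real middleware used.

```python
import warnings


class Request:
    """Hypothetical minimal request; only url and callback matter here."""
    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


def filter_spider_output(result, domain_name):
    """Yield spider output, skipping Requests that have no callback."""
    for obj in result:
        if isinstance(obj, Request) and obj.callback is None:
            # include the spider's domain so the warning can be traced back
            warnings.warn(
                f"[{domain_name}] ignoring request without callback: {obj.url}"
            )
            continue
        yield obj
```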
Daniel Grana
cfafa01109
spidermw: check for __iter__ instead of calling iter(), which would let a string pass as iterable
2009-05-28 21:10:30 -03:00
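Background for the entry above: in Python 2, str had no __iter__ method, so hasattr(x, "__iter__") rejected strings while iter(x) accepted them. In Python 3, str does define __iter__, so the code below swaps in an explicit str/bytes check to keep the same intent. This is a sketch, not the middleware's code. Mapping None to an empty list is an added assumption.

```python
def iterate_spider_output(result):
    """Normalize a spider callback's return value into an iterable (sketch)."""
    if result is None:
        # assumption: a callback returning nothing produces no output
        return []
    # strings are iterable in Python 3, but a single string is one value, not many
    if isinstance(result, (str, bytes)) or not hasattr(result, "__iter__"):
        return [result]
    return result
```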
Pablo Hoffman
0f690b03dc
added deprecation warning to ErrorPages downloader middleware
2009-05-28 13:57:25 -03:00
Pablo Hoffman
1aac694343
updated settings doc
2009-05-28 13:52:56 -03:00
Pablo Hoffman
04e7f8f5f6
merged with Daniel's HttpException-removal branch
2009-05-28 13:45:26 -03:00
Daniel Grana
abda5edf09
decompressionmw: don't try to decompress empty responses
2009-05-28 09:31:43 -03:00
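The point of the fix above can be shown with a small sketch. Some decompressors fail on empty input; for example, zlib.decompress(b"") raises zlib.error. The function decompress_body is hypothetical. It also assumes "deflate" means zlib-wrapped data, which is only one of the forms seen on the web.

```python
import gzip
import zlib


def decompress_body(body, content_encoding):
    """Decompress a response body, leaving empty bodies untouched (sketch)."""
    if not body:
        # nothing to decompress; passing b"" to zlib.decompress would raise
        return body
    if content_encoding == "gzip":
        return gzip.decompress(body)
    if content_encoding == "deflate":
        # assumption: zlib-wrapped deflate stream
        return zlib.decompress(body)
    return body
```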
Daniel Grana
85dbdf5789
finally remove HttpException
in this changeset:
* remove HttpException from engine and core exceptions
* replace ErrorPages downloader middleware with HttpError spider middleware
* bugfix image pipeline media_to_download method when stat_key returns None
2009-05-28 09:30:31 -03:00
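A rough sketch of an HttpError-style spider middleware, as mentioned in the changeset above. Instead of download handlers raising HttpException, the spider middleware stops unsuccessful responses before they reach callbacks. The allowed_statuses parameter and the exact status rule are assumptions made for illustration.

```python
class HttpError(Exception):
    """Raised for responses that should not reach the spider (sketch)."""
    def __init__(self, response):
        super().__init__(f"Ignoring non-2xx response: {response.status}")
        self.response = response


class HttpErrorMiddleware:
    """Sketch: let 2xx (and explicitly allowed) responses through, reject the rest."""
    def __init__(self, allowed_statuses=()):
        self.allowed_statuses = set(allowed_statuses)

    def process_spider_input(self, response, spider):
        if 200 <= response.status < 300 or response.status in self.allowed_statuses:
            return None  # let the response through to the callback
        raise HttpError(response)
```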
Daniel Grana
0e5bea67fd
images: adapt images pipeline to recent changes on HttpException topic
2009-05-28 00:27:42 -03:00
Daniel Grana
7eaa3ed24d
stop raising HttpException at download handlers and adapt download middlewares
2009-05-27 16:51:36 -03:00
Daniel Grana
c8827552b6
fix typo in the documented default value of the WEBCONSOLE_ENABLED setting (thanks dzen)
2009-05-26 15:48:34 -03:00
Pablo Hoffman
89950af834
cluster: fixed KeyError when crawler process failed to start
2009-05-25 23:45:10 -03:00
Pablo Hoffman
6d1ffa7137
renamed CrawlDebug downloader middleware to DebugMiddleware
2009-05-25 20:14:50 -03:00
Pablo Hoffman
b1dad251ae
Deprecated Common Downloader Middleware and added DefaultHeaders Downloader
Middleware
2009-05-25 14:41:06 -03:00
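A minimal sketch of what a default-headers downloader middleware could do. It fills in configured headers only when the request does not already set them. SimpleRequest is hypothetical. So is the convention that returning None means "continue processing". The plain dict used for headers is case-sensitive, unlike real HTTP header handling.

```python
class SimpleRequest:
    """Hypothetical request with a plain dict of headers."""
    def __init__(self, url, headers=None):
        self.url = url
        self.headers = dict(headers or {})


class DefaultHeadersMiddleware:
    """Fill in default headers without overriding ones already set on the request."""
    def __init__(self, default_headers):
        self.default_headers = dict(default_headers)

    def process_request(self, request, spider):
        for name, value in self.default_headers.items():
            # setdefault keeps any header the request already carries
            request.headers.setdefault(name, value)
        return None  # assumed convention: None means keep processing the request
```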
Pablo Hoffman
90d408b04f
Some changes to HTTP cache middleware:
* documented
* moved from scrapy.contrib.downloadermiddleware.cache.CacheMiddleware to
scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware
* settings prefix changed from CACHE2_ to HTTPCACHE_
--HG--
rename : scrapy/contrib/downloadermiddleware/cache.py => scrapy/contrib/downloadermiddleware/httpcache.py
2009-05-24 19:13:06 -03:00
Pablo Hoffman
19f2992b26
applied Patrick patch: test_storedb: add base class for both mysql tests
2009-05-23 18:31:54 -03:00
Daniel Grana
dae0b1973b
aws: missing import
2009-05-22 13:21:46 -03:00
Daniel Grana
4efcf78a4a
aws: take AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from environment just like boto does
2009-05-22 13:14:16 -03:00
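A sketch of reading AWS credentials from the same environment variables boto uses. get_aws_credentials is a hypothetical helper. The precedence rule is an assumption the log does not state: explicit settings first, then the environment.

```python
import os


def get_aws_credentials(settings, environ=None):
    """Return (key_id, secret), preferring settings, then environment (assumed order)."""
    environ = os.environ if environ is None else environ
    key_id = settings.get("AWS_ACCESS_KEY_ID") or environ.get("AWS_ACCESS_KEY_ID")
    secret = settings.get("AWS_SECRET_ACCESS_KEY") or environ.get("AWS_SECRET_ACCESS_KEY")
    return key_id, secret
```

Passing environ explicitly keeps the function easy to test without touching the real process environment.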
Ismael Carnales
3955844115
Removed FieldValueError in favour of ValueError
2009-05-21 15:01:48 +00:00
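With FieldValueError gone, a field can signal bad input through the built-in ValueError. Callers then catch it without importing a package-specific exception. IntegerField and its to_python method are hypothetical and only illustrate the pattern.

```python
class IntegerField:
    """Hypothetical field that reports invalid input with the built-in ValueError."""
    def to_python(self, value):
        try:
            return int(value)
        except (TypeError, ValueError):
            # re-raise as a plain ValueError with a field-specific message
            raise ValueError(f"IntegerField: invalid value {value!r}") from None
```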
Ismael Carnales
c03e246002
Added DateTimeField
2009-05-21 14:57:52 +00:00
Ismael Carnales
d5f0cae776
New implementation of Field and MultiValuedField
2009-05-21 14:55:14 +00:00
Ismael Carnales
0cc289ac84
New and simpler implementation of BooleanField
2009-05-21 14:51:50 +00:00
Ismael Carnales
55d922a4b0
Fixed BooleanField default value
2009-05-21 14:50:35 +00:00
Ismael Carnales
1ffe64dab3
Added test for newitem fields
2009-05-21 14:48:43 +00:00
Pablo Hoffman
48bfd3fe4b
renamed old setting
2009-05-20 02:15:31 -03:00
Pablo Hoffman
befd28eef4
docs/tutorial: added reminder about adding pipeline to ITEM_PIPELINES settings (thanks jamie)
2009-05-20 00:57:44 -03:00
Pablo Hoffman
04610a25dc
fixed bug in tutorial regarding csv writer pipeline, and other minor corrections
2009-05-19 03:07:08 -03:00
Daniel Grana
abfc52cd17
docs: update install document with Mercurial-based installation instructions
2009-05-19 01:50:44 -03:00
Pablo Hoffman
13bb9934f9
moved htmlparser and lxml based link extractors to scrapy.contrib.linkextractors, with the rest of the link extractors
2009-05-18 23:06:27 -03:00
Pablo Hoffman
c161c29e08
simplified some scrapy.log implementation code
2009-05-18 21:32:17 -03:00
Pablo Hoffman
a8a3de17ef
removed unused line
2009-05-18 21:11:03 -03:00
Pablo Hoffman
b87734341d
fixed docstring
2009-05-18 20:59:26 -03:00
Pablo Hoffman
59e504a003
removed code from scrapy.link to avoid cyclic imports from scrapy.contrib.linkextractors.sgml
2009-05-18 19:27:51 -03:00
Pablo Hoffman
86498abdf1
Sorted out Link Extractors organization by moving all of them to
scrapy.contrib.linkextractors.
The most relevant being:
scrapy.link.extractors.RegexLinkExtractor
which was moved to:
scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor
The old location still works but throws a deprecation warning. It will be
removed before the 0.7 release.
Documentation and tests were also updated.
Also, in this changeset, a new regex-based link extractor was added to
scrapy.contrib.linkextractors.regex.
--HG--
rename : scrapy/tests/sample_data/link_extractor/regex_linkextractor.html => scrapy/tests/sample_data/link_extractor/sgml_linkextractor.html
rename : scrapy/tests/test_link.py => scrapy/tests/test_contrib_linkextractors.py
2009-05-18 19:19:37 -03:00
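One common way to keep an old name working while emitting a deprecation warning is a thin subclass that warns when instantiated. This is a sketch of that pattern only. The log does not say whether Scrapy warned on import or on use, and the allow parameter here is hypothetical.

```python
import warnings


class SgmlLinkExtractor:
    """Stand-in for the extractor at its new location (body omitted in this sketch)."""
    def __init__(self, allow=()):
        self.allow = allow


class RegexLinkExtractor(SgmlLinkExtractor):
    """Old name kept as a deprecated alias of SgmlLinkExtractor (sketch)."""
    def __init__(self, *args, **kwargs):
        warnings.warn(
            "RegexLinkExtractor is deprecated, use "
            "scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this __init__
        )
        super().__init__(*args, **kwargs)
```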
pablo
7b34e08392
sorted out running of unittests:
1. removed scrapy.tests.run module which didn't work well because of a problem
with Twisted trial
2. added runtests.bat script for running tests in windows
3. added additional lookup path for trial in unix systems
2009-05-16 20:11:23 -03:00
Pablo Hoffman
eb649d661d
fixed bug with unittests data in win32
2009-05-15 19:19:05 -03:00
Pablo Hoffman
ba48a24bb7
sorted out some tests sample data paths and fixed bug with test in windows
2009-05-15 15:03:42 -03:00
Pablo Hoffman
85cd7ea140
fixed encoding bug in xmliter (thanks Atamert!), added unittests and updated utils.iterator unittest names for consistency
2009-05-14 20:21:02 -03:00
Daniel Grana
7166f9f6f9
url2guid: allow returning None and single values
--HG--
extra : rebase_source : 1a24237cb6d90d30fe8f086dbb210858f1627620
2009-05-14 14:10:09 -03:00
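The log does not say how callers consume the url2guid result. One hypothetical way for a caller to accept None, a single value, or a list is to normalize the value first. normalize_guids is an illustrative name, not part of Scrapy.

```python
def normalize_guids(value):
    """Coerce a url2guid-style result into a list (hypothetical helper)."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    # any other value is treated as a single guid
    return [value]
```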
Pablo Hoffman
766c8a4ea8
fixed some doc typos reported by phaithful
2009-05-14 08:32:48 -03:00
Pablo Hoffman
314a1c2bc2
improved configuration of middlewares using dicts and orders (closes #85)
2009-05-11 01:40:40 -03:00
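A sketch of dict-and-order middleware configuration. A base dict and a user dict map middleware paths to order values. They are merged, and the enabled paths are returned sorted by order. Treating None as "disabled" follows the convention later Scrapy versions use; whether this 2009 change already did so is an assumption. The middleware paths in the usage are made up.

```python
def build_middleware_list(base, custom):
    """Merge base and user middleware dicts; return paths sorted by order (sketch)."""
    merged = dict(base)
    merged.update(custom)  # user settings override base orders
    # assumption: an order of None disables the middleware
    enabled = {path: order for path, order in merged.items() if order is not None}
    return [path for path, _ in sorted(enabled.items(), key=lambda kv: kv[1])]
```

For example, a base of {"mw.a.A": 100, "mw.b.B": 500} combined with {"mw.c.C": 300, "mw.b.B": None} gives ["mw.a.A", "mw.c.C"].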