Pablo Hoffman
|
40f7075f11
|
added initial documentation about suspending and resuming crawls
|
2011-09-02 13:12:27 -03:00 |
|
Pablo Hoffman
|
27dd68a690
|
added SpiderState extension
|
2011-09-02 13:06:59 -03:00 |
|
Pablo Hoffman
|
c382f2fc8a
|
fixed subtle bug in disk-based priority queues caused by serialization errors, and added tests
|
2011-09-02 09:40:52 -03:00 |
|
Pablo Hoffman
|
cca0b91000
|
add setting to enable logging when unserializable requests are found
|
2011-09-01 19:40:44 -03:00 |
|
Pablo Hoffman
|
789e1493e9
|
PickleDiskQueue: use pickle protocol 2
|
2011-09-01 15:12:13 -03:00 |
|
Pablo Hoffman
|
6a31ab667d
|
minor fix to doc
|
2011-09-01 15:08:23 -03:00 |
|
Pablo Hoffman
|
d98b058c21
|
no longer recommend using lambdas in the doc, as they're not friendly with scheduler persistence
|
2011-09-01 15:06:49 -03:00 |
|
Pablo Hoffman
|
725362fdeb
|
remove redundant code
|
2011-09-01 14:58:50 -03:00 |
|
Pablo Hoffman
|
76af0cdd44
|
updated documentation and code to use -s instead of --set option
|
2011-09-01 14:35:37 -03:00 |
|
Pablo Hoffman
|
46edfd4a9d
|
remove unneeded code to simplify
|
2011-09-01 14:29:11 -03:00 |
|
Pablo Hoffman
|
edefb8ac69
|
scrapy tool: added -s alias for --set option
|
2011-09-01 14:27:47 -03:00 |
|
Pablo Hoffman
|
75284015b5
|
persistent scheduler: use pickle (instead of marshal) as the default serialization format, to support serializing more objects out of the box. also removed __slots__ from Request/Response objects to make them serializable by default.
|
2011-09-01 14:27:29 -03:00 |
|
Daniel Grana
|
f1210aed0b
|
ignore *egg-info added by pip install -e
|
2011-08-29 15:01:18 -03:00 |
|
Pablo Hoffman
|
accac332e3
|
adapted test-scrapyd.sh to be compatible with older versions of mktemp, and to not hang forever if the spider doesn't run for some reason
|
2011-08-27 01:43:32 -03:00 |
|
Pablo Hoffman
|
98b68ca89d
|
scrapyd: documented support for passing settings to spiders in schedule.json
|
2011-08-27 01:31:12 -03:00 |
|
Pablo Hoffman
|
6d6cff33ca
|
added scrapyd system test script to extras/test-scrapyd.sh
|
2011-08-27 01:23:36 -03:00 |
|
Pablo Hoffman
|
91b9d89ffd
|
moved scrapy.utils.sqlite to scrapyd.sqlite
--HG--
rename : scrapy/utils/sqlite.py => scrapyd/sqlite.py
rename : scrapy/tests/test_utils_sqlite.py => scrapyd/tests/test_sqlite.py
|
2011-08-27 01:20:57 -03:00 |
|
Pablo Hoffman
|
e1aff779da
|
removed (barely used) spider context extension, to drop the dependency on sqlite
|
2011-08-27 01:03:56 -03:00 |
|
Pablo Hoffman
|
075a2d62d3
|
scrapyd: added support for passing custom settings to schedule.json
|
2011-08-27 01:02:14 -03:00 |
|
Pablo Hoffman
|
ce08504853
|
removed class method from_settings from ISpiderManager interface
|
2011-08-26 09:24:01 -03:00 |
|
Pablo Hoffman
|
47cae5fa35
|
fixed unittest broken by previous commit
|
2011-08-24 11:31:52 -03:00 |
|
Pablo Hoffman
|
669b98c4fc
|
pass close reason to close() method of new DupeFilter
|
2011-08-24 11:26:35 -03:00 |
|
Pablo Hoffman
|
5c6b0631e2
|
minor doc fix
|
2011-08-19 11:42:03 -03:00 |
|
Pablo Hoffman
|
9d97e73a24
|
fixed priority handling on the new scheduler so that it's backwards compatible (i.e. bigger priorities are higher). also fixed a few documentation bugs related to request priority
|
2011-08-19 08:26:41 -03:00 |
|
Pablo Hoffman
|
ee40aa1223
|
added from_crawler class method to SpiderManager
|
2011-08-16 11:16:35 -03:00 |
|
Pablo Hoffman
|
a3697421c0
|
some minor updates to documentation
|
2011-08-11 09:19:59 -03:00 |
|
Pablo Hoffman
|
5da6ffb57b
|
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
|
2011-08-11 09:11:19 -03:00 |
|
Pablo Hoffman
|
bc2d2183e9
|
fixed import in doc
|
2011-08-11 09:11:08 -03:00 |
|
Pablo Hoffman
|
19e6da59d8
|
added new downloader middleware: ChunkedTransferMiddleware
|
2011-08-09 03:03:25 -03:00 |
|
Pablo Hoffman
|
4db2a592e5
|
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
|
2011-08-09 01:42:11 -03:00 |
|
Pablo Hoffman
|
09af0866c7
|
scrapy.utils.python: fixed bug introduced when adding support for new IPython 0.11. refs #335
|
2011-08-09 01:41:48 -03:00 |
|
Pablo Hoffman
|
415933a1f6
|
scrapy.utils.python: fixed bug introduced when adding support for new IPython 0.11. refs #335
|
2011-08-09 01:38:31 -03:00 |
|
Pablo Hoffman
|
061132ce88
|
fixed backwards compatibility with images/media pipeline, after crawler singleton removal in r2758
|
2011-08-08 17:16:32 -03:00 |
|
Pablo Hoffman
|
2108517ce0
|
removed support for passing more than a single spider on 'scrapy crawl' command
|
2011-08-08 15:51:22 -03:00 |
|
Daniel Grana
|
436ad63930
|
support s3 signing on pre and post boto v2.0
--HG--
extra : rebase_source : 1d8cd5dfceeaf63975c46014b100d70f6ed36147
|
2011-08-08 14:29:28 -03:00 |
|
Pablo Hoffman
|
c64123cc63
|
proper fix to what r2760 is supposed to fix
|
2011-08-08 15:09:43 -03:00 |
|
Pablo Hoffman
|
984be35461
|
Some telnet console changes:
* renamed manager alias to crawler
* added aliases: spider, slot
* fixed est() function
|
2011-08-08 15:01:08 -03:00 |
|
Pablo Hoffman
|
f03af7874d
|
fixed bug in scheduler 'has_pending_requests' method which prevented spiders from closing properly in some cases
|
2011-08-08 14:52:54 -03:00 |
|
Daniel Grana
|
c35a7519c0
|
Correctly handle query parameters on s3:// urls
|
2011-08-08 13:23:45 -03:00 |
|
Pablo Hoffman
|
5c63b2307f
|
Another step towards singleton removal: deprecated the crawler singleton import (from scrapy.project import crawler) in favor of a new class method that extensions can implement to receive the crawler
|
2011-08-08 11:42:44 -03:00 |
|
Pablo Hoffman
|
0eaa1d95f6
|
replaced DeprecationWarning by a new ScrapyDeprecationWarning category, since the default DeprecationWarning is silenced on Python 2.7+
|
2011-08-08 10:39:53 -03:00 |
|
Pablo Hoffman
|
f7c0aeccc6
|
added note about engine_started signal
|
2011-08-07 03:57:09 -03:00 |
|
Pablo Hoffman
|
a2b0737a1d
|
scrapy.utils.sitemap: added one more case of parsing invalid sitemaps
|
2011-08-07 03:24:32 -03:00 |
|
Pablo Hoffman
|
cea0dae1b2
|
scrapy.utils.sitemap: added support for parsing sitemaps with wrong namespaces, found on some bogus websites
|
2011-08-07 03:13:55 -03:00 |
|
Pablo Hoffman
|
259dccaf58
|
moved module scrapy.core.downloader.responsetypes to scrapy.responsetypes
--HG--
rename : scrapy/core/downloader/responsetypes/mime.types => scrapy/mime.types
rename : scrapy/core/downloader/responsetypes/__init__.py => scrapy/responsetypes.py
|
2011-08-07 02:49:57 -03:00 |
|
Pablo Hoffman
|
9f60c27612
|
added setting to support disabling DNS cache: DNSCACHE_ENABLED
|
2011-08-05 20:41:59 -03:00 |
|
Pablo Hoffman
|
bb67cfd955
|
added MarshalDiskQueue unittests
|
2011-08-05 20:32:22 -03:00 |
|
Pablo Hoffman
|
5c938cc029
|
removed no longer working tests from get_engine_status()
|
2011-08-05 20:26:24 -03:00 |
|
Pablo Hoffman
|
38e193d480
|
MarshalDiskQueue bug fix
|
2011-08-05 17:06:31 -03:00 |
|
Pablo Hoffman
|
1ce84046d8
|
scheduler: bug fix to use in-memory queues when a request can't be serialized by the disk queues
|
2011-08-05 12:39:29 -03:00 |
|