Daniel Graña
|
a5c15004f9
|
Merge branch '0.12'
|
2011-09-18 01:08:04 -03:00 |
|
Daniel Graña
|
fac5e5ea21
|
migrate hgignore to gitignore
|
2011-09-18 01:05:31 -03:00 |
|
Pablo Hoffman
|
d788ba8a44
|
new mechanism to override settings in scrapy commands before the Crawler object is available
|
2011-09-15 13:27:01 -03:00 |
|
Pablo Hoffman
|
77ffaa50e5
|
Merge pull request #38 from kalessin/master
_extract_links require extra parameter base_url
|
2011-09-14 07:58:43 -07:00 |
|
Martin Olveyra
|
509b05db57
|
_extract_links requires extra parameter base_url in order to avoid
exception when called from superclass method
|
2011-09-14 10:58:09 -03:00 |
|
Pablo Hoffman
|
2fcb7097bd
|
removed documentation header notifying about other documentation versions, as that's provided by readthedocs already
|
2011-09-14 02:41:01 -03:00 |
|
Pablo Hoffman
|
ab1c9cfc56
|
removed documentation header notifying about other documentation versions, as that's provided by readthedocs already
|
2011-09-14 02:39:32 -03:00 |
|
Pablo Hoffman
|
3b00b9cb12
|
added support for generating version from git revision
|
2011-09-11 11:24:12 -03:00 |
|
Pablo Hoffman
|
43ae7bdd89
|
added tests for SpiderState extension
|
2011-09-11 08:27:05 -03:00 |
|
Pablo Hoffman
|
1e43afeaea
|
added support for generating version from git revision, and use it in extras/makedeb.py
|
2011-09-09 03:03:46 -03:00 |
|
Daniel Grana
|
5f1b1c05f8
|
Do not filter requests with dont_filter attribute set in OffsiteMiddleware
|
2011-09-08 15:18:10 -03:00 |
|
Pablo Hoffman
|
bff3d31469
|
scrapyd: updated schedule.json response format
|
2011-09-04 09:29:24 -03:00 |
|
Pablo Hoffman
|
17cc90e3fe
|
added unittest for SpiderState extension
|
2011-09-04 08:58:23 -03:00 |
|
Pablo Hoffman
|
e0ec239930
|
restored support for spider.DOWNLOAD_DELAY attribute, with deprecation warning
|
2011-09-04 08:39:57 -03:00 |
|
Pablo Hoffman
|
c8d30c6ffa
|
replaced use of deprecated w3lib.url.urljoin_rfc by stdlib urlparse.urljoin
|
2011-09-02 19:09:21 -03:00 |
|
Pablo Hoffman
|
a1dbc62b45
|
removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead)
|
2011-09-02 18:27:39 -03:00 |
|
Pablo Hoffman
|
40f7075f11
|
added initial documentation about suspend and resume crawls
|
2011-09-02 13:12:27 -03:00 |
|
Pablo Hoffman
|
27dd68a690
|
added SpiderState extension
|
2011-09-02 13:06:59 -03:00 |
|
Pablo Hoffman
|
c382f2fc8a
|
fixed subtle bug in disk-based priority queues caused by serialization errors, and added tests
|
2011-09-02 09:40:52 -03:00 |
|
Pablo Hoffman
|
cca0b91000
|
add setting to enable logging when unserializable requests are found
|
2011-09-01 19:40:44 -03:00 |
|
Pablo Hoffman
|
789e1493e9
|
PickleDiskQueue: use pickle protocol 2
|
2011-09-01 15:12:13 -03:00 |
|
Pablo Hoffman
|
6a31ab667d
|
minor fix to doc
|
2011-09-01 15:08:23 -03:00 |
|
Pablo Hoffman
|
d98b058c21
|
no longer recommend using lambdas in the doc, as they're not friendly with scheduler persistence
|
2011-09-01 15:06:49 -03:00 |
|
Pablo Hoffman
|
725362fdeb
|
remove redundant code
|
2011-09-01 14:58:50 -03:00 |
|
Pablo Hoffman
|
76af0cdd44
|
updated documentation and code to use -s instead of --set option
|
2011-09-01 14:35:37 -03:00 |
|
Pablo Hoffman
|
46edfd4a9d
|
remove unneeded code to simplify
|
2011-09-01 14:29:11 -03:00 |
|
Pablo Hoffman
|
edefb8ac69
|
scrapy tool: added -s alias for --set option
|
2011-09-01 14:27:47 -03:00 |
|
Pablo Hoffman
|
75284015b5
|
persistent scheduler: use pickle (instead of marshal) as the default serialization format, to support serializing more objects out of the box. also removed __slots__ from Request/Response objects to make them serializable by default.
|
2011-09-01 14:27:29 -03:00 |
|
Daniel Grana
|
f1210aed0b
|
ignore *egg-info added by pip install -e
|
2011-08-29 15:01:18 -03:00 |
|
Pablo Hoffman
|
accac332e3
|
adapted test-scrapyd.sh to be compatible with older versions of mktemp, and to not hang forever if the spider doesn't run for some reason
|
2011-08-27 01:43:32 -03:00 |
|
Pablo Hoffman
|
98b68ca89d
|
scrapyd: documented support for passing setting to spiders in schedule.json
|
2011-08-27 01:31:12 -03:00 |
|
Pablo Hoffman
|
6d6cff33ca
|
added scrapyd system test script to extras/test-scrapyd.sh
|
2011-08-27 01:23:36 -03:00 |
|
Pablo Hoffman
|
91b9d89ffd
|
moved scrapy.utils.sqlite to scrapyd.sqlite
--HG--
rename : scrapy/utils/sqlite.py => scrapyd/sqlite.py
rename : scrapy/tests/test_utils_sqlite.py => scrapyd/tests/test_sqlite.py
|
2011-08-27 01:20:57 -03:00 |
|
Pablo Hoffman
|
e1aff779da
|
removed (barely used) spider context extension, to drop dependencies with sqlite
|
2011-08-27 01:03:56 -03:00 |
|
Pablo Hoffman
|
075a2d62d3
|
scrapyd: added support for passing custom settings to schedule.json
|
2011-08-27 01:02:14 -03:00 |
|
Pablo Hoffman
|
ce08504853
|
removed class method from_settings from ISpiderManager interface
|
2011-08-26 09:24:01 -03:00 |
|
Pablo Hoffman
|
47cae5fa35
|
fixed unittest broken by previous commit
|
2011-08-24 11:31:52 -03:00 |
|
Pablo Hoffman
|
669b98c4fc
|
pass close reason to close() method of new DupeFilter
|
2011-08-24 11:26:35 -03:00 |
|
Pablo Hoffman
|
5c6b0631e2
|
minor doc fix
|
2011-08-19 11:42:03 -03:00 |
|
Pablo Hoffman
|
9d97e73a24
|
fixed priority handling on the new scheduler so that it's backwards compatible (i.e. bigger priorities are higher). also fixed a few documentation bugs related to request priority
|
2011-08-19 08:26:41 -03:00 |
|
Pablo Hoffman
|
ee40aa1223
|
added from_crawler class method to SpiderManager
|
2011-08-16 11:16:35 -03:00 |
|
Pablo Hoffman
|
a3697421c0
|
some minor updates to documentation
|
2011-08-11 09:19:59 -03:00 |
|
Pablo Hoffman
|
5da6ffb57b
|
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
|
2011-08-11 09:11:19 -03:00 |
|
Pablo Hoffman
|
bc2d2183e9
|
fixed import in doc
|
2011-08-11 09:11:08 -03:00 |
|
Pablo Hoffman
|
19e6da59d8
|
added new downloader middleware: ChunkedTransferMiddleware
|
2011-08-09 03:03:25 -03:00 |
|
Pablo Hoffman
|
4db2a592e5
|
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
|
2011-08-09 01:42:11 -03:00 |
|
Pablo Hoffman
|
09af0866c7
|
scrapy.utils.python: fixed bug introduced when adding support for new IPython 0.11. refs #335
|
2011-08-09 01:41:48 -03:00 |
|
Pablo Hoffman
|
415933a1f6
|
scrapy.utils.python: fixed bug introduced when adding support for new IPython 0.11. refs #335
|
2011-08-09 01:38:31 -03:00 |
|
Pablo Hoffman
|
061132ce88
|
fixed backwards compatibility with images/media pipeline, after crawler singleton removal in r2758
|
2011-08-08 17:16:32 -03:00 |
|
Pablo Hoffman
|
2108517ce0
|
removed support for passing more than a single spider on 'scrapy crawl' command
|
2011-08-08 15:51:22 -03:00 |
|