mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 06:43:46 +00:00

3062 Commits

Author SHA1 Message Date
Pablo Hoffman
d788ba8a44 new mechanism to override settings in scrapy commands before the Crawler object is available 2011-09-15 13:27:01 -03:00
Pablo Hoffman
77ffaa50e5 Merge pull request #38 from kalessin/master
_extract_links require extra parameter base_url
2011-09-14 07:58:43 -07:00
Martin Olveyra
509b05db57 _extract_links requires extra parameter base_url in order to avoid
exception when called from superclass method
2011-09-14 10:58:09 -03:00
Pablo Hoffman
2fcb7097bd removed documentation header notifying about other documentation versions, as that's provided by readthedocs already 2011-09-14 02:41:01 -03:00
Pablo Hoffman
ab1c9cfc56 removed documentation header notifying about other documentation versions, as that's provided by readthedocs already 2011-09-14 02:39:32 -03:00
Pablo Hoffman
3b00b9cb12 added support for generating version from git revision 2011-09-11 11:24:12 -03:00
Pablo Hoffman
43ae7bdd89 added tests for SpiderState extension 2011-09-11 08:27:05 -03:00
Pablo Hoffman
1e43afeaea added support for generating version from git revision, and use it in extras/makedeb.py 2011-09-09 03:03:46 -03:00
Daniel Grana
5f1b1c05f8 Do not filter requests with dont_filter attribute set in OffsiteMiddleware 2011-09-08 15:18:10 -03:00
Pablo Hoffman
bff3d31469 scrapyd: updated schedule.json response format 2011-09-04 09:29:24 -03:00
Pablo Hoffman
17cc90e3fe added unittest for SpiderState extension 2011-09-04 08:58:23 -03:00
Pablo Hoffman
e0ec239930 restored support for spider.DOWNLOAD_DELAY attribute, with deprecation warning 2011-09-04 08:39:57 -03:00
Pablo Hoffman
c8d30c6ffa replaced use of deprecated w3lib.url.urljoin_rfc by stdlib urlparse.urljoin 2011-09-02 19:09:21 -03:00
Pablo Hoffman
a1dbc62b45 removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead) 2011-09-02 18:27:39 -03:00
Pablo Hoffman
40f7075f11 added initial documentation about suspend and resume crawls 2011-09-02 13:12:27 -03:00
Pablo Hoffman
27dd68a690 added SpiderState extension 2011-09-02 13:06:59 -03:00
Pablo Hoffman
c382f2fc8a fixed subtle bug in disk-based priority queues caused by serialization errors, and added tests 2011-09-02 09:40:52 -03:00
Pablo Hoffman
cca0b91000 add setting to enable logging when unserializable requests are found 2011-09-01 19:40:44 -03:00
Pablo Hoffman
789e1493e9 PickleDiskQueue: use pickle protocol 2 2011-09-01 15:12:13 -03:00
Pablo Hoffman
6a31ab667d minor fix to doc 2011-09-01 15:08:23 -03:00
Pablo Hoffman
d98b058c21 no longer recommend using lambdas in the doc, as they're not friendly with scheduler persistence 2011-09-01 15:06:49 -03:00
Pablo Hoffman
725362fdeb remove redundant code 2011-09-01 14:58:50 -03:00
Pablo Hoffman
76af0cdd44 updated documentation and code to use -s instead of --set option 2011-09-01 14:35:37 -03:00
Pablo Hoffman
46edfd4a9d remove unneeded code to simplify 2011-09-01 14:29:11 -03:00
Pablo Hoffman
edefb8ac69 scrapy tool: added -s alias for --set option 2011-09-01 14:27:47 -03:00
Pablo Hoffman
75284015b5 persistent scheduler: use pickle (instead of marshal) as the default serialization format, to support serializing more objects out of the box. also removed __slots__ from Request/Response objects to make them serializable by default. 2011-09-01 14:27:29 -03:00
Daniel Grana
f1210aed0b ignore *egg-info added by pip install -e 2011-08-29 15:01:18 -03:00
Pablo Hoffman
accac332e3 adapted test-scrapyd.sh to be compatible with older versions of mktemp, and to not hang forever if spider doesn't run for some reason 2011-08-27 01:43:32 -03:00
Pablo Hoffman
98b68ca89d scrapyd: documented support for passing setting to spiders in schedule.json 2011-08-27 01:31:12 -03:00
Pablo Hoffman
6d6cff33ca added scrapyd system test script to extras/test-scrapyd.sh 2011-08-27 01:23:36 -03:00
Pablo Hoffman
91b9d89ffd moved scrapy.utils.sqlite to scrapyd.sqlite
--HG--
rename : scrapy/utils/sqlite.py => scrapyd/sqlite.py
rename : scrapy/tests/test_utils_sqlite.py => scrapyd/tests/test_sqlite.py
2011-08-27 01:20:57 -03:00
Pablo Hoffman
e1aff779da removed (barely used) spider context extension, to drop dependencies with sqlite 2011-08-27 01:03:56 -03:00
Pablo Hoffman
075a2d62d3 scrapyd: added support for passing custom settings to schedule.json 2011-08-27 01:02:14 -03:00
Pablo Hoffman
ce08504853 removed class method from_settings from ISpiderManager interface 2011-08-26 09:24:01 -03:00
Pablo Hoffman
47cae5fa35 fixed unittest broken by previous commit 2011-08-24 11:31:52 -03:00
Pablo Hoffman
669b98c4fc pass close reason to close() method of new DupeFilter 2011-08-24 11:26:35 -03:00
Pablo Hoffman
5c6b0631e2 minor doc fix 2011-08-19 11:42:03 -03:00
Pablo Hoffman
9d97e73a24 fixed priority handling on the new scheduler so that it's backwards compatible (ie. bigger priorities are higher). also fixed a few documentation bugs related to requests priority 2011-08-19 08:26:41 -03:00
Pablo Hoffman
ee40aa1223 added from_crawler class method to SpiderManager 2011-08-16 11:16:35 -03:00
Pablo Hoffman
a3697421c0 some minor updates to documentation 2011-08-11 09:19:59 -03:00
Pablo Hoffman
5da6ffb57b Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-08-11 09:11:19 -03:00
Pablo Hoffman
bc2d2183e9 fixed import in doc 2011-08-11 09:11:08 -03:00
Pablo Hoffman
19e6da59d8 added new downloader middleware: ChunkedTransferMiddleware 2011-08-09 03:03:25 -03:00
Pablo Hoffman
4db2a592e5 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-08-09 01:42:11 -03:00
Pablo Hoffman
09af0866c7 scrapy.utils.python: fixed bug introduced when adding support for new IPython 0.11. refs #335 2011-08-09 01:41:48 -03:00
Pablo Hoffman
415933a1f6 scrapy.utils.python: fixed bug introduced when adding support for new IPython 0.11. refs #335 2011-08-09 01:38:31 -03:00
Pablo Hoffman
061132ce88 fixed backwards compatibility with images/media pipeline, after crawler singleton removal in r2758 2011-08-08 17:16:32 -03:00
Pablo Hoffman
2108517ce0 removed support for passing more than a single spider on 'scrapy crawl' command 2011-08-08 15:51:22 -03:00
Daniel Grana
436ad63930 support s3 signing on pre and post boto v2.0
--HG--
extra : rebase_source : 1d8cd5dfceeaf63975c46014b100d70f6ed36147
2011-08-08 14:29:28 -03:00
Pablo Hoffman
c64123cc63 proper fix to what r2760 is supposed to fix 2011-08-08 15:09:43 -03:00