Pablo Hoffman
9158e9d682
Some changes to Scrapyd to support multiple configuration files, to make it easier to deploy Scrapyd applications. Also documented 'egg_runner' and 'application' options
--HG--
rename : debian/scrapyd.cfg => debian/000-default
rename : scrapyd/default_scrapyd.cfg => scrapyd/default_scrapyd.conf
2010-09-07 09:17:25 -03:00
Daniel Grana
3414bf13ee
remove request_uploaded signal and move response_received and response_downloaded to downloader manager. closes #228
--HG--
extra : rebase_source : 4af0d2a01b34de8a21048bb7f4a66bfc484b3b8f
2010-09-06 23:23:14 -03:00
Pablo Hoffman
3c5ab10688
Added FAQ entry about __VIEWSTATE parameter
2010-09-06 13:17:08 -03:00
Pablo Hoffman
3a72e5c051
Removed settings.disabled hack used in some tests. Closes #143
2010-09-06 11:04:27 -03:00
Pablo Hoffman
5f58af2005
Simplified SpiderMiddlewareManager by making it inherit from MiddlewareManager
2010-09-06 10:40:33 -03:00
Pablo Hoffman
cc72f03e10
Added IFeedStorage interface and test all Feed Storages conform to it. Also added test for StdoutFeedStorage
2010-09-06 10:22:28 -03:00
Pablo Hoffman
e3d67d74f7
docs/intro/overview.rst: add example of scraped data and introduce loaders
2010-09-06 10:04:00 -03:00
Pablo Hoffman
ff9de424c8
Added SpiderQueue tests. SQS spider queue not tested because operations take too long to complete and it's not easy to know when they have. Closes #227
2010-09-06 09:47:45 -03:00
Daniel Grana
8d1d3493e7
Added a weak key factory based cache
--HG--
extra : rebase_source : 2bc7cb5fdb0fd3adb63cf7fe3aedd2f1d15e49f0
2010-09-06 00:50:56 -03:00
Pablo Hoffman
00d55fbbd1
Updated 'Scrapy at a glance' document replacing item pipeline example by a simpler usage of feed exports
2010-09-05 23:38:37 -03:00
Pablo Hoffman
5ffc7650bd
Removed code no longer needed
2010-09-05 20:08:59 -03:00
Pablo Hoffman
766f2d910d
Renamed Request Handlers to Download Handlers
2010-09-05 19:35:53 -03:00
Pablo Hoffman
067ec65d97
Removed download_any singleton
2010-09-05 19:09:42 -03:00
Pablo Hoffman
a5cf71cb06
Updated Ubuntu package signing key location
2010-09-05 19:04:15 -03:00
Pablo Hoffman
6bf52fb50e
Make telnet console and web service try a range of ports for binding, instead of just one. Closes #226
2010-09-05 06:48:08 -03:00
Pablo Hoffman
ce884192c9
Fixed test broken by previous commit
2010-09-05 06:05:34 -03:00
Pablo Hoffman
630db4fecf
Simplified file:// download handler, adding support for reading binary files
2010-09-05 05:59:40 -03:00
Pablo Hoffman
14e985b076
Updated Command line tool documentation
2010-09-05 05:29:58 -03:00
Pablo Hoffman
1190f97944
Updated settings documentation
2010-09-05 04:58:14 -03:00
Pablo Hoffman
ebdb733e95
Updated some old messages in Scrapy shell doc
2010-09-05 04:45:43 -03:00
Pablo Hoffman
2f12618890
Post reference to Scrapyd in FAQ
2010-09-05 04:35:27 -03:00
Pablo Hoffman
a66bef7925
Make execution queue poll interval configurable through a new QUEUE_POLL_INTERVAL setting
2010-09-05 02:23:08 -03:00
Pablo Hoffman
b800fdcb4d
SqliteSpiderQueue: fall back to in-memory SQLite if database cannot be opened (typically due to missing write permissions)
2010-09-04 03:49:46 -03:00
Pablo Hoffman
bf34094e5a
Added versionadded:: notice to new documentation topics
2010-09-04 03:30:45 -03:00
Pablo Hoffman
e3921ab016
Don't set allowed_domains attribute in BaseSpider constructor
2010-09-04 03:20:05 -03:00
Daniel Grana
9f4b1e47a4
damn, really fix httpcache docs
2010-09-04 03:26:41 -03:00
Daniel Grana
7ad901640b
fix httpcache docs
2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469
Make ignored schemes configurable in HttpCacheMiddleware. closes #224
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
Pablo Hoffman
5a6284ceb3
Added TODO:
2010-09-04 02:56:50 -03:00
Daniel Grana
58feb15528
httpcache must restore responses using response.url instead of request.url
--HG--
extra : rebase_source : 08fa2c3862bb35db2234e0f9bb9cb9ce4a8f4d8d
2010-09-04 02:53:09 -03:00
Pablo Hoffman
7b9fa7fbaa
Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225
2010-09-04 02:23:04 -03:00
Daniel Grana
1b11563383
monkeypatch urlparse if s3 netloc parsing fails (Python issue 7904). closes #223
2010-09-04 01:02:52 -03:00
Pablo Hoffman
9cfa8edd14
Automatic merge
2010-09-03 17:46:36 -03:00
Daniel Grana
9b68c3c1b1
Add S3 scheme request handler. closes #222
2010-09-03 16:19:47 -03:00
Daniel Grana
30d94b5bf5
Convert request handlers to classes and support NotConfigured. closes #221
2010-09-03 16:18:46 -03:00
Pablo Hoffman
37e9c5d78e
Added new Scrapy service with support for:
* multiple projects
* uploading scrapy projects as Python eggs
* scheduling spiders using a JSON API
Documentation is added along with the code.
Closes #218.
--HG--
rename : debian/scrapy-service.default => debian/scrapyd.default
rename : debian/scrapy-service.dirs => debian/scrapyd.dirs
rename : debian/scrapy-service.install => debian/scrapyd.install
rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides
rename : debian/scrapy-service.postinst => debian/scrapyd.postinst
rename : debian/scrapy-service.postrm => debian/scrapyd.postrm
rename : debian/scrapy-service.upstart => debian/scrapyd.upstart
rename : extras/scrapy.tac => extras/scrapyd.tac
2010-09-03 15:54:42 -03:00
Pablo Hoffman
1b766877f1
Added ISpiderManager interface and a test to verify the default SpiderManager conforms to it
2010-09-03 14:29:27 -03:00
Pablo Hoffman
7cfc379230
Execution Queue refactoring by taking out the queue backend to a new Spider
Queue API. Also ported SQS Execution Queue to Spider Queue API, and made the
scrapy queue command use the Spider Queue directly, with deferreds support.
Closes #220.
2010-09-03 14:29:27 -03:00
Pablo Hoffman
37776618a3
Changed format of scrapy.cfg file to contain a [settings] section and a 'default' key inside it, instead of the other way around
2010-09-02 21:22:17 -03:00
Pablo Hoffman
fb69655a9f
Removed unused imports
2010-08-31 21:40:05 -03:00
Pablo Hoffman
758d21b2f9
Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217
2010-08-31 16:03:08 -03:00
Pablo Hoffman
59f09c50e4
Yet another scrapy.cmdline code refactoring by removing --settings and --version options, adding a version command and adding a UsageError exception for signaling usage errors. Updated all commands accordingly
2010-08-28 18:06:51 -03:00
Pablo Hoffman
7394aa926d
Fixed typo
2010-08-28 14:47:19 -03:00
Pablo Hoffman
616ecc5a1a
Support passing spider arguments in crawl command with -a option. Closes #216
2010-08-28 14:43:28 -03:00
Pablo Hoffman
d4941a0619
Minor change to --pidfile argument
2010-08-28 14:07:03 -03:00
Pablo Hoffman
0ae4c3aa82
call Spider.closed() method (if it exists) on SpiderManager.close_spider()
2010-08-27 18:36:00 -03:00
Pablo Hoffman
35fd1a2660
Fixed typo
2010-08-27 17:21:30 -03:00
Pablo Hoffman
3234d76b8d
Restored SpiderManager.close_spider() method but using signals instead of calling it from the engine
2010-08-27 16:19:51 -03:00
Pablo Hoffman
783887457a
Moved tests to reflect new module location
--HG--
rename : scrapy/tests/test_core_queue.py => scrapy/tests/test_queue.py
2010-08-27 15:50:12 -03:00
Pablo Hoffman
88c99e0b83
Check that arguments and keyword arguments are not passed simultaneously in jsonrpc_client_call()
2010-08-27 13:45:52 -03:00