Pablo Hoffman
630db4fecf
Simplified file:// download handler, adding support for reading binary files
2010-09-05 05:59:40 -03:00
Pablo Hoffman
14e985b076
Updated Command line tool documentation
2010-09-05 05:29:58 -03:00
Pablo Hoffman
1190f97944
Updated settings documentation
2010-09-05 04:58:14 -03:00
Pablo Hoffman
ebdb733e95
Updated some old messages in Scrapy shell doc
2010-09-05 04:45:43 -03:00
Pablo Hoffman
2f12618890
Post reference to Scrapyd in FAQ
2010-09-05 04:35:27 -03:00
Pablo Hoffman
a66bef7925
Make execution queue poll interval configurable through a new QUEUE_POLL_INTERVAL setting
2010-09-05 02:23:08 -03:00
Pablo Hoffman
b800fdcb4d
SqliteSpiderQueue: fall back to in-memory SQLite if database cannot be opened (typically due to missing write permissions)
2010-09-04 03:49:46 -03:00
Pablo Hoffman
bf34094e5a
Added versionadded:: notice to new documentation topics
2010-09-04 03:30:45 -03:00
Pablo Hoffman
e3921ab016
Don't set allowed_domains attribute in BaseSpider constructor
2010-09-04 03:20:05 -03:00
Daniel Grana
9f4b1e47a4
damn, really fix httpcache docs
2010-09-04 03:26:41 -03:00
Daniel Grana
7ad901640b
fix httpcache docs
2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469
Make ignored schemes configurable in HttpCacheMiddleware. closes #224
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
Pablo Hoffman
5a6284ceb3
Added TODO:
2010-09-04 02:56:50 -03:00
Daniel Grana
58feb15528
httpcache must restore responses using response.url instead of request.url
--HG--
extra : rebase_source : 08fa2c3862bb35db2234e0f9bb9cb9ce4a8f4d8d
2010-09-04 02:53:09 -03:00
Pablo Hoffman
7b9fa7fbaa
Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225
2010-09-04 02:23:04 -03:00
Daniel Grana
1b11563383
monkeypatch urlparse if s3 netloc parsing fails (python issue7904). closes #223
2010-09-04 01:02:52 -03:00
Pablo Hoffman
9cfa8edd14
Automatic merge
2010-09-03 17:46:36 -03:00
Daniel Grana
9b68c3c1b1
Add S3 scheme request handler. closes #222
2010-09-03 16:19:47 -03:00
Daniel Grana
30d94b5bf5
Convert request handlers to classes and support NotConfigured. closes #221
2010-09-03 16:18:46 -03:00
Pablo Hoffman
37e9c5d78e
Added new Scrapy service with support for:
* multiple projects
* uploading scrapy projects as Python eggs
* scheduling spiders using a JSON API
Documentation is added along with the code.
Closes #218 .
--HG--
rename : debian/scrapy-service.default => debian/scrapyd.default
rename : debian/scrapy-service.dirs => debian/scrapyd.dirs
rename : debian/scrapy-service.install => debian/scrapyd.install
rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides
rename : debian/scrapy-service.postinst => debian/scrapyd.postinst
rename : debian/scrapy-service.postrm => debian/scrapyd.postrm
rename : debian/scrapy-service.upstart => debian/scrapyd.upstart
rename : extras/scrapy.tac => extras/scrapyd.tac
2010-09-03 15:54:42 -03:00
Pablo Hoffman
1b766877f1
Added ISpiderManager interface and a test to verify the default SpiderManager conforms to it
2010-09-03 14:29:27 -03:00
Pablo Hoffman
7cfc379230
Execution Queue refactoring by taking out the queue backend to a new Spider
Queue API. Also ported SQS Execution Queue to Spider Queue API, and make the
scrapy queue command use the Spider Queue directly, with deferreds support.
Closes #220 .
2010-09-03 14:29:27 -03:00
Pablo Hoffman
37776618a3
Changed format of scrapy.cfg file to contain a [settings] section and a 'default' key inside it, instead of the other way around
2010-09-02 21:22:17 -03:00
Pablo Hoffman
fb69655a9f
Removed unused imports
2010-08-31 21:40:05 -03:00
Pablo Hoffman
758d21b2f9
Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217
2010-08-31 16:03:08 -03:00
Pablo Hoffman
59f09c50e4
Yet another scrapy.cmdline code refactoring by removing --settings and --version options, adding a version command and adding a UsageError exception for signaling usage errors. Updated all commands accordingly
2010-08-28 18:06:51 -03:00
Pablo Hoffman
7394aa926d
Fixed typo
2010-08-28 14:47:19 -03:00
Pablo Hoffman
616ecc5a1a
Support passing spider arguments in crawl command with -a option. Closes #216
2010-08-28 14:43:28 -03:00
Pablo Hoffman
d4941a0619
Minor change to --pidfile argument
2010-08-28 14:07:03 -03:00
Pablo Hoffman
0ae4c3aa82
call Spider.closed() method (if it exists) on SpiderManager.close_spider()
2010-08-27 18:36:00 -03:00
Pablo Hoffman
35fd1a2660
Fixed typo
2010-08-27 17:21:30 -03:00
Pablo Hoffman
3234d76b8d
Restored SpiderManager.close_spider() method but using signals instead of calling it from the engine
2010-08-27 16:19:51 -03:00
Pablo Hoffman
783887457a
Moved tests to reflect new module location
--HG--
rename : scrapy/tests/test_core_queue.py => scrapy/tests/test_queue.py
2010-08-27 15:50:12 -03:00
Pablo Hoffman
88c99e0b83
Check that arguments and keyword arguments are not passed simultaneously in jsonrpc_client_call()
2010-08-27 13:45:52 -03:00
Pablo Hoffman
ffad8e08e7
Support passing all keyword arguments to ExecutionQueue append_spider_name and append_url
2010-08-27 13:45:14 -03:00
Pablo Hoffman
4eb0383dd2
Added Scrapy to scrapy --version
2010-08-27 11:15:37 -03:00
Pablo Hoffman
e7b3247a18
Updated some missing references to scrapy-ws script
2010-08-27 01:05:59 -03:00
Pablo Hoffman
aab38be498
Print Scrapy on first log line
2010-08-27 00:53:26 -03:00
Pablo Hoffman
e14cc2c12a
Moved scrapy-ws script to extras/ and fixed broken methods due to changes in web service API
--HG--
rename : bin/scrapy-ws.py => extras/scrapy-ws.py
2010-08-27 00:33:08 -03:00
Pablo Hoffman
648f700ed1
Fixed log formatter tests
--HG--
rename : scrapy/tests/test_contrib_logformatter.py => scrapy/tests/test_logformatter.py
2010-08-26 23:23:58 -03:00
Pablo Hoffman
ad18d4a70e
Added pluggable log formatter
2010-08-26 23:19:35 -03:00
Pablo Hoffman
d1e260a8d4
Simplified engine by removing the configure() and kill() methods. Also simplified the Spider Manager by removing the close_spider() method
2010-08-26 22:20:04 -03:00
Pablo Hoffman
f6c11af4c2
Moved module: scrapy.core.queue to scrapy.queue
--HG--
rename : scrapy/core/queue.py => scrapy/queue.py
2010-08-26 21:15:32 -03:00
Pablo Hoffman
747f090f94
Improved Twisted version detection (wasn't working for Twisted 10.0.0)
2010-08-26 20:32:26 -03:00
Pablo Hoffman
e95f7f63f9
Fixed typo
2010-08-25 21:06:10 -03:00
Pablo Hoffman
a82a4be3ab
Added docstring to test_engine.py
2010-08-25 21:04:47 -03:00
Pablo Hoffman
59d18cf99e
Fixed crawler reference
2010-08-25 19:59:51 -03:00
Pablo Hoffman
40b590cad3
Moved scrapy.cfg auto-discovery to scrapy.conf.EnvironmentSettings class
2010-08-25 19:59:30 -03:00
Pablo Hoffman
ef7a097272
Replaced old manager references with crawler
2010-08-25 19:31:04 -03:00
Pablo Hoffman
8fc78c4d0a
Refactoring of Crawler, Commands, Execution Queue and Spider Manager:
Commands changes:
* removed (somewhat hacky) --init argument from settings command
* added set_crawler method to Commands, and a ``crawler`` property that returns
a configured crawler. This way, commands that don't require a crawler (such
as startproject) won't need to configure one.
Execution Queue changes:
* changed SERVICE_QUEUE_FILE setting to SQLITE_DB
* removed SERVICE_QUEUE setting
* added QUEUE_CLASS setting for defining the class to use for the execution queue
* added SERVER_QUEUE_CLASS setting for defining the class to use for the
execution queue in server mode (runserver command)
Spider Manager changes:
* simplified SpiderManager API by removing the load() method
* added from_settings classmethod to SpiderManager
* added spider_modules constructor argument to SpiderManager
Crawler changes:
* added install() method to Crawler (to install it in scrapy.project) and
uninstall() to remove it
* use CrawlerProcess.install() in scrapy.cmdline
* use crawler.install() and crawler.uninstall() in tests that need a crawler in
scrapy.project
* make telnet console and webservice play nicer with twisted by stopping
listening when the engine goes down
* refactored Scrapy engine tests - it no longer uses the crawler singleton.
Closes #215 .
2010-08-25 19:24:36 -03:00