Daniel Grana
7ad901640b
fix httpcache docs
2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469
Make ignored schemes configurable in HttpCacheMiddleware. closes #224
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
Pablo Hoffman
5a6284ceb3
Added TODO:
2010-09-04 02:56:50 -03:00
Daniel Grana
58feb15528
httpcache must restore responses using response.url instead of request.url
--HG--
extra : rebase_source : 08fa2c3862bb35db2234e0f9bb9cb9ce4a8f4d8d
2010-09-04 02:53:09 -03:00
Pablo Hoffman
7b9fa7fbaa
Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225
2010-09-04 02:23:04 -03:00
Daniel Grana
1b11563383
monkeypatch urlparse if s3 netloc parsing fails (python issue7904). closes #223
2010-09-04 01:02:52 -03:00
Pablo Hoffman
9cfa8edd14
Automatic merge
2010-09-03 17:46:36 -03:00
Daniel Grana
9b68c3c1b1
Add S3 scheme request handler. closes #222
2010-09-03 16:19:47 -03:00
Daniel Grana
30d94b5bf5
Convert request handlers to classes and support NotConfigured. closes #221
2010-09-03 16:18:46 -03:00
Pablo Hoffman
37e9c5d78e
Added new Scrapy service with support for:
* multiple projects
* uploading scrapy projects as Python eggs
* scheduling spiders using a JSON API
Documentation is added along with the code.
Closes #218 .
--HG--
rename : debian/scrapy-service.default => debian/scrapyd.default
rename : debian/scrapy-service.dirs => debian/scrapyd.dirs
rename : debian/scrapy-service.install => debian/scrapyd.install
rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides
rename : debian/scrapy-service.postinst => debian/scrapyd.postinst
rename : debian/scrapy-service.postrm => debian/scrapyd.postrm
rename : debian/scrapy-service.upstart => debian/scrapyd.upstart
rename : extras/scrapy.tac => extras/scrapyd.tac
2010-09-03 15:54:42 -03:00
Pablo Hoffman
1b766877f1
Added ISpiderManager interface and a test to verify the default SpiderManager conforms to it
2010-09-03 14:29:27 -03:00
Pablo Hoffman
7cfc379230
Execution Queue refactoring by taking out the queue backend to a new Spider
Queue API. Also ported SQS Execution Queue to Spider Queue API, and made the
scrapy queue command use the Spider Queue directly, with deferreds support.
Closes #220 .
2010-09-03 14:29:27 -03:00
Pablo Hoffman
37776618a3
Changed format of scrapy.cfg file to contain a [settings] section and a 'default' key inside it, instead of the other way around
2010-09-02 21:22:17 -03:00
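The layout described in the commit above (a [settings] section holding a 'default' key) can be illustrated with a minimal sketch. The module path "myproject.settings" and the helper name read_default_settings are assumptions made for illustration; this uses Python's standard configparser and is not Scrapy's own loading code.

```python
from configparser import ConfigParser

# Hypothetical scrapy.cfg contents in the new layout.
# "myproject.settings" is an assumed module path, not taken from the log.
SCRAPY_CFG = """
[settings]
default = myproject.settings
"""


def read_default_settings(cfg_text):
    """Return the value of the 'default' key in the [settings] section."""
    parser = ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("settings", "default")


settings_module = read_default_settings(SCRAPY_CFG)
```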
Pablo Hoffman
fb69655a9f
Removed unused imports
2010-08-31 21:40:05 -03:00
Pablo Hoffman
758d21b2f9
Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217
2010-08-31 16:03:08 -03:00
Pablo Hoffman
59f09c50e4
Yet another scrapy.cmdline code refactoring by removing --settings and --version options, adding a version command and adding a UsageError exception for signaling usage errors. Updated all commands accordingly
2010-08-28 18:06:51 -03:00
Pablo Hoffman
7394aa926d
Fixed typo
2010-08-28 14:47:19 -03:00
Pablo Hoffman
616ecc5a1a
Support passing spider arguments in crawl command with -a option. Closes #216
2010-08-28 14:43:28 -03:00
Pablo Hoffman
d4941a0619
Minor change to --pidfile argument
2010-08-28 14:07:03 -03:00
Pablo Hoffman
0ae4c3aa82
call Spider.closed() method (if it exists) on SpiderManager.close_spider()
2010-08-27 18:36:00 -03:00
Pablo Hoffman
35fd1a2660
Fixed typo
2010-08-27 17:21:30 -03:00
Pablo Hoffman
3234d76b8d
Restored SpiderManager.close_spider() method but using signals instead of calling it from the engine
2010-08-27 16:19:51 -03:00
Pablo Hoffman
783887457a
Moved tests to reflect new module location
--HG--
rename : scrapy/tests/test_core_queue.py => scrapy/tests/test_queue.py
2010-08-27 15:50:12 -03:00
Pablo Hoffman
88c99e0b83
Check that arguments and keyword arguments are not passed simultaneously in jsonrpc_client_call()
2010-08-27 13:45:52 -03:00
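The check described in the commit above follows from JSON-RPC itself, where "params" is either a positional array or a named object, never both. A minimal sketch of such a guard is shown here; build_jsonrpc_request and its signature are hypothetical and do not reproduce Scrapy's jsonrpc_client_call().

```python
import json


def build_jsonrpc_request(method, request_id, *args, **kwargs):
    """Serialize a JSON-RPC 2.0 request (hypothetical helper).

    Positional and keyword arguments are mutually exclusive because
    "params" must be either an array or an object.
    """
    if args and kwargs:
        raise ValueError("Pass either positional or keyword arguments, not both")
    # Keyword arguments become an object; otherwise use a (possibly empty) array.
    params = kwargs or list(args)
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )
```

Calling build_jsonrpc_request("add", 1, 2, 3) yields array params, build_jsonrpc_request("add", 1, a=2) yields object params, and mixing the two raises ValueError.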
Pablo Hoffman
ffad8e08e7
Support passing all keyword arguments to ExecutionQueue append_spider_name and append_url
2010-08-27 13:45:14 -03:00
Pablo Hoffman
4eb0383dd2
Added Scrapy to scrapy --version
2010-08-27 11:15:37 -03:00
Pablo Hoffman
e7b3247a18
Updated some missing references to scrapy-ws script
2010-08-27 01:05:59 -03:00
Pablo Hoffman
aab38be498
Print Scrapy on first log line
2010-08-27 00:53:26 -03:00
Pablo Hoffman
e14cc2c12a
Moved scrapy-ws script to extras/ and fixed broken methods due to changes in web service API
--HG--
rename : bin/scrapy-ws.py => extras/scrapy-ws.py
2010-08-27 00:33:08 -03:00
Pablo Hoffman
648f700ed1
Fixed log formatter tests
--HG--
rename : scrapy/tests/test_contrib_logformatter.py => scrapy/tests/test_logformatter.py
2010-08-26 23:23:58 -03:00
Pablo Hoffman
ad18d4a70e
Added pluggable log formatter
2010-08-26 23:19:35 -03:00
Pablo Hoffman
d1e260a8d4
Simplified engine by removing the configure() and kill() methods. Also simplified the Spider Manager by removing the close_spider() method
2010-08-26 22:20:04 -03:00
Pablo Hoffman
f6c11af4c2
Moved module: scrapy.core.queue to scrapy.queue
--HG--
rename : scrapy/core/queue.py => scrapy/queue.py
2010-08-26 21:15:32 -03:00
Pablo Hoffman
747f090f94
Improved Twisted version detection (wasn't working for Twisted 10.0.0)
2010-08-26 20:32:26 -03:00
Pablo Hoffman
e95f7f63f9
Fixed typo
2010-08-25 21:06:10 -03:00
Pablo Hoffman
a82a4be3ab
Added docstring to test_engine.py
2010-08-25 21:04:47 -03:00
Pablo Hoffman
59d18cf99e
Fixed crawler reference
2010-08-25 19:59:51 -03:00
Pablo Hoffman
40b590cad3
Moved scrapy.cfg auto-discovery to scrapy.conf.EnvironmentSettings class
2010-08-25 19:59:30 -03:00
Pablo Hoffman
ef7a097272
Replaced old manager references with crawler
2010-08-25 19:31:04 -03:00
Pablo Hoffman
8fc78c4d0a
Refactoring of Crawler, Commands, Execution Queue and Spider Manager:
Commands changes:
* removed (somewhat hacky) --init argument from settings command
* added set_crawler method to Commands, and a ``crawler`` property that returns
a configured crawler. This way, commands that don't require a crawler (such
as startproject) won't need to configure one.
Execution Queue changes:
* changed SERVICE_QUEUE_FILE setting to SQLITE_DB
* removed SERVICE_QUEUE setting
* added QUEUE_CLASS setting for defining the class to use for the execution queue
* added SERVER_QUEUE_CLASS setting for defining the class to use for the
execution queue in server mode (runserver command)
Spider Manager changes:
* simplified SpiderManager API by removing the load() method
* added from_settings classmethod to SpiderManager
* added spider_modules constructor argument to SpiderManager
Crawler changes:
* added install() method to Crawler (to install it in scrapy.project) and
uninstall() to remove it
* use CrawlerProcess.install() in scrapy.cmdline
* use crawler.install() and crawler.uninstall() in tests that require a crawler in
scrapy.project
* make telnet console and webservice play nicer with twisted by stopping
listening when the engine goes down
* refactored Scrapy engine tests - it no longer uses the crawler singleton.
Closes #215 .
2010-08-25 19:24:36 -03:00
Pablo Hoffman
eb51b9f785
Removed obsolete setting
2010-08-25 06:41:19 -03:00
Pablo Hoffman
bea2f94357
Instantiate SpiderManager in Crawler constructor
2010-08-25 05:33:08 -03:00
Pablo Hoffman
8d24175c81
Added CrawlerProcess class, isolating all (Twisted) reactor-controlling code
into this class and leaving the Crawler class free of any reactor control.
This allows embedding the Scrapy crawler in other Twisted applications, or any
Twisted asynchronous code.
The "scrapy" command will use the new CrawlerProcess class, which resembles the
behaviour of the old (all-in-one) Crawler class.
Closes #214 .
Also moved log.start() call from Crawler class to scrapy.cmdline module.
2010-08-25 05:24:02 -03:00
Pablo Hoffman
54024d1d3f
Removed unneeded line
2010-08-25 05:06:45 -03:00
Pablo Hoffman
8e83f527b3
Removed scrapy-sqs script, as it has been superseded by the new scrapy 'queue' command
2010-08-23 22:04:49 -03:00
Pablo Hoffman
e2ed27e4fd
Added documentation for Ubuntu packages. Refs #211
2010-08-23 21:28:32 -03:00
Pablo Hoffman
9ee92686c7
Moved spidermanager tests module according to policies
--HG--
rename : scrapy/tests/test_contrib_spidermanager/__init__.py => scrapy/tests/test_spidermanager/__init__.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/__init__.py => scrapy/tests/test_spidermanager/test_spiders/__init__.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider0.py => scrapy/tests/test_spidermanager/test_spiders/spider0.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider1.py => scrapy/tests/test_spidermanager/test_spiders/spider1.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider2.py => scrapy/tests/test_spidermanager/test_spiders/spider2.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider3.py => scrapy/tests/test_spidermanager/test_spiders/spider3.py
2010-08-23 00:25:03 -03:00
Pablo Hoffman
58b4cc2c32
Some minor fixes to the Contributing documentation
2010-08-23 00:23:14 -03:00
Pablo Hoffman
6585c1a28f
removed (somewhat hacky) MAIL_DEBUG setting
2010-08-22 22:42:00 -03:00
Pablo Hoffman
e189861b46
Fixed Item Loader bug that was preventing values that evaluate to False from being loaded. Patch contributed by Anibal Pacheco. Closes #174
2010-08-22 22:07:44 -03:00
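The class of bug named in the commit above (values that evaluate to False being dropped) can be sketched generically. Both functions below are hypothetical illustrations of a truthiness test versus an explicit None test; they are not the Item Loader code or the contributed patch.

```python
def load_values_truthiness(values):
    """Buggy variant: a truthiness test drops 0, False and '' along with None."""
    return [v for v in values if v]


def load_values_not_none(values):
    """Safer variant: only None is treated as 'no value'."""
    return [v for v in values if v is not None]


# Sample input mixing falsy-but-valid values with a genuinely missing one.
sample = [0, "", False, None, "title"]
```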