mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 21:23:44 +00:00

2256 Commits

Author SHA1 Message Date
Daniel Grana
7ad901640b fix httpcache docs 2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469 Make ignored schemes configurable in HttpCacheMiddleware. closes #224
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
Pablo Hoffman
5a6284ceb3 Added TODO: 2010-09-04 02:56:50 -03:00
Daniel Grana
58feb15528 httpcache must restore responses using response.url instead of request.url
--HG--
extra : rebase_source : 08fa2c3862bb35db2234e0f9bb9cb9ce4a8f4d8d
2010-09-04 02:53:09 -03:00
Pablo Hoffman
7b9fa7fbaa Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225 2010-09-04 02:23:04 -03:00
Daniel Grana
1b11563383 monkeypatch urlparse if s3 netloc parsing fails (python issue7904). closes #223 2010-09-04 01:02:52 -03:00
Pablo Hoffman
9cfa8edd14 Automatic merge 2010-09-03 17:46:36 -03:00
Daniel Grana
9b68c3c1b1 Add S3 scheme request handler. closes #222 2010-09-03 16:19:47 -03:00
Daniel Grana
30d94b5bf5 Convert request handlers to classes and support NotConfigured. closes #221 2010-09-03 16:18:46 -03:00
Pablo Hoffman
37e9c5d78e Added new Scrapy service with support for:
* multiple projects
* uploading scrapy projects as Python eggs
* scheduling spiders using a JSON API

Documentation is added along with the code.

Closes #218.

--HG--
rename : debian/scrapy-service.default => debian/scrapyd.default
rename : debian/scrapy-service.dirs => debian/scrapyd.dirs
rename : debian/scrapy-service.install => debian/scrapyd.install
rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides
rename : debian/scrapy-service.postinst => debian/scrapyd.postinst
rename : debian/scrapy-service.postrm => debian/scrapyd.postrm
rename : debian/scrapy-service.upstart => debian/scrapyd.upstart
rename : extras/scrapy.tac => extras/scrapyd.tac
2010-09-03 15:54:42 -03:00
Pablo Hoffman
1b766877f1 Added ISpiderManager interface and a test to verify the default SpiderManager conforms to it 2010-09-03 14:29:27 -03:00
Pablo Hoffman
7cfc379230 Execution Queue refactoring by taking out the queue backend to a new Spider
Queue API. Also ported SQS Execution Queue to Spider Queue API, and made the
scrapy queue command use the Spider Queue directly, with deferreds support.

Closes #220.
2010-09-03 14:29:27 -03:00
Pablo Hoffman
37776618a3 Changed format of scrapy.cfg file to contain a [settings] section and a 'default' key inside it, instead of the other way around 2010-09-02 21:22:17 -03:00
Pablo Hoffman
fb69655a9f Removed unused imports 2010-08-31 21:40:05 -03:00
Pablo Hoffman
758d21b2f9 Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217 2010-08-31 16:03:08 -03:00
Pablo Hoffman
59f09c50e4 Yet another scrapy.cmdline code refactoring by removing --settings and --version options, adding a version command and adding a UsageError exception for signaling usage errors. Updated all commands accordingly 2010-08-28 18:06:51 -03:00
Pablo Hoffman
7394aa926d Fixed typo 2010-08-28 14:47:19 -03:00
Pablo Hoffman
616ecc5a1a Support passing spider arguments in crawl command with -a option. Closes #216 2010-08-28 14:43:28 -03:00
Pablo Hoffman
d4941a0619 Minor change to --pidfile argument 2010-08-28 14:07:03 -03:00
Pablo Hoffman
0ae4c3aa82 call Spider.closed() method (if it exists) on SpiderManager.close_spider() 2010-08-27 18:36:00 -03:00
Pablo Hoffman
35fd1a2660 Fixed typo 2010-08-27 17:21:30 -03:00
Pablo Hoffman
3234d76b8d Restored SpiderManager.close_spider() method but using signals instead of calling it from the engine 2010-08-27 16:19:51 -03:00
Pablo Hoffman
783887457a Moved tests to reflect new module location
--HG--
rename : scrapy/tests/test_core_queue.py => scrapy/tests/test_queue.py
2010-08-27 15:50:12 -03:00
Pablo Hoffman
88c99e0b83 Check that arguments and keyword arguments are not passed simultaneously in jsonrpc_client_call() 2010-08-27 13:45:52 -03:00
Pablo Hoffman
ffad8e08e7 Support passing all keyword arguments to ExecutionQueue append_spider_name and append_url 2010-08-27 13:45:14 -03:00
Pablo Hoffman
4eb0383dd2 Added Scrapy to scrapy --version 2010-08-27 11:15:37 -03:00
Pablo Hoffman
e7b3247a18 Updated some missing references to scrapy-ws script 2010-08-27 01:05:59 -03:00
Pablo Hoffman
aab38be498 Print Scrapy on first log line 2010-08-27 00:53:26 -03:00
Pablo Hoffman
e14cc2c12a Moved scrapy-ws script to extras/ and fixed broken methods due to changes in web service API
--HG--
rename : bin/scrapy-ws.py => extras/scrapy-ws.py
2010-08-27 00:33:08 -03:00
Pablo Hoffman
648f700ed1 Fixed log formatter tests
--HG--
rename : scrapy/tests/test_contrib_logformatter.py => scrapy/tests/test_logformatter.py
2010-08-26 23:23:58 -03:00
Pablo Hoffman
ad18d4a70e Added pluggable log formatter 2010-08-26 23:19:35 -03:00
Pablo Hoffman
d1e260a8d4 Simplified engine by removing the configure() and kill() methods. Also simplified the Spider Manager by removing the close_spider() method 2010-08-26 22:20:04 -03:00
Pablo Hoffman
f6c11af4c2 Moved module: scrapy.core.queue to scrapy.queue
--HG--
rename : scrapy/core/queue.py => scrapy/queue.py
2010-08-26 21:15:32 -03:00
Pablo Hoffman
747f090f94 Improved Twisted version detection (wasn't working for Twisted 10.0.0) 2010-08-26 20:32:26 -03:00
Pablo Hoffman
e95f7f63f9 Fixed typo 2010-08-25 21:06:10 -03:00
Pablo Hoffman
a82a4be3ab Added docstring to test_engine.py 2010-08-25 21:04:47 -03:00
Pablo Hoffman
59d18cf99e Fixed crawler reference 2010-08-25 19:59:51 -03:00
Pablo Hoffman
40b590cad3 Moved scrapy.cfg auto-discovery to scrapy.conf.EnvironmentSettings class 2010-08-25 19:59:30 -03:00
Pablo Hoffman
ef7a097272 Replaced old manager references with crawler 2010-08-25 19:31:04 -03:00
Pablo Hoffman
8fc78c4d0a Refactoring of Crawler, Commands, Execution Queue and Spider Manager:
Commands changes:

* removed (somewhat hacky) --init argument from settings command
* added set_crawler method to Commands, and a ``crawler`` property that returns
  a configured crawler. This way, commands that don't require a crawler (such
  as startproject) won't need to configure one.

Execution Queue changes:

* changed SERVICE_QUEUE_FILE setting to SQLITE_DB
* removed SERVICE_QUEUE setting
* added QUEUE_CLASS setting for defining the class to use for the execution queue
* added SERVER_QUEUE_CLASS setting for defining the class to use for the
  execution queue in server mode (runserver command)

Spider Manager changes:

* simplified SpiderManager API by removing the load() method
* added from_settings classmethod to SpiderManager
* added spider_modules constructor argument to SpiderManager

Crawler changes:

* added install() method to Crawler (to install it in scrapy.project) and
  uninstall() to remove it
* use CrawlerProcess.install() in scrapy.cmdline
* use crawler.install() and crawler.uninstall() in tests that require a crawler in
  scrapy.project
* make telnet console and webservice play nicer with twisted by stopping
  listening when the engine goes down
* refactored Scrapy engine tests - it no longer uses the crawler singleton.
  Closes #215.
2010-08-25 19:24:36 -03:00
Pablo Hoffman
eb51b9f785 Removed obsolete setting 2010-08-25 06:41:19 -03:00
Pablo Hoffman
bea2f94357 Instantiate SpiderManager in Crawler constructor 2010-08-25 05:33:08 -03:00
Pablo Hoffman
8d24175c81 Added CrawlerProcess class, isolating all (Twisted) reactor-controlling code
into this class and leaving the Crawler class free of any reactor control.

This allows embedding the Scrapy crawler in other Twisted applications, or any
Twisted asynchronous code.

The "scrapy" command will use the new CrawlerProcess class, which resembles the
behaviour of the old (all-in-one) Crawler class.

Closes #214.

Also moved log.start() call from Crawler class to scrapy.cmdline module.
2010-08-25 05:24:02 -03:00
Pablo Hoffman
54024d1d3f Removed unneeded line 2010-08-25 05:06:45 -03:00
Pablo Hoffman
8e83f527b3 Removed scrapy-sqs script, as it has been superseded by the new scrapy 'queue' command 2010-08-23 22:04:49 -03:00
Pablo Hoffman
e2ed27e4fd Added documentation for Ubuntu packages. Refs #211 2010-08-23 21:28:32 -03:00
Pablo Hoffman
9ee92686c7 Moved spidermanager tests module according to policies
--HG--
rename : scrapy/tests/test_contrib_spidermanager/__init__.py => scrapy/tests/test_spidermanager/__init__.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/__init__.py => scrapy/tests/test_spidermanager/test_spiders/__init__.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider0.py => scrapy/tests/test_spidermanager/test_spiders/spider0.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider1.py => scrapy/tests/test_spidermanager/test_spiders/spider1.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider2.py => scrapy/tests/test_spidermanager/test_spiders/spider2.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider3.py => scrapy/tests/test_spidermanager/test_spiders/spider3.py
2010-08-23 00:25:03 -03:00
Pablo Hoffman
58b4cc2c32 Some minor fixes to Contributing documentation 2010-08-23 00:23:14 -03:00
Pablo Hoffman
6585c1a28f removed (somewhat hacky) MAIL_DEBUG setting 2010-08-22 22:42:00 -03:00
Pablo Hoffman
e189861b46 Fixed Item Loader bug that was preventing values that evaluate to False from being loaded. Patch contributed by Anibal Pacheco. Closes #174 2010-08-22 22:07:44 -03:00