mirror of https://github.com/scrapy/scrapy.git synced 2025-02-28 17:03:13 +00:00

2616 Commits

Author SHA1 Message Date
Pablo Hoffman
630db4fecf Simplified file:// download handler, adding support for reading binary files 2010-09-05 05:59:40 -03:00
Pablo Hoffman
14e985b076 Updated Command line tool documentation 2010-09-05 05:29:58 -03:00
Pablo Hoffman
1190f97944 Updated settings documentation 2010-09-05 04:58:14 -03:00
Pablo Hoffman
ebdb733e95 Updated some old messages in Scrapy shell doc 2010-09-05 04:45:43 -03:00
Pablo Hoffman
2f12618890 Post reference to Scrapyd in FAQ 2010-09-05 04:35:27 -03:00
Pablo Hoffman
a66bef7925 Make execution queue poll interval configurable through a new QUEUE_POLL_INTERVAL setting 2010-09-05 02:23:08 -03:00
Pablo Hoffman
b800fdcb4d SqliteSpiderQueue: fall back to in-memory SQLite if database cannot be opened (typically due to missing write permissions) 2010-09-04 03:49:46 -03:00
Pablo Hoffman
bf34094e5a Added versionadded:: notice to new documentation topics 2010-09-04 03:30:45 -03:00
Pablo Hoffman
e3921ab016 Don't set allowed_domains attribute in BaseSpider constructor 2010-09-04 03:20:05 -03:00
Daniel Grana
9f4b1e47a4 damn, really fix httpcache docs 2010-09-04 03:26:41 -03:00
Daniel Grana
7ad901640b fix httpcache docs 2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469 Make ignored schemes configurable in HttpCacheMiddleware. closes #224
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
Pablo Hoffman
5a6284ceb3 Added TODO: 2010-09-04 02:56:50 -03:00
Daniel Grana
58feb15528 httpcache must restore responses using response.url instead of request.url
--HG--
extra : rebase_source : 08fa2c3862bb35db2234e0f9bb9cb9ce4a8f4d8d
2010-09-04 02:53:09 -03:00
Pablo Hoffman
7b9fa7fbaa Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225 2010-09-04 02:23:04 -03:00
Daniel Grana
1b11563383 monkeypatch urlparse if s3 netloc parsing fails (python issue7904). closes #223 2010-09-04 01:02:52 -03:00
Pablo Hoffman
9cfa8edd14 Automatic merge 2010-09-03 17:46:36 -03:00
Daniel Grana
9b68c3c1b1 Add S3 scheme request handler. closes #222 2010-09-03 16:19:47 -03:00
Daniel Grana
30d94b5bf5 Convert request handlers to classes and support NotConfigured. closes #221 2010-09-03 16:18:46 -03:00
Pablo Hoffman
37e9c5d78e Added new Scrapy service with support for:
* multiple projects
* uploading scrapy projects as Python eggs
* scheduling spiders using a JSON API

Documentation is added along with the code.

Closes #218.

--HG--
rename : debian/scrapy-service.default => debian/scrapyd.default
rename : debian/scrapy-service.dirs => debian/scrapyd.dirs
rename : debian/scrapy-service.install => debian/scrapyd.install
rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides
rename : debian/scrapy-service.postinst => debian/scrapyd.postinst
rename : debian/scrapy-service.postrm => debian/scrapyd.postrm
rename : debian/scrapy-service.upstart => debian/scrapyd.upstart
rename : extras/scrapy.tac => extras/scrapyd.tac
2010-09-03 15:54:42 -03:00
Pablo Hoffman
1b766877f1 Added ISpiderManager interface and a test to verify the default SpiderManager conforms to it 2010-09-03 14:29:27 -03:00
Pablo Hoffman
7cfc379230 Execution Queue refactoring by taking out the queue backend to a new Spider
Queue API. Also ported SQS Execution Queue to Spider Queue API, and made the
scrapy queue command use the Spider Queue directly, with deferreds support.

Closes #220.
2010-09-03 14:29:27 -03:00
Pablo Hoffman
37776618a3 Changed format of scrapy.cfg file to contain a [settings] section and a 'default' key inside it, instead of the other way around 2010-09-02 21:22:17 -03:00
Pablo Hoffman
fb69655a9f Removed unused imports 2010-08-31 21:40:05 -03:00
Pablo Hoffman
758d21b2f9 Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217 2010-08-31 16:03:08 -03:00
Pablo Hoffman
59f09c50e4 Yet another scrapy.cmdline code refactoring by removing --settings and --version options, adding a version command and adding a UsageError exception for signaling usage errors. Updated all commands accordingly 2010-08-28 18:06:51 -03:00
Pablo Hoffman
7394aa926d Fixed typo 2010-08-28 14:47:19 -03:00
Pablo Hoffman
616ecc5a1a Support passing spider arguments in crawl command with -a option. Closes #216 2010-08-28 14:43:28 -03:00
Pablo Hoffman
d4941a0619 Minor change to --pidfile argument 2010-08-28 14:07:03 -03:00
Pablo Hoffman
0ae4c3aa82 call Spider.closed() method (if it exists) on SpiderManager.close_spider() 2010-08-27 18:36:00 -03:00
Pablo Hoffman
35fd1a2660 Fixed typo 2010-08-27 17:21:30 -03:00
Pablo Hoffman
3234d76b8d Restored SpiderManager.close_spider() method but using signals instead of calling it from the engine 2010-08-27 16:19:51 -03:00
Pablo Hoffman
783887457a Moved tests to reflect new module location
--HG--
rename : scrapy/tests/test_core_queue.py => scrapy/tests/test_queue.py
2010-08-27 15:50:12 -03:00
Pablo Hoffman
88c99e0b83 Check that arguments and keyword arguments are not passed simultaneously in jsonrpc_client_call() 2010-08-27 13:45:52 -03:00
Pablo Hoffman
ffad8e08e7 Support passing all keyword arguments to ExecutionQueue append_spider_name and append_url 2010-08-27 13:45:14 -03:00
Pablo Hoffman
4eb0383dd2 Added Scrapy to scrapy --version 2010-08-27 11:15:37 -03:00
Pablo Hoffman
e7b3247a18 Updated some missing references to scrapy-ws script 2010-08-27 01:05:59 -03:00
Pablo Hoffman
aab38be498 Print Scrapy on first log line 2010-08-27 00:53:26 -03:00
Pablo Hoffman
e14cc2c12a Moved scrapy-ws script to extras/ and fixed broken methods due to changes in web service API
--HG--
rename : bin/scrapy-ws.py => extras/scrapy-ws.py
2010-08-27 00:33:08 -03:00
Pablo Hoffman
648f700ed1 Fixed log formatter tests
--HG--
rename : scrapy/tests/test_contrib_logformatter.py => scrapy/tests/test_logformatter.py
2010-08-26 23:23:58 -03:00
Pablo Hoffman
ad18d4a70e Added pluggable log formatter 2010-08-26 23:19:35 -03:00
Pablo Hoffman
d1e260a8d4 Simplified engine by removing the configure() and kill() methods. Also simplified the Spider Manager by removing the close_spider() method 2010-08-26 22:20:04 -03:00
Pablo Hoffman
f6c11af4c2 Moved module: scrapy.core.queue to scrapy.queue
--HG--
rename : scrapy/core/queue.py => scrapy/queue.py
2010-08-26 21:15:32 -03:00
Pablo Hoffman
747f090f94 Improved Twisted version detection (wasn't working for Twisted 10.0.0) 2010-08-26 20:32:26 -03:00
Pablo Hoffman
e95f7f63f9 Fixed typo 2010-08-25 21:06:10 -03:00
Pablo Hoffman
a82a4be3ab Added docstring to test_engine.py 2010-08-25 21:04:47 -03:00
Pablo Hoffman
59d18cf99e Fixed crawler reference 2010-08-25 19:59:51 -03:00
Pablo Hoffman
40b590cad3 Moved scrapy.cfg auto-discovery to scrapy.conf.EnvironmentSettings class 2010-08-25 19:59:30 -03:00
Pablo Hoffman
ef7a097272 Replaced old manager references with crawler 2010-08-25 19:31:04 -03:00
Pablo Hoffman
8fc78c4d0a Refactoring of Crawler, Commands, Execution Queue and Spider Manager:
Commands changes:

* removed (somewhat hacky) --init argument from settings command
* added set_crawler method to Commands, and a ``crawler`` property that returns
  a configured crawler. This way, commands that don't require a crawler (such
  as startproject) won't need to configure one.

Execution Queue changes:

* changed SERVICE_QUEUE_FILE setting to SQLITE_DB
* removed SERVICE_QUEUE setting
* added QUEUE_CLASS setting for defining the class to use for the execution queue
* added SERVER_QUEUE_CLASS setting for defining the class to use for the
  execution queue in server mode (runserver command)

Spider Manager changes:

* simplified SpiderManager API by removing the load() method
* added from_settings classmethod to SpiderManager
* added spider_modules constructor argument to SpiderManager

Crawler changes:

* added install() method to Crawler (to install it in scrapy.project) and
  uninstall() to remove it
* use CrawlerProcess.install() in scrapy.cmdline
* use crawler.install() and crawler.uninstall() in tests that require a crawler in
  scrapy.project
* make telnet console and webservice play nicer with twisted by stopping
  listening when the engine goes down
* refactored Scrapy engine tests - it no longer uses the crawler singleton.
  Closes #215.
2010-08-25 19:24:36 -03:00