mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 00:59:05 +00:00

2525 Commits

Author SHA1 Message Date
Pablo Hoffman
ff9de424c8 Added SpiderQueue tests. SQS spider queue not tested because operations take too long to complete and it's not easy to know when they have. Closes #227 2010-09-06 09:47:45 -03:00
Daniel Grana
8d1d3493e7 Added a weak key factory based cache
--HG--
extra : rebase_source : 2bc7cb5fdb0fd3adb63cf7fe3aedd2f1d15e49f0
2010-09-06 00:50:56 -03:00
Pablo Hoffman
00d55fbbd1 Updated 'Scrapy at a glance' document replacing item pipeline example by a simpler usage of feed exports 2010-09-05 23:38:37 -03:00
Pablo Hoffman
5ffc7650bd Removed code no longer needed 2010-09-05 20:08:59 -03:00
Pablo Hoffman
766f2d910d Renamed Request Handlers to Download Handlers 2010-09-05 19:35:53 -03:00
Pablo Hoffman
067ec65d97 Removed download_any singleton 2010-09-05 19:09:42 -03:00
Pablo Hoffman
a5cf71cb06 Updated Ubuntu package signing key location 2010-09-05 19:04:15 -03:00
Pablo Hoffman
6bf52fb50e Make telnet console and web service try a range of ports for binding, instead of just one. Closes #226 2010-09-05 06:48:08 -03:00
Pablo Hoffman
ce884192c9 Fixed test broken by previous commit 2010-09-05 06:05:34 -03:00
Pablo Hoffman
630db4fecf Simplified file:// download handler, adding support for reading binary files 2010-09-05 05:59:40 -03:00
Pablo Hoffman
14e985b076 Updated Command line tool documentation 2010-09-05 05:29:58 -03:00
Pablo Hoffman
1190f97944 Updated settings documentation 2010-09-05 04:58:14 -03:00
Pablo Hoffman
ebdb733e95 Updated some old messages in Scrapy shell doc 2010-09-05 04:45:43 -03:00
Pablo Hoffman
2f12618890 Post reference to Scrapyd in FAQ 2010-09-05 04:35:27 -03:00
Pablo Hoffman
a66bef7925 Make execution queue poll interval configurable through a new QUEUE_POLL_INTERVAL setting 2010-09-05 02:23:08 -03:00
Pablo Hoffman
b800fdcb4d SqliteSpiderQueue: fall back to in-memory SQLite if database cannot be opened (typically due to missing write permissions) 2010-09-04 03:49:46 -03:00
Pablo Hoffman
bf34094e5a Added versionadded:: notice to new documentation topics 2010-09-04 03:30:45 -03:00
Pablo Hoffman
e3921ab016 Don't set allowed_domains attribute in BaseSpider constructor 2010-09-04 03:20:05 -03:00
Daniel Grana
9f4b1e47a4 damn, really fix httpcache docs 2010-09-04 03:26:41 -03:00
Daniel Grana
7ad901640b fix httpcache docs 2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469 Make ignored schemes configurable in HttpCacheMiddleware. closes #224
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
Pablo Hoffman
5a6284ceb3 Added TODO: 2010-09-04 02:56:50 -03:00
Daniel Grana
58feb15528 httpcache must restore responses using response.url instead of request.url
--HG--
extra : rebase_source : 08fa2c3862bb35db2234e0f9bb9cb9ce4a8f4d8d
2010-09-04 02:53:09 -03:00
Pablo Hoffman
7b9fa7fbaa Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225 2010-09-04 02:23:04 -03:00
Daniel Grana
1b11563383 monkeypatch urlparse if s3 netloc parsing fails (python issue7904). closes #223 2010-09-04 01:02:52 -03:00
Pablo Hoffman
9cfa8edd14 Automatic merge 2010-09-03 17:46:36 -03:00
Daniel Grana
9b68c3c1b1 Add S3 scheme request handler. closes #222 2010-09-03 16:19:47 -03:00
Daniel Grana
30d94b5bf5 Convert request handlers to classes and support NotConfigured. closes #221 2010-09-03 16:18:46 -03:00
Pablo Hoffman
37e9c5d78e Added new Scrapy service with support for:
* multiple projects
* uploading scrapy projects as Python eggs
* scheduling spiders using a JSON API

Documentation is added along with the code.

Closes #218.

--HG--
rename : debian/scrapy-service.default => debian/scrapyd.default
rename : debian/scrapy-service.dirs => debian/scrapyd.dirs
rename : debian/scrapy-service.install => debian/scrapyd.install
rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides
rename : debian/scrapy-service.postinst => debian/scrapyd.postinst
rename : debian/scrapy-service.postrm => debian/scrapyd.postrm
rename : debian/scrapy-service.upstart => debian/scrapyd.upstart
rename : extras/scrapy.tac => extras/scrapyd.tac
2010-09-03 15:54:42 -03:00
Pablo Hoffman
1b766877f1 Added ISpiderManager interface and a test to verify the default SpiderManager conforms to it 2010-09-03 14:29:27 -03:00
Pablo Hoffman
7cfc379230 Execution Queue refactoring by taking out the queue backend to a new Spider
Queue API. Also ported SQS Execution Queue to Spider Queue API, and made the
scrapy queue command use the Spider Queue directly, with deferreds support.

Closes #220.
2010-09-03 14:29:27 -03:00
Pablo Hoffman
37776618a3 Changed format of scrapy.cfg file to contain a [settings] section and a 'default' key inside it, instead of the other way around 2010-09-02 21:22:17 -03:00
Pablo Hoffman
fb69655a9f Removed unused imports 2010-08-31 21:40:05 -03:00
Pablo Hoffman
758d21b2f9 Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217 2010-08-31 16:03:08 -03:00
Pablo Hoffman
59f09c50e4 Yet another scrapy.cmdline code refactoring by removing --settings and --version options, adding a version command and adding a UsageError exception for signaling usage errors. Updated all commands accordingly 2010-08-28 18:06:51 -03:00
Pablo Hoffman
7394aa926d Fixed typo 2010-08-28 14:47:19 -03:00
Pablo Hoffman
616ecc5a1a Support passing spider arguments in crawl command with -a option. Closes #216 2010-08-28 14:43:28 -03:00
Pablo Hoffman
d4941a0619 Minor change to --pidfile argument 2010-08-28 14:07:03 -03:00
Pablo Hoffman
0ae4c3aa82 call Spider.closed() method (if it exists) on SpiderManager.close_spider() 2010-08-27 18:36:00 -03:00
Pablo Hoffman
35fd1a2660 Fixed typo 2010-08-27 17:21:30 -03:00
Pablo Hoffman
3234d76b8d Restored SpiderManager.close_spider() method but using signals instead of calling it from the engine 2010-08-27 16:19:51 -03:00
Pablo Hoffman
783887457a Moved tests to reflect new module location
--HG--
rename : scrapy/tests/test_core_queue.py => scrapy/tests/test_queue.py
2010-08-27 15:50:12 -03:00
Pablo Hoffman
88c99e0b83 Check that arguments and keyword arguments are not passed simultaneously in jsonrpc_client_call() 2010-08-27 13:45:52 -03:00
Pablo Hoffman
ffad8e08e7 Support passing all keyword arguments to ExecutionQueue append_spider_name and append_url 2010-08-27 13:45:14 -03:00
Pablo Hoffman
4eb0383dd2 Added Scrapy to scrapy --version 2010-08-27 11:15:37 -03:00
Pablo Hoffman
e7b3247a18 Updated some missing references to scrapy-ws script 2010-08-27 01:05:59 -03:00
Pablo Hoffman
aab38be498 Print Scrapy on first log line 2010-08-27 00:53:26 -03:00
Pablo Hoffman
e14cc2c12a Moved scrapy-ws script to extras/ and fixed broken methods due to changes in web service API
--HG--
rename : bin/scrapy-ws.py => extras/scrapy-ws.py
2010-08-27 00:33:08 -03:00
Pablo Hoffman
648f700ed1 Fixed log formatter tests
--HG--
rename : scrapy/tests/test_contrib_logformatter.py => scrapy/tests/test_logformatter.py
2010-08-26 23:23:58 -03:00
Pablo Hoffman
ad18d4a70e Added pluggable log formatter 2010-08-26 23:19:35 -03:00