mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 17:04:03 +00:00

2532 Commits

Author SHA1 Message Date
Pablo Hoffman
9158e9d682 Some changes to Scrapyd to support multiple configuration files, to make it easier to deploy Scrapyd applications. Also documented 'egg_runner' and 'application' options
--HG--
rename : debian/scrapyd.cfg => debian/000-default
rename : scrapyd/default_scrapyd.cfg => scrapyd/default_scrapyd.conf
2010-09-07 09:17:25 -03:00
Daniel Grana
3414bf13ee remove request_uploaded signal and move response_received and response_downloaded to downloader manager. closes #228
--HG--
extra : rebase_source : 4af0d2a01b34de8a21048bb7f4a66bfc484b3b8f
2010-09-06 23:23:14 -03:00
Pablo Hoffman
3c5ab10688 Added FAQ entry about __VIEWSTATE parameter 2010-09-06 13:17:08 -03:00
Pablo Hoffman
3a72e5c051 Removed settings.disabled hack used in some tests. Closes #143 2010-09-06 11:04:27 -03:00
Pablo Hoffman
5f58af2005 Simplified SpiderMiddlewareManager by making it inherit from MiddlewareManager 2010-09-06 10:40:33 -03:00
Pablo Hoffman
cc72f03e10 Added IFeedStorage interface and test all Feed Storages conform to it. Also added test for StdoutFeedStorage 2010-09-06 10:22:28 -03:00
Pablo Hoffman
e3d67d74f7 docs/intro/overview.rst: add example of scraped data and introduce loaders 2010-09-06 10:04:00 -03:00
Pablo Hoffman
ff9de424c8 Added SpiderQueue tests. SQS spider queue not tested because operations take too long to complete and it's not easy to know when they have. Closes #227 2010-09-06 09:47:45 -03:00
Daniel Grana
8d1d3493e7 Added a weak key factory based cache
--HG--
extra : rebase_source : 2bc7cb5fdb0fd3adb63cf7fe3aedd2f1d15e49f0
2010-09-06 00:50:56 -03:00
Pablo Hoffman
00d55fbbd1 Updated 'Scrapy at a glance' document replacing item pipeline example by a simpler usage of feed exports 2010-09-05 23:38:37 -03:00
Pablo Hoffman
5ffc7650bd Removed code no longer needed 2010-09-05 20:08:59 -03:00
Pablo Hoffman
766f2d910d Renamed Request Handlers to Download Handlers 2010-09-05 19:35:53 -03:00
Pablo Hoffman
067ec65d97 Removed download_any singleton 2010-09-05 19:09:42 -03:00
Pablo Hoffman
a5cf71cb06 Updated Ubuntu package signing key location 2010-09-05 19:04:15 -03:00
Pablo Hoffman
6bf52fb50e Make telnet console and web service try a range of ports for binding, instead of just one. Closes #226 2010-09-05 06:48:08 -03:00
Pablo Hoffman
ce884192c9 Fixed test broken by previous commit 2010-09-05 06:05:34 -03:00
Pablo Hoffman
630db4fecf Simplified file:// download handler, adding support for reading binary files 2010-09-05 05:59:40 -03:00
Pablo Hoffman
14e985b076 Updated Command line tool documentation 2010-09-05 05:29:58 -03:00
Pablo Hoffman
1190f97944 Updated settings documentation 2010-09-05 04:58:14 -03:00
Pablo Hoffman
ebdb733e95 Updated some old messages in Scrapy shell doc 2010-09-05 04:45:43 -03:00
Pablo Hoffman
2f12618890 Post reference to Scrapyd in FAQ 2010-09-05 04:35:27 -03:00
Pablo Hoffman
a66bef7925 Make execution queue poll interval configurable through a new QUEUE_POLL_INTERVAL setting 2010-09-05 02:23:08 -03:00
Pablo Hoffman
b800fdcb4d SqliteSpiderQueue: fall back to in-memory SQLite if database cannot be opened (typically due to missing write permissions) 2010-09-04 03:49:46 -03:00
Pablo Hoffman
bf34094e5a Added versionadded:: notice to new documentation topics 2010-09-04 03:30:45 -03:00
Pablo Hoffman
e3921ab016 Don't set allowed_domains attribute in BaseSpider constructor 2010-09-04 03:20:05 -03:00
Daniel Grana
9f4b1e47a4 damn, really fix httpcache docs 2010-09-04 03:26:41 -03:00
Daniel Grana
7ad901640b fix httpcache docs 2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469 Make ignored schemes configurable in HttpCacheMiddleware. closes #224
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
Pablo Hoffman
5a6284ceb3 Added TODO: 2010-09-04 02:56:50 -03:00
Daniel Grana
58feb15528 httpcache must restore responses using response.url instead of request.url
--HG--
extra : rebase_source : 08fa2c3862bb35db2234e0f9bb9cb9ce4a8f4d8d
2010-09-04 02:53:09 -03:00
Pablo Hoffman
7b9fa7fbaa Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225 2010-09-04 02:23:04 -03:00
Daniel Grana
1b11563383 monkeypatch urlparse if s3 netloc parsing fails (python issue7904). closes #223 2010-09-04 01:02:52 -03:00
Pablo Hoffman
9cfa8edd14 Automatic merge 2010-09-03 17:46:36 -03:00
Daniel Grana
9b68c3c1b1 Add S3 scheme request handler. closes #222 2010-09-03 16:19:47 -03:00
Daniel Grana
30d94b5bf5 Convert request handlers to classes and support NotConfigured. closes #221 2010-09-03 16:18:46 -03:00
Pablo Hoffman
37e9c5d78e Added new Scrapy service with support for:
* multiple projects
* uploading scrapy projects as Python eggs
* scheduling spiders using a JSON API

Documentation is added along with the code.

Closes #218.

--HG--
rename : debian/scrapy-service.default => debian/scrapyd.default
rename : debian/scrapy-service.dirs => debian/scrapyd.dirs
rename : debian/scrapy-service.install => debian/scrapyd.install
rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides
rename : debian/scrapy-service.postinst => debian/scrapyd.postinst
rename : debian/scrapy-service.postrm => debian/scrapyd.postrm
rename : debian/scrapy-service.upstart => debian/scrapyd.upstart
rename : extras/scrapy.tac => extras/scrapyd.tac
2010-09-03 15:54:42 -03:00
Pablo Hoffman
1b766877f1 Added ISpiderManager interface and a test to verify the default SpiderManager conforms to it 2010-09-03 14:29:27 -03:00
Pablo Hoffman
7cfc379230 Execution Queue refactoring by moving the queue backend out to a new Spider
Queue API. Also ported SQS Execution Queue to Spider Queue API, and made the
scrapy queue command use the Spider Queue directly, with deferreds support.

Closes #220.
2010-09-03 14:29:27 -03:00
Pablo Hoffman
37776618a3 Changed format of scrapy.cfg file to contain a [settings] section and a 'default' key inside it, instead of the other way around 2010-09-02 21:22:17 -03:00
Pablo Hoffman
fb69655a9f Removed unused imports 2010-08-31 21:40:05 -03:00
Pablo Hoffman
758d21b2f9 Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217 2010-08-31 16:03:08 -03:00
Pablo Hoffman
59f09c50e4 Yet another scrapy.cmdline code refactoring by removing --settings and --version options, adding a version command and adding a UsageError exception for signaling usage errors. Updated all commands accordingly 2010-08-28 18:06:51 -03:00
Pablo Hoffman
7394aa926d Fixed typo 2010-08-28 14:47:19 -03:00
Pablo Hoffman
616ecc5a1a Support passing spider arguments in crawl command with -a option. Closes #216 2010-08-28 14:43:28 -03:00
Pablo Hoffman
d4941a0619 Minor change to --pidfile argument 2010-08-28 14:07:03 -03:00
Pablo Hoffman
0ae4c3aa82 call Spider.closed() method (if it exists) on SpiderManager.close_spider() 2010-08-27 18:36:00 -03:00
Pablo Hoffman
35fd1a2660 Fixed typo 2010-08-27 17:21:30 -03:00
Pablo Hoffman
3234d76b8d Restored SpiderManager.close_spider() method but using signals instead of calling it from the engine 2010-08-27 16:19:51 -03:00
Pablo Hoffman
783887457a Moved tests to reflect new module location
--HG--
rename : scrapy/tests/test_core_queue.py => scrapy/tests/test_queue.py
2010-08-27 15:50:12 -03:00
Pablo Hoffman
88c99e0b83 Check that arguments and keyword arguments are not passed simultaneously in jsonrpc_client_call() 2010-08-27 13:45:52 -03:00