mirror of https://github.com/scrapy/scrapy.git synced 2025-02-28 09:23:51 +00:00

1341 Commits

Author SHA1 Message Date
Daniel Grana
a15dc94340 images: images uploaded through amazon s3 special spider must be scheduled 2009-07-06 16:16:49 -03:00
Daniel Grana
2e52005847 rewrite RequestLimitMiddleware spidermw so it does not consume spider output at once
--HG--
rename : scrapy/contrib/spidermiddleware/limit.py => scrapy/contrib/spidermiddleware/requestlimit.py
2009-07-06 15:35:36 -03:00
Pablo Hoffman
31b3d7ce1e Added flow control mechanism to new Scraper component, to prevent cases where memory fills because of requests being downloaded much faster than they can be processed (by the spiders) 2009-07-06 15:31:50 -03:00
Daniel Grana
4f1d388733 Cleanup scrapyengine.crawl by moving functionality inside a new component named Scraper 2009-07-06 15:31:50 -03:00
Daniel Grana
3cb18dbbbb Move itempipeline functionality outside of engine as a spidermiddleware 2009-07-06 15:31:50 -03:00
pablo
2ce43ebbec made downloader/scheduler/spider middlewares code more consistent, added enabled/disabled/loaded informational attributes to all of them 2009-07-06 01:07:45 -03:00
Daniel Grana
f467c233b2 downloader: process queue immediately after downloading the response 2009-07-03 01:32:24 -03:00
Pablo Hoffman
0c4c153819 improved Scrapy documentation index for better usability 2009-07-01 09:51:57 -03:00
Pablo Hoffman
af6db1691e added scrapy.log.logmessage_received signal 2009-06-26 12:27:03 -03:00
Pablo Hoffman
80cd534f92 removed redundant botname from log lines 2009-06-25 16:48:04 -03:00
Pablo Hoffman
18301b7e66 downloader: performance improvement for sites that use download delay (replace datetime by time) 2009-06-25 14:13:45 -03:00
Pablo Hoffman
7933e00ebd set more proper request priority for robots middleware and media pipeline 2009-06-25 12:10:55 -03:00
Pablo Hoffman
c22d2b1587 engine: added domain_is_open() method, added docstring for domain_is_closed() method 2009-06-25 09:56:38 -03:00
Pablo Hoffman
8de09fe4dd improved documentation of Downloader._download() method and fixed bug with process_queue() being called too early 2009-06-24 17:08:16 -03:00
Daniel Grana
830cd4f19f Restore download process queue processing after finish with recent transferred response 2009-06-24 13:45:50 -03:00
Pablo Hoffman
87df33ce0a s/_next_request_called/_next_request_pending/ 2009-06-24 10:36:36 -03:00
Pablo Hoffman
51029e37a3 engine: removed obsolete docstring and simplified next_request method 2009-06-24 10:34:44 -03:00
Daniel Grana
d7d18d27df avoid rescheduling next_request calls 2009-06-24 10:28:34 -03:00
Pablo Hoffman
2b65f20c26 engine: removed redundant line and unused import 2009-06-23 21:50:46 -03:00
Daniel Grana
7578ab00a2 Automated merge with ssh://hg.scrapy.org/scrapy 2009-06-23 16:47:58 -03:00
Daniel Grana
73b60788c1 log framework errors at the end of crawling 2009-06-23 16:47:32 -03:00
Pablo Hoffman
93fcf6e314 added web console docstring pointing to documentation, improved telnet console docstring 2009-06-23 16:11:23 -03:00
Pablo Hoffman
834ac9fca0 Some telnet console changes:
- added telnet console documentation
- added documentation for debugging memory leaks with guppy
- sorted out shell aliases
- set default port (TELNETCONSOLE_PORT) to 6023
2009-06-23 16:08:58 -03:00
Daniel Grana
32bae8040d add basic mustbe_deferred tests 2009-06-23 14:59:03 -03:00
daniel
9b78f929de remove obsolete deferred_imap util, use coiterate+imap instead 2009-06-23 14:45:16 -07:00
dgrana
e9c724e5b7 no need for two callbacks while processing scraping responses 2009-06-23 13:26:24 -07:00
dgrana
6809375815 restore call to next_request inside pipeline output processor 2009-06-23 13:19:35 -07:00
dgrana
21ed9a24a3 fix replace of deferred_imap by coiterate+imap and fix broken engine test 2009-06-23 13:14:22 -07:00
dgrana
49b413ba40 merge 2009-06-23 13:00:01 -07:00
dgrana
38f184e42a remove calls to chain_deferred and deferred_imap 2009-06-23 12:53:13 -07:00
Pablo Hoffman
a8fd107ad4 engine: some extra simplifications and removed debug mode 2009-06-22 23:01:41 -03:00
Pablo Hoffman
a1cca0da50 removed obsolete file 2009-06-22 22:55:18 -03:00
Pablo Hoffman
a8a44aa035 engine: domains are now polled and closed when they're idle, instead of being notified by the downloader 2009-06-22 21:28:35 -03:00
Pablo Hoffman
5271d1f185 renamed engine.resume() method to engine.unpause() 2009-06-22 20:01:29 -03:00
Pablo Hoffman
b3a624ed6b engine: simplified next_request and removed 'domain in self.closing' check 2009-06-22 19:59:36 -03:00
Pablo Hoffman
ab0950b67e added exception reporting to global_tests in engine.get_status() 2009-06-22 18:40:17 -03:00
Pablo Hoffman
b037ef6a27 added clear_pending_requests to scheduler 2009-06-22 18:37:39 -03:00
Pablo Hoffman
fb8e24acb5 more downloader cleanup and fixed bug which was preventing domains from being properly closed 2009-06-22 16:25:44 -03:00
Pablo Hoffman
e8543ca1f9 minor clean up to engine domain closing 2009-06-22 15:05:26 -03:00
Daniel Grana
b488e3b4a9 catch downloader process_queue exceptions 2009-06-22 14:24:04 -03:00
Daniel Grana
e6bf2821ef remove request from transferring state prior to returning downloaded response 2009-06-22 14:06:15 -03:00
Pablo Hoffman
400a54bf7c Added reasons when closing domains ('reason' argument to engine close_domain method), replaced old 'status' parameter by new 'reason' parameter in domain_closed signal. updated and improved signals doc 2009-06-21 22:00:16 -03:00
Pablo Hoffman
ab2dd764e0 downloader: some improvements to instantiation of SiteInfo (ex. SiteDetails) objects 2009-06-21 16:27:48 -03:00
Pablo Hoffman
7bf610d0b7 additional simplifications to downloader (several methods removed) and added more info to engine getstatus() method 2009-06-21 16:06:36 -03:00
Pablo Hoffman
fda1fe0eb8 decreased enabled extension/middlewares/pipelines log messages level to DEBUG 2009-06-21 15:38:26 -03:00
Pablo Hoffman
89712a4adf downloader: renamed SiteDetails.downloading to SiteDetails.transferring, for clarity 2009-06-21 14:23:51 -03:00
Pablo Hoffman
abe4f812e9 downloader: added site.closed additional check to domain already closed 2009-06-21 14:16:40 -03:00
Daniel Grana
5944f6590a restore downloader enqueuing after middleware 2009-06-21 03:03:14 -03:00
Daniel Grana
cd1ad337b1 Downloader cleanup
* remove debug messages
* move deactivating of downloads to last callback
* simplify calling of download_any function
* raises RuntimeError while opening/closing twice
2009-06-21 02:54:31 -03:00
Daniel Grana
31d5d519fc remove obsolete lambda_deferred function 2009-06-21 01:37:58 -03:00