mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 15:03:56 +00:00

1250 Commits

Author SHA1 Message Date
Daniel Grana
a8de5cef6e remove xlib hack that appends scrapy/xlib to sys.path 2009-07-09 14:37:59 -03:00
Pablo Hoffman
8b26e49636 Added new ItemProcessor component to Scraper component 2009-07-08 23:48:06 -03:00
Pablo Hoffman
42b86a385f removed wtf line 2009-07-08 18:19:54 -03:00
pablo
5cbafaea7f StackTraceDump extension: using USR2 signal to avoid collision with other stuff that uses USR1 (such as twistd log rotation) 2009-07-08 09:19:35 -03:00
Daniel Grana
b83851dcc3 remove unused lines from shell command 2009-07-07 16:24:59 -03:00
Daniel Grana
8e5ede7179 shell command was broken by recent commits because scrapyengine.crawl does not return a deferred anymore; now we use scrapyengine.schedule, which returns the deferred of the download response 2009-07-07 16:22:23 -03:00
damian
1ba98606c2 test.test_utils_url: update parameter name; utils.url: minor code clean up 2009-07-07 12:35:24 -03:00
damian
460f690c5c utils.url: add_or_replace_parameter function fixed, quoted urls support and test cases added 2009-07-07 11:20:26 -03:00
pablo
c205f7d8e5 added missing comment for non-trivial code 2009-07-06 20:38:39 -03:00
Daniel Grana
a15dc94340 images: images uploaded through amazon s3 special spider must be scheduled 2009-07-06 16:16:49 -03:00
Daniel Grana
2e52005847 rewrite RequestLimitMiddleware spidermw so it does not consume spider output at once
--HG--
rename : scrapy/contrib/spidermiddleware/limit.py => scrapy/contrib/spidermiddleware/requestlimit.py
2009-07-06 15:35:36 -03:00
Pablo Hoffman
31b3d7ce1e Added flow control mechanism to new Scraper component, to prevent cases where memory fills because of requests being downloaded much faster than they can be processed (by the spiders) 2009-07-06 15:31:50 -03:00
Daniel Grana
4f1d388733 Cleanup scrapyengine.crawl by moving functionality inside a new component named Scraper 2009-07-06 15:31:50 -03:00
Daniel Grana
3cb18dbbbb Move itempipeline functionality outside of engine as a spidermiddleware 2009-07-06 15:31:50 -03:00
pablo
2ce43ebbec made downloader/scheduler/spider middlewares code more consistent, added enabled/disabled/loaded informational attributes to all of them 2009-07-06 01:07:45 -03:00
Daniel Grana
f467c233b2 downloader: process queue immediately after downloading the response 2009-07-03 01:32:24 -03:00
Pablo Hoffman
0c4c153819 improved Scrapy documentation index for better usability 2009-07-01 09:51:57 -03:00
Pablo Hoffman
af6db1691e added scrapy.log.logmessage_received signal 2009-06-26 12:27:03 -03:00
Pablo Hoffman
80cd534f92 removed redundant botname from log lines 2009-06-25 16:48:04 -03:00
Pablo Hoffman
18301b7e66 downloader: performance improvement for sites that use download delay (replace datetime by time) 2009-06-25 14:13:45 -03:00
Pablo Hoffman
7933e00ebd set more proper request priority for robots middleware and media pipeline 2009-06-25 12:10:55 -03:00
Pablo Hoffman
c22d2b1587 engine: added domain_is_open() method, added docstring for domain_is_closed() method 2009-06-25 09:56:38 -03:00
Pablo Hoffman
8de09fe4dd improved documentation of Downloader._download() method and fixed bug with process_queue() being called too early 2009-06-24 17:08:16 -03:00
Daniel Grana
830cd4f19f Restore download process queue processing after finish with recent transferred response 2009-06-24 13:45:50 -03:00
Pablo Hoffman
87df33ce0a s/_next_request_called/_next_request_pending/ 2009-06-24 10:36:36 -03:00
Pablo Hoffman
51029e37a3 engine: removed obsolete docstring and simplified next_request method 2009-06-24 10:34:44 -03:00
Daniel Grana
d7d18d27df avoid rescheduling next_request calls 2009-06-24 10:28:34 -03:00
Pablo Hoffman
2b65f20c26 engine: removed redundant line and unused import 2009-06-23 21:50:46 -03:00
Daniel Grana
7578ab00a2 Automated merge with ssh://hg.scrapy.org/scrapy 2009-06-23 16:47:58 -03:00
Daniel Grana
73b60788c1 log framework errors at the end of crawling 2009-06-23 16:47:32 -03:00
Pablo Hoffman
93fcf6e314 added web console docstring pointing to documentation, improved telnet console docstring 2009-06-23 16:11:23 -03:00
Pablo Hoffman
834ac9fca0 Some telnet console changes:
- added telnet console documentation
- added documentation for debugging memory leaks with guppy
- sorted out shell aliases
- set default port (TELNETCONSOLE_PORT) to 6023
2009-06-23 16:08:58 -03:00
Daniel Grana
32bae8040d add basic mustbe_deferred tests 2009-06-23 14:59:03 -03:00
daniel
9b78f929de remove obsolete deferred_imap util, use coiterate+imap instead 2009-06-23 14:45:16 -07:00
dgrana
e9c724e5b7 no need for two callbacks while processing scraping responses 2009-06-23 13:26:24 -07:00
dgrana
6809375815 restore call to next_request inside pipeline output processor 2009-06-23 13:19:35 -07:00
dgrana
21ed9a24a3 fix replace of deferred_imap by coiterate+imap and fix broken engine test 2009-06-23 13:14:22 -07:00
dgrana
49b413ba40 merge 2009-06-23 13:00:01 -07:00
dgrana
38f184e42a remove calls to chain_deferred and deferred_imap 2009-06-23 12:53:13 -07:00
Pablo Hoffman
a8fd107ad4 engine: some extra simplifications and removed debug mode 2009-06-22 23:01:41 -03:00
Pablo Hoffman
a1cca0da50 removed obsolete file 2009-06-22 22:55:18 -03:00
Pablo Hoffman
a8a44aa035 engine: domains are now polled and closed when they're idle, instead of being notified by the downloader 2009-06-22 21:28:35 -03:00
Pablo Hoffman
5271d1f185 renamed engine.resume() method to engine.unpause() 2009-06-22 20:01:29 -03:00
Pablo Hoffman
b3a624ed6b engine: simplified next_request and removed 'domain in self.closing' check 2009-06-22 19:59:36 -03:00
Pablo Hoffman
ab0950b67e added exception reporting to global_tests in engine.get_status() 2009-06-22 18:40:17 -03:00
Pablo Hoffman
b037ef6a27 added clear_pending_requests to scheduler 2009-06-22 18:37:39 -03:00
Pablo Hoffman
fb8e24acb5 more downloader cleanup and fixed bug which was preventing domains from getting properly closed 2009-06-22 16:25:44 -03:00
Pablo Hoffman
e8543ca1f9 minor clean up to engine domain closing 2009-06-22 15:05:26 -03:00
Daniel Grana
b488e3b4a9 catch downloader process_queue exceptions 2009-06-22 14:24:04 -03:00
Daniel Grana
e6bf2821ef remove request from transferring state prior to returning downloaded response 2009-06-22 14:06:15 -03:00