Daniel Grana
a8de5cef6e
remove xlib hack that appends scrapy/xlib to sys.path
2009-07-09 14:37:59 -03:00
Pablo Hoffman
8b26e49636
Added new ItemProcessor component to Scraper component
2009-07-08 23:48:06 -03:00
Pablo Hoffman
42b86a385f
removed wtf line
2009-07-08 18:19:54 -03:00
pablo
5cbafaea7f
StackTraceDump extension: using USR2 signal to avoid collision with other stuff that uses USR1 (such as twistd log rotation)
2009-07-08 09:19:35 -03:00
Daniel Grana
b83851dcc3
remove unused lines from shell command
2009-07-07 16:24:59 -03:00
Daniel Grana
8e5ede7179
shell command was broken by recent commits because scrapyengine.crawl does not return a deferred anymore; now we use scrapyengine.schedule, which returns the deferred of the download response
2009-07-07 16:22:23 -03:00
damian
1ba98606c2
test.test_utils_url: update parameter name; utils.url: minor code clean up
2009-07-07 12:35:24 -03:00
damian
460f690c5c
utils.url: add_or_replace_parameter function fixed, quoted urls support and test cases added
2009-07-07 11:20:26 -03:00
pablo
c205f7d8e5
added missing comment for non-trivial code
2009-07-06 20:38:39 -03:00
Daniel Grana
a15dc94340
images: images uploaded through amazon s3 special spider must be scheduled
2009-07-06 16:16:49 -03:00
Daniel Grana
2e52005847
rewrite RequestLimitMiddleware spidermw so it does not consume spider output at once
--HG--
rename : scrapy/contrib/spidermiddleware/limit.py => scrapy/contrib/spidermiddleware/requestlimit.py
2009-07-06 15:35:36 -03:00
Pablo Hoffman
31b3d7ce1e
Added flow control mechanism to new Scraper component, to prevent cases where memory fills because of requests being downloaded much faster than they can be processed (by the spiders)
2009-07-06 15:31:50 -03:00
Daniel Grana
4f1d388733
Cleanup scrapyengine.crawl by moving functionality inside a new component named Scraper
2009-07-06 15:31:50 -03:00
Daniel Grana
3cb18dbbbb
Move itempipeline functionality outside of engine as a spidermiddleware
2009-07-06 15:31:50 -03:00
pablo
2ce43ebbec
made downloader/scheduler/spider middlewares code more consistent, added enabled/disabled/loaded informational attributes to all of them
2009-07-06 01:07:45 -03:00
Daniel Grana
f467c233b2
downloader: process queue immediately after downloading the response
2009-07-03 01:32:24 -03:00
Pablo Hoffman
0c4c153819
improved Scrapy documentation index for better usability
2009-07-01 09:51:57 -03:00
Pablo Hoffman
af6db1691e
added scrapy.log.logmessage_received signal
2009-06-26 12:27:03 -03:00
Pablo Hoffman
80cd534f92
removed redundant botname from log lines
2009-06-25 16:48:04 -03:00
Pablo Hoffman
18301b7e66
downloader: performance improvement for sites that use download delay (replace datetime by time)
2009-06-25 14:13:45 -03:00
Pablo Hoffman
7933e00ebd
set more appropriate request priority for robots middleware and media pipeline
2009-06-25 12:10:55 -03:00
Pablo Hoffman
c22d2b1587
engine: added domain_is_open() method, added docstring for domain_is_closed() method
2009-06-25 09:56:38 -03:00
Pablo Hoffman
8de09fe4dd
improved documentation of Downloader._download() method and fixed bug with process_queue() being called too early
2009-06-24 17:08:16 -03:00
Daniel Grana
830cd4f19f
Restore download queue processing after finishing with the recently transferred response
2009-06-24 13:45:50 -03:00
Pablo Hoffman
87df33ce0a
s/_next_request_called/_next_request_pending/
2009-06-24 10:36:36 -03:00
Pablo Hoffman
51029e37a3
engine: removed obsolete docstring and simplified next_request method
2009-06-24 10:34:44 -03:00
Daniel Grana
d7d18d27df
avoid rescheduling next_request calls
2009-06-24 10:28:34 -03:00
Pablo Hoffman
2b65f20c26
engine: removed redundant line and unused import
2009-06-23 21:50:46 -03:00
Daniel Grana
7578ab00a2
Automated merge with ssh://hg.scrapy.org/scrapy
2009-06-23 16:47:58 -03:00
Daniel Grana
73b60788c1
log framework errors at the end of crawling
2009-06-23 16:47:32 -03:00
Pablo Hoffman
93fcf6e314
added web console docstring pointing to documentation, improved telnet console docstring
2009-06-23 16:11:23 -03:00
Pablo Hoffman
834ac9fca0
Some telnet console changes:
- added telnet console documentation
- added documentation for debugging memory leaks with guppy
- sorted out shell aliases
- set default port (TELNETCONSOLE_PORT) to 6023
2009-06-23 16:08:58 -03:00
Daniel Grana
32bae8040d
add basic mustbe_deferred tests
2009-06-23 14:59:03 -03:00
daniel
9b78f929de
remove obsolete deferred_imap util, use coiterate+imap instead
2009-06-23 14:45:16 -07:00
dgrana
e9c724e5b7
no need for two callbacks while processing scraping responses
2009-06-23 13:26:24 -07:00
dgrana
6809375815
restore call to next_request inside pipeline output processor
2009-06-23 13:19:35 -07:00
dgrana
21ed9a24a3
fix replace of deferred_imap by coiterate+imap and fix broken engine test
2009-06-23 13:14:22 -07:00
dgrana
49b413ba40
merge
2009-06-23 13:00:01 -07:00
dgrana
38f184e42a
remove calls to chain_deferred and deferred_imap
2009-06-23 12:53:13 -07:00
Pablo Hoffman
a8fd107ad4
engine: some extra simplifications and removed debug mode
2009-06-22 23:01:41 -03:00
Pablo Hoffman
a1cca0da50
removed obsolete file
2009-06-22 22:55:18 -03:00
Pablo Hoffman
a8a44aa035
engine: domains are now polled and closed when they're idle, instead of being notified by the downloader
2009-06-22 21:28:35 -03:00
Pablo Hoffman
5271d1f185
renamed engine.resume() method to engine.unpause()
2009-06-22 20:01:29 -03:00
Pablo Hoffman
b3a624ed6b
engine: simplified next_request and removed 'domain in self.closing' check
2009-06-22 19:59:36 -03:00
Pablo Hoffman
ab0950b67e
added exception reporting to global_tests in engine.get_status()
2009-06-22 18:40:17 -03:00
Pablo Hoffman
b037ef6a27
added clear_pending_requests to scheduler
2009-06-22 18:37:39 -03:00
Pablo Hoffman
fb8e24acb5
more downloader cleanup and fixed bug which was preventing domains from getting properly closed
2009-06-22 16:25:44 -03:00
Pablo Hoffman
e8543ca1f9
minor clean up to engine domain closing
2009-06-22 15:05:26 -03:00
Daniel Grana
b488e3b4a9
catch downloader process_queue exceptions
2009-06-22 14:24:04 -03:00
Daniel Grana
e6bf2821ef
remove request from transferring state prior to returning downloaded response
2009-06-22 14:06:15 -03:00