Pablo Hoffman
80cd534f92
removed redundant botname from log lines
2009-06-25 16:48:04 -03:00
Pablo Hoffman
18301b7e66
downloader: performance improvement for sites that use download delay (replace datetime by time)
2009-06-25 14:13:45 -03:00
Pablo Hoffman
7933e00ebd
set more appropriate request priority for robots middleware and media pipeline
2009-06-25 12:10:55 -03:00
Pablo Hoffman
c22d2b1587
engine: added domain_is_open() method, added docstring for domain_is_closed() method
2009-06-25 09:56:38 -03:00
Pablo Hoffman
8de09fe4dd
improved documentation of Downloader._download() method and fixed bug with process_queue() being called too early
2009-06-24 17:08:16 -03:00
Daniel Grana
830cd4f19f
Restore download process queue processing after finish with recent transferred response
2009-06-24 13:45:50 -03:00
Pablo Hoffman
87df33ce0a
s/_next_request_called/_next_request_pending/
2009-06-24 10:36:36 -03:00
Pablo Hoffman
51029e37a3
engine: removed obsolete docstring and simplified next_request method
2009-06-24 10:34:44 -03:00
Daniel Grana
d7d18d27df
avoid rescheduling next_request calls
2009-06-24 10:28:34 -03:00
Pablo Hoffman
2b65f20c26
engine: removed redundant line and unused import
2009-06-23 21:50:46 -03:00
Daniel Grana
7578ab00a2
Automated merge with ssh://hg.scrapy.org/scrapy
2009-06-23 16:47:58 -03:00
Daniel Grana
73b60788c1
log framework errors at the end of crawling
2009-06-23 16:47:32 -03:00
Pablo Hoffman
93fcf6e314
added web console docstring pointing to documentation, improved telnet console docstring
2009-06-23 16:11:23 -03:00
Pablo Hoffman
834ac9fca0
Some telnet console changes:
- added telnet console documentation
- added documentation for debugging memory leaks with guppy
- sorted out shell aliases
- set default port (TELNETCONSOLE_PORT) to 6023
2009-06-23 16:08:58 -03:00
Daniel Grana
32bae8040d
add basic mustbe_deferred tests
2009-06-23 14:59:03 -03:00
daniel
9b78f929de
remove obsolete deferred_imap util, use coiterate+imap instead
2009-06-23 14:45:16 -07:00
dgrana
e9c724e5b7
no need for two callbacks while processing scraping responses
2009-06-23 13:26:24 -07:00
dgrana
6809375815
restore call to next_request inside pipeline output processor
2009-06-23 13:19:35 -07:00
dgrana
21ed9a24a3
fix replace of deferred_imap by coiterate+imap and fix broken engine test
2009-06-23 13:14:22 -07:00
dgrana
49b413ba40
merge
2009-06-23 13:00:01 -07:00
dgrana
38f184e42a
remove calls to chain_deferred and deferred_imap
2009-06-23 12:53:13 -07:00
Pablo Hoffman
a8fd107ad4
engine: some extra simplifications and removed debug mode
2009-06-22 23:01:41 -03:00
Pablo Hoffman
a1cca0da50
removed obsolete file
2009-06-22 22:55:18 -03:00
Pablo Hoffman
a8a44aa035
engine: domains are now polled and closed when they're idle, instead of being notified by the downloader
2009-06-22 21:28:35 -03:00
Pablo Hoffman
5271d1f185
renamed engine.resume() method to engine.unpause()
2009-06-22 20:01:29 -03:00
Pablo Hoffman
b3a624ed6b
engine: simplified next_request and removed 'domain in self.closing' check
2009-06-22 19:59:36 -03:00
Pablo Hoffman
ab0950b67e
added exception reporting to global_tests in engine.get_status()
2009-06-22 18:40:17 -03:00
Pablo Hoffman
b037ef6a27
added clear_pending_requests to scheduler
2009-06-22 18:37:39 -03:00
Pablo Hoffman
fb8e24acb5
more downloader cleanup and fixed bug which was preventing domains from being properly closed
2009-06-22 16:25:44 -03:00
Pablo Hoffman
e8543ca1f9
minor clean up to engine domain closing
2009-06-22 15:05:26 -03:00
Daniel Grana
b488e3b4a9
catch downloader process_queue exceptions
2009-06-22 14:24:04 -03:00
Daniel Grana
e6bf2821ef
remove request from transferring state prior to returning downloaded response
2009-06-22 14:06:15 -03:00
Pablo Hoffman
400a54bf7c
Added reasons when closing domains ('reason' argument to engine close_domain method), replaced old 'status' parameter by new 'reason' parameter in domain_closed signal. Updated and improved signals doc
2009-06-21 22:00:16 -03:00
Pablo Hoffman
ab2dd764e0
downloader: some improvements to instantiation of SiteInfo (ex. SiteDetails) objects
2009-06-21 16:27:48 -03:00
Pablo Hoffman
7bf610d0b7
additional simplifications to downloader (several methods removed) and added more info to engine getstatus() method
2009-06-21 16:06:36 -03:00
Pablo Hoffman
fda1fe0eb8
decreased enabled extension/middlewares/pipelines log messages level to DEBUG
2009-06-21 15:38:26 -03:00
Pablo Hoffman
89712a4adf
downloader: renamed SiteDetails.downloading to SiteDetails.transferring, for clarity
2009-06-21 14:23:51 -03:00
Pablo Hoffman
abe4f812e9
downloader: added site.closed additional check to domain already closed
2009-06-21 14:16:40 -03:00
Daniel Grana
5944f6590a
restore downloader enqueuing after middleware
2009-06-21 03:03:14 -03:00
Daniel Grana
cd1ad337b1
Downloader cleanup
* remove debug messages
* move deactivating of downloads to last callback
* simplify calling of download_any function
* raises RuntimeError while opening/closing twice
2009-06-21 02:54:31 -03:00
Daniel Grana
31d5d519fc
remove obsolete lambda_deferred function
2009-06-21 01:37:58 -03:00
Daniel Grana
bd59748de6
simplify chain_deferred implementation
--HG--
extra : rebase_source : e8a3639051c560f8fa2d75fc5469194723fe3ef9
2009-06-21 01:25:31 -03:00
Pablo Hoffman
61b1e67cd3
core: fixed engine getstatus() method for recent changes
2009-06-20 21:57:03 -03:00
Pablo Hoffman
adccd9a04e
Sorted out Duplicate Filter API.
--HG--
rename : scrapy/dupefilter/__init__.py => scrapy/contrib/dupefilter.py
2009-06-20 20:29:07 -03:00
Daniel Grana
47970e91bc
core: Invert request priority meaning, a higher request.priority value means more priority
2009-06-20 19:23:26 -03:00
Daniel Grana
18b6fecc47
Remove custom redirection priority of request returned by downloadermiddleware
2009-06-20 19:19:07 -03:00
Daniel Grana
00b49752ce
Multiple changes to core scheduling and duplicates filtering
* removed starters from engine
* moved schedulermiddleware to scheduler
* raise RuntimeError when trying to open/close a scheduler domain twice
* removed dupesfilter singleton and spidermw dupefilter middleware
--HG--
extra : rebase_source : e4c3ad4b970cbc8f532bc751ba9d8a944ca16be5
2009-06-20 18:15:02 -03:00
Pablo Hoffman
728ec7c5c9
minor adjustment to FifoDomainScheduler and improved documentation of domain scheduler API (remove_pending_domain method removes all occurrences)
2009-06-19 19:41:56 -03:00
Pablo Hoffman
161335fe78
Added domain schedulers (whose functionality was previously mixed with the
Scrapy Scheduler) and removed domain prioritizers whose functionality became
duplicated by the new domain schedulers.
2009-06-19 17:55:54 -03:00
Pablo Hoffman
3cb12e5dd3
Moved init_domain functionality out of the engine (refs #88) and into the
spider level. A new spider (InitSpider, not yet documented) was added to
provide initialization facilities.
Also renamed make_request_from_url to make_requests_from_url and allowed it to
return iterables.
2009-06-18 14:43:56 -03:00