Pablo Hoffman
2b65f20c26
engine: removed redundant line and unused import
2009-06-23 21:50:46 -03:00
Daniel Grana
7578ab00a2
Automated merge with ssh://hg.scrapy.org/scrapy
2009-06-23 16:47:58 -03:00
Daniel Grana
73b60788c1
log framework errors at the end of crawling
2009-06-23 16:47:32 -03:00
Pablo Hoffman
93fcf6e314
added web console docstring pointing to documentation, improved telnet console docstring
2009-06-23 16:11:23 -03:00
Pablo Hoffman
834ac9fca0
Some telnet console changes:
- added telnet console documentation
- added documentation for debugging memory leaks with guppy
- sorted out shell aliases
- set default port (TELNETCONSOLE_PORT) to 6023
2009-06-23 16:08:58 -03:00
Daniel Grana
32bae8040d
add basic mustbe_deferred tests
2009-06-23 14:59:03 -03:00
daniel
9b78f929de
remove obsolete deferred_imap util, use coiterate+imap instead
2009-06-23 14:45:16 -07:00
dgrana
e9c724e5b7
no need for two callbacks while processing scraping responses
2009-06-23 13:26:24 -07:00
dgrana
6809375815
restore call to next_request inside pipeline output processor
2009-06-23 13:19:35 -07:00
dgrana
21ed9a24a3
fix replacement of deferred_imap with coiterate+imap and fix broken engine test
2009-06-23 13:14:22 -07:00
dgrana
49b413ba40
merge
2009-06-23 13:00:01 -07:00
dgrana
38f184e42a
remove calls to chain_deferred and deferred_imap
2009-06-23 12:53:13 -07:00
Pablo Hoffman
a8fd107ad4
engine: some extra simplifications and removed debug mode
2009-06-22 23:01:41 -03:00
Pablo Hoffman
a1cca0da50
removed obsolete file
2009-06-22 22:55:18 -03:00
Pablo Hoffman
a8a44aa035
engine: domains are now polled and closed when they're idle, instead of being notified by the downloader
2009-06-22 21:28:35 -03:00
Pablo Hoffman
5271d1f185
renamed engine.resume() method to engine.unpause()
2009-06-22 20:01:29 -03:00
Pablo Hoffman
b3a624ed6b
engine: simplified next_request and removed 'domain in self.closing' check
2009-06-22 19:59:36 -03:00
Pablo Hoffman
ab0950b67e
added exception reporting to global_tests in engine.get_status()
2009-06-22 18:40:17 -03:00
Pablo Hoffman
b037ef6a27
added clear_pending_requests to scheduler
2009-06-22 18:37:39 -03:00
Pablo Hoffman
fb8e24acb5
more downloader cleanup and fixed bug which was preventing domains from being properly closed
2009-06-22 16:25:44 -03:00
Pablo Hoffman
e8543ca1f9
minor clean up to engine domain closing
2009-06-22 15:05:26 -03:00
Daniel Grana
b488e3b4a9
catch downloader process_queue exceptions
2009-06-22 14:24:04 -03:00
Daniel Grana
e6bf2821ef
remove request from transferring state prior to returning downloaded response
2009-06-22 14:06:15 -03:00
Pablo Hoffman
400a54bf7c
Added reasons when closing domains ('reason' argument to engine close_domain method), replaced old 'status' parameter with new 'reason' parameter in domain_closed signal. Updated and improved signals doc
2009-06-21 22:00:16 -03:00
Pablo Hoffman
ab2dd764e0
downloader: some improvements to instantiation of SiteInfo (ex. SiteDetails) objects
2009-06-21 16:27:48 -03:00
Pablo Hoffman
7bf610d0b7
additional simplifications to downloader (several methods removed) and added more info to engine getstatus() method
2009-06-21 16:06:36 -03:00
Pablo Hoffman
fda1fe0eb8
decreased enabled extension/middlewares/pipelines log messages level to DEBUG
2009-06-21 15:38:26 -03:00
Pablo Hoffman
89712a4adf
downloader: renamed SiteDetails.downloading to SiteDetails.transferring, for clarity
2009-06-21 14:23:51 -03:00
Pablo Hoffman
abe4f812e9
downloader: added site.closed additional check to domain already closed
2009-06-21 14:16:40 -03:00
Daniel Grana
5944f6590a
restore downloader enqueuing after middleware
2009-06-21 03:03:14 -03:00
Daniel Grana
cd1ad337b1
Downloader cleanup
* remove debug messages
* move deactivating of downloads to last callback
* simplify calling of download_any function
* raises RuntimeError when opening/closing twice
2009-06-21 02:54:31 -03:00
Daniel Grana
31d5d519fc
remove obsolete lambda_deferred function
2009-06-21 01:37:58 -03:00
Daniel Grana
bd59748de6
simplify chain_deferred implementation
--HG--
extra : rebase_source : e8a3639051c560f8fa2d75fc5469194723fe3ef9
2009-06-21 01:25:31 -03:00
Pablo Hoffman
61b1e67cd3
core: fixed engine getstatus() method for recent changes
2009-06-20 21:57:03 -03:00
Pablo Hoffman
adccd9a04e
Sorted out Duplicate Filter API.
--HG--
rename : scrapy/dupefilter/__init__.py => scrapy/contrib/dupefilter.py
2009-06-20 20:29:07 -03:00
Daniel Grana
47970e91bc
core: Invert request priority meaning, a higher request.priority value means more priority
2009-06-20 19:23:26 -03:00
Daniel Grana
18b6fecc47
Remove custom redirection priority of request returned by downloadermiddleware
2009-06-20 19:19:07 -03:00
Daniel Grana
00b49752ce
Multiple changes to core scheduling and duplicates filtering
* removed starters from engine
* moved schedulermiddleware to scheduler
* raise RuntimeError when trying to open/close a scheduler domain twice
* removed dupesfilter singleton and spidermw dupefilter middleware
--HG--
extra : rebase_source : e4c3ad4b970cbc8f532bc751ba9d8a944ca16be5
2009-06-20 18:15:02 -03:00
Pablo Hoffman
728ec7c5c9
minor adjustment to FifoDomainScheduler and improved documentation of domain scheduler API (remove_pending_domain method removes all occurrences)
2009-06-19 19:41:56 -03:00
Pablo Hoffman
161335fe78
Added domain schedulers (whose functionality was previously mixed with the
Scrapy Scheduler) and removed domain prioritizers whose functionality became
duplicated by the new domain schedulers.
2009-06-19 17:55:54 -03:00
Pablo Hoffman
3cb12e5dd3
Moved init_domain functionality out of the engine (refs #88) and into the
spider level. A new spider (InitSpider, not yet documented) was added to
provide initialization facilities.
Also renamed make_request_from_url to make_requests_from_url and allowed it to
return iterables.
2009-06-18 14:43:56 -03:00
Pablo Hoffman
b040e6a3a7
removed unused GenericSpider
2009-06-18 14:33:05 -03:00
Pablo Hoffman
e716fad03d
engine: log error when reactor.listenTCP fails instead of failing
2009-06-18 09:44:02 -03:00
Pablo Hoffman
d23dcbb390
engine: minor code simplification and fixed potential KeyError bug
2009-06-17 09:17:50 -03:00
daniel
9022f84c05
decompressionmw: use a temporary filename because None is failing in some cases
2009-06-17 05:01:04 -03:00
daniel
2c93bc40c6
decompressionmw: relative hardcoded filename raises OSError when process has no write access to CWD
2009-06-17 03:41:42 -03:00
daniel
d175ced4a1
Automated merge with ssh://hg.scrapy.org/scrapy
2009-06-17 02:45:46 -03:00
Pablo Hoffman
74f6bd3a4d
removed python2.5 from rpm-install.sh script
2009-06-16 13:14:40 -03:00
pablo
584a844e71
Fixed distutils packaging for all setup.py bdist formats
--HG--
rename : scrapy/bin/scrapy-admin.py => bin/scrapy-admin.py
2009-06-16 17:10:43 +01:00
Pablo Hoffman
ead5466998
changed format of --set SETTING:VALUE to --set SETTING=VALUE
2009-06-16 11:06:42 -03:00