mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 07:43:43 +00:00

1232 Commits

Author SHA1 Message Date
Pablo Hoffman
80cd534f92 removed redundant botname from log lines 2009-06-25 16:48:04 -03:00
Pablo Hoffman
18301b7e66 downloader: performance improvement for sites that use download delay (replace datetime by time) 2009-06-25 14:13:45 -03:00
Pablo Hoffman
7933e00ebd set more proper request priority for robots middleware and media pipeline 2009-06-25 12:10:55 -03:00
Pablo Hoffman
c22d2b1587 engine: added domain_is_open() method, added docstring for domain_is_closed() method 2009-06-25 09:56:38 -03:00
Pablo Hoffman
8de09fe4dd improved documentation of Downloader._download() method and fixed bug with process_queue() being called too early 2009-06-24 17:08:16 -03:00
Daniel Grana
830cd4f19f Restore download process queue processing after finish with recent transferred response 2009-06-24 13:45:50 -03:00
Pablo Hoffman
87df33ce0a s/_next_request_called/_next_request_pending/ 2009-06-24 10:36:36 -03:00
Pablo Hoffman
51029e37a3 engine: removed obsolete docstring and simplified next_request method 2009-06-24 10:34:44 -03:00
Daniel Grana
d7d18d27df avoid rescheduling next_request calls 2009-06-24 10:28:34 -03:00
Pablo Hoffman
2b65f20c26 engine: removed redundant line and unused import 2009-06-23 21:50:46 -03:00
Daniel Grana
7578ab00a2 Automated merge with ssh://hg.scrapy.org/scrapy 2009-06-23 16:47:58 -03:00
Daniel Grana
73b60788c1 log framework errors at the end of crawling 2009-06-23 16:47:32 -03:00
Pablo Hoffman
93fcf6e314 added web console docstring pointing to documentation, improved telnet console docstring 2009-06-23 16:11:23 -03:00
Pablo Hoffman
834ac9fca0 Some telnet console changes:
- added telnet console documentation
- added documentation for debugging memory leaks with guppy
- sorted out shell aliases
- set default port (TELNETCONSOLE_PORT) to 6023
2009-06-23 16:08:58 -03:00
Daniel Grana
32bae8040d add basic mustbe_deferred tests 2009-06-23 14:59:03 -03:00
daniel
9b78f929de remove obsolete deferred_imap util, use coiterate+imap instead 2009-06-23 14:45:16 -07:00
dgrana
e9c724e5b7 no need for two callbacks while processing scraping responses 2009-06-23 13:26:24 -07:00
dgrana
6809375815 restore call to next_request inside pipeline output processor 2009-06-23 13:19:35 -07:00
dgrana
21ed9a24a3 fix replace of deferred_imap by coiterate+imap and fix broken engine test 2009-06-23 13:14:22 -07:00
dgrana
49b413ba40 merge 2009-06-23 13:00:01 -07:00
dgrana
38f184e42a remove calls to chain_deferred and deferred_imap 2009-06-23 12:53:13 -07:00
Pablo Hoffman
a8fd107ad4 engine: some extra simplifications and removed debug mode 2009-06-22 23:01:41 -03:00
Pablo Hoffman
a1cca0da50 removed obsolete file 2009-06-22 22:55:18 -03:00
Pablo Hoffman
a8a44aa035 engine: domains are now polled and closed when they're idle, instead of being notified by the downloader 2009-06-22 21:28:35 -03:00
Pablo Hoffman
5271d1f185 renamed engine.resume() method to engine.unpause() 2009-06-22 20:01:29 -03:00
Pablo Hoffman
b3a624ed6b engine: simplified next_request and removed 'domain in self.closing' check 2009-06-22 19:59:36 -03:00
Pablo Hoffman
ab0950b67e added exception reporting to global_tests in engine.get_status() 2009-06-22 18:40:17 -03:00
Pablo Hoffman
b037ef6a27 added clear_pending_requests to scheduler 2009-06-22 18:37:39 -03:00
Pablo Hoffman
fb8e24acb5 more downloader cleanup and fixed bug which was preventing domains from being properly closed 2009-06-22 16:25:44 -03:00
Pablo Hoffman
e8543ca1f9 minor clean up to engine domain closing 2009-06-22 15:05:26 -03:00
Daniel Grana
b488e3b4a9 catch downloader process_queue exceptions 2009-06-22 14:24:04 -03:00
Daniel Grana
e6bf2821ef remove request from transferring state prior to returning downloaded response 2009-06-22 14:06:15 -03:00
Pablo Hoffman
400a54bf7c Added reasons when closing domains ('reason' argument to engine close_domain method), replaced old 'status' parameter by new 'reason' parameter in domain_closed signal. updated and improved signals doc 2009-06-21 22:00:16 -03:00
Pablo Hoffman
ab2dd764e0 downloader: some improvements to instantiation of SiteInfo (ex. SiteDetails) objects 2009-06-21 16:27:48 -03:00
Pablo Hoffman
7bf610d0b7 additional simplifications to downloader (several methods removed) and added more info to engine getstatus() method 2009-06-21 16:06:36 -03:00
Pablo Hoffman
fda1fe0eb8 decreased enabled extension/middlewares/pipelines log messages level to DEBUG 2009-06-21 15:38:26 -03:00
Pablo Hoffman
89712a4adf downloader: renamed SiteDetails.downloading to SiteDetails.transferring, for clarity 2009-06-21 14:23:51 -03:00
Pablo Hoffman
abe4f812e9 downloader: added site.closed additional check to domain already closed 2009-06-21 14:16:40 -03:00
Daniel Grana
5944f6590a restore downloader enqueuing after middleware 2009-06-21 03:03:14 -03:00
Daniel Grana
cd1ad337b1 Downloader cleanup
* remove debug messages
* move deactivating of downloads to last callback
* simplify calling of download_any function
* raises RuntimeError while opening/closing twice
2009-06-21 02:54:31 -03:00
Daniel Grana
31d5d519fc remove obsolete lambda_deferred function 2009-06-21 01:37:58 -03:00
Daniel Grana
bd59748de6 simplify chain_deferred implementation
--HG--
extra : rebase_source : e8a3639051c560f8fa2d75fc5469194723fe3ef9
2009-06-21 01:25:31 -03:00
Pablo Hoffman
61b1e67cd3 core: fixed engine getstatus() method for recent changes 2009-06-20 21:57:03 -03:00
Pablo Hoffman
adccd9a04e Sorted out Duplicate Filter API.
--HG--
rename : scrapy/dupefilter/__init__.py => scrapy/contrib/dupefilter.py
2009-06-20 20:29:07 -03:00
Daniel Grana
47970e91bc core: Invert request priority meaning, a higher request.priority value means more priority 2009-06-20 19:23:26 -03:00
Daniel Grana
18b6fecc47 Remove custom redirection priority of request returned by downloadermiddleware 2009-06-20 19:19:07 -03:00
Daniel Grana
00b49752ce Multiple changes to core scheduling and duplicates filtering
* removed starters from engine
* moved schedulermiddleware to scheduler
* raise RuntimeError when trying to open/close a scheduler domain twice
* removed dupesfilter singleton and spidermw dupefilter middleware

--HG--
extra : rebase_source : e4c3ad4b970cbc8f532bc751ba9d8a944ca16be5
2009-06-20 18:15:02 -03:00
Pablo Hoffman
728ec7c5c9 minor adjustment to FifoDomainScheduler and improved documentation of domain scheduler API (remove_pending_domain method removes all occurrences) 2009-06-19 19:41:56 -03:00
Pablo Hoffman
161335fe78 Added domain schedulers (whose functionality was previously mixed with the
Scrapy Scheduler) and removed domain prioritizers whose functionality became
duplicated by the new domain schedulers.
2009-06-19 17:55:54 -03:00
Pablo Hoffman
3cb12e5dd3 Moved init_domain functionality out of the engine (refs #88) and into the
spider level. A new spider (InitSpider, not yet documented) was added to
provide initialization facilities.

Also renamed make_request_from_url to make_requests_from_url and allowed it to
return iterables.
2009-06-18 14:43:56 -03:00
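
Note on commit 47970e91bc (2009-06-20), which states that a higher request.priority value means more priority: the ordering it describes can be sketched as below. This is only an illustration under stated assumptions, not the scheduler code from that commit. The class name PriorityScheduler, its enqueue() and next_request() methods, and the sample request labels are all hypothetical and chosen for this example.

```python
import heapq
import itertools


class PriorityScheduler:
    """Hypothetical queue (not Scrapy's code): higher priority is dequeued first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that requests with
        # equal priority come out in the order they were enqueued.
        self._counter = itertools.count()

    def enqueue(self, request, priority=0):
        # heapq implements a min-heap, so the priority is negated to make
        # the largest priority value sort first.
        heapq.heappush(self._heap, (-priority, next(self._counter), request))

    def next_request(self):
        # Return the pending request with the highest priority, or None
        # when nothing is pending.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


sched = PriorityScheduler()
sched.enqueue("page", priority=0)
sched.enqueue("robots.txt", priority=10)
sched.enqueue("image", priority=-5)
order = [sched.next_request() for _ in range(3)]
```

Under the inverted meaning, the request enqueued with priority 10 is returned before the one with priority 0, and the one with priority -5 comes last.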