mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 14:03:41 +00:00

1218 Commits

Author SHA1 Message Date
Daniel Grana
32bae8040d add basic mustbe_deferred tests 2009-06-23 14:59:03 -03:00
daniel
9b78f929de remove obsolete deferred_imap util, use coiterate+imap instead 2009-06-23 14:45:16 -07:00
dgrana
e9c724e5b7 no need for two callbacks while processing scraping responses 2009-06-23 13:26:24 -07:00
dgrana
6809375815 restore call to next_request inside pipeline output processor 2009-06-23 13:19:35 -07:00
dgrana
21ed9a24a3 fix replace of deferred_imap by coiterate+imap and fix broken engine test 2009-06-23 13:14:22 -07:00
dgrana
49b413ba40 merge 2009-06-23 13:00:01 -07:00
dgrana
38f184e42a remove calls to chain_deferred and deferred_imap 2009-06-23 12:53:13 -07:00
Pablo Hoffman
a8fd107ad4 engine: some extra simplifications and removed debug mode 2009-06-22 23:01:41 -03:00
Pablo Hoffman
a1cca0da50 removed obsolete file 2009-06-22 22:55:18 -03:00
Pablo Hoffman
a8a44aa035 engine: domains are now polled and closed when they're idle, instead of being notified by the downloader 2009-06-22 21:28:35 -03:00
Pablo Hoffman
5271d1f185 renamed engine.resume() method to engine.unpause() 2009-06-22 20:01:29 -03:00
Pablo Hoffman
b3a624ed6b engine: simplified next_request and removed 'domain in self.closing' check 2009-06-22 19:59:36 -03:00
Pablo Hoffman
ab0950b67e added exception reporting to global_tests in engine.get_status() 2009-06-22 18:40:17 -03:00
Pablo Hoffman
b037ef6a27 added clear_pending_requests to scheduler 2009-06-22 18:37:39 -03:00
Pablo Hoffman
fb8e24acb5 more downloader cleanup and fixed a bug that was preventing domains from being properly closed 2009-06-22 16:25:44 -03:00
Pablo Hoffman
e8543ca1f9 minor clean up to engine domain closing 2009-06-22 15:05:26 -03:00
Daniel Grana
b488e3b4a9 catch downloader process_queue exceptions 2009-06-22 14:24:04 -03:00
Daniel Grana
e6bf2821ef remove request from transferring state prior to returning downloaded response 2009-06-22 14:06:15 -03:00
Pablo Hoffman
400a54bf7c Added reasons when closing domains ('reason' argument to engine close_domain method), replaced old 'status' parameter with new 'reason' parameter in domain_closed signal. Updated and improved signals doc 2009-06-21 22:00:16 -03:00
Pablo Hoffman
ab2dd764e0 downloader: some improvements to instantiation of SiteInfo (ex. SiteDetails) objects 2009-06-21 16:27:48 -03:00
Pablo Hoffman
7bf610d0b7 additional simplifications to downloader (several methods removed) and added more info to engine getstatus() method 2009-06-21 16:06:36 -03:00
Pablo Hoffman
fda1fe0eb8 lowered the log level of "enabled extensions/middlewares/pipelines" messages to DEBUG 2009-06-21 15:38:26 -03:00
Pablo Hoffman
89712a4adf downloader: renamed SiteDetails.downloading to SiteDetails.transferring, for clarity 2009-06-21 14:23:51 -03:00
Pablo Hoffman
abe4f812e9 downloader: added an additional site.closed check for domains that are already closed 2009-06-21 14:16:40 -03:00
Daniel Grana
5944f6590a restore downloader enqueuing after middleware 2009-06-21 03:03:14 -03:00
Daniel Grana
cd1ad337b1 Downloader cleanup
* remove debug messages
* move deactivating of downloads to last callback
* simplify calling of download_any function
* raise RuntimeError when opening/closing twice
2009-06-21 02:54:31 -03:00
Daniel Grana
31d5d519fc remove obsolete lambda_deferred function 2009-06-21 01:37:58 -03:00
Daniel Grana
bd59748de6 simplify chain_deferred implementation
--HG--
extra : rebase_source : e8a3639051c560f8fa2d75fc5469194723fe3ef9
2009-06-21 01:25:31 -03:00
Pablo Hoffman
61b1e67cd3 core: fixed engine getstatus() method for recent changes 2009-06-20 21:57:03 -03:00
Pablo Hoffman
adccd9a04e Sorted out Duplicate Filter API.
--HG--
rename : scrapy/dupefilter/__init__.py => scrapy/contrib/dupefilter.py
2009-06-20 20:29:07 -03:00
Daniel Grana
47970e91bc core: Invert request priority meaning: a higher request.priority value now means higher priority 2009-06-20 19:23:26 -03:00
Daniel Grana
18b6fecc47 Remove custom redirection priority of request returned by downloadermiddleware 2009-06-20 19:19:07 -03:00
Daniel Grana
00b49752ce Multiple changes to core scheduling and duplicates filtering
* removed starters from engine
* moved schedulermiddleware to scheduler
* raise RuntimeError when trying to open/close a scheduler domain twice
* removed dupesfilter singleton and spidermw dupefilter middleware

--HG--
extra : rebase_source : e4c3ad4b970cbc8f532bc751ba9d8a944ca16be5
2009-06-20 18:15:02 -03:00
Pablo Hoffman
728ec7c5c9 minor adjustment to FifoDomainScheduler and improved documentation of domain scheduler API (remove_pending_domain method removes all occurrences) 2009-06-19 19:41:56 -03:00
Pablo Hoffman
161335fe78 Added domain schedulers (whose functionality was previously mixed with the
Scrapy Scheduler) and removed domain prioritizers whose functionality became
duplicated by the new domain schedulers.
2009-06-19 17:55:54 -03:00
Pablo Hoffman
3cb12e5dd3 Moved init_domain functionality out of the engine (refs #88) and into the
spider level. A new spider (InitSpider, not yet documented) was added to
provide initialization facilities.

Also renamed make_request_from_url to make_requests_from_url and allowed it to
return iterables.
2009-06-18 14:43:56 -03:00
Pablo Hoffman
b040e6a3a7 removed unused GenericSpider 2009-06-18 14:33:05 -03:00
Pablo Hoffman
e716fad03d engine: log error when reactor.listenTCP fails instead of failing 2009-06-18 09:44:02 -03:00
Pablo Hoffman
d23dcbb390 engine: minor code simplification and fixed potential KeyError bug 2009-06-17 09:17:50 -03:00
daniel
9022f84c05 decompressionmw: use a temporary filename because None fails in some cases 2009-06-17 05:01:04 -03:00
daniel
2c93bc40c6 decompressionmw: relative hardcoded filename raises OSError when the process does not have write access to the CWD 2009-06-17 03:41:42 -03:00
daniel
d175ced4a1 Automated merge with ssh://hg.scrapy.org/scrapy 2009-06-17 02:45:46 -03:00
Pablo Hoffman
74f6bd3a4d removed python2.5 from rpm-install.sh script 2009-06-16 13:14:40 -03:00
pablo
584a844e71 Fixed distutils packaging for all setup.py bdist formats
--HG--
rename : scrapy/bin/scrapy-admin.py => bin/scrapy-admin.py
2009-06-16 17:10:43 +01:00
Pablo Hoffman
ead5466998 changed format of --set SETTING:VALUE to --set SETTING=VALUE 2009-06-16 11:06:42 -03:00
Pablo Hoffman
e5b99a56c4 Several core changes:
Execution Manager:

* added control_reactor argument to delegate external twisted
  reactor control (for example by twistd)
* now it loads spiders (if not already loaded)
* now it starts the log (if not already started)
* removed *args from configure() method
* removed **opts from runonce and start methods

Execution engine:

* added control_reactor argument to delegate external twisted
  reactor control (for example by twistd)
* changed some functions and method names for clarity
* improved handling of exceptions in st() method
* regrouped close_domain, closed_domain, and _close_domain methods
  for legibility

Scheduler:

* replaced pending_domains_count (dict) by pending_domains (set)
* simplified some doc
2009-06-15 19:44:26 -03:00
Pablo Hoffman
3c919f2562 Several core changes:
Execution Manager:

* added control_reactor argument to delegate external twisted
  reactor control (for example by twistd)
* now it loads spiders (if not already loaded)
* now it starts the log (if not already started)
* removed *args from configure() method
* removed **opts from runonce and start methods

Execution engine:

* added control_reactor argument to delegate external twisted
  reactor control (for example by twistd)
* changed some functions and method names for clarity
* improved handling of exceptions in st() method
* regrouped close_domain, closed_domain, and _close_domain methods
  for legibility

Scheduler:

* replaced pending_domains_count (dict) by pending_domains (set)
* simplified some doc
2009-06-15 19:40:56 -03:00
Pablo Hoffman
5e3ef5a2fd item pipeline: added check for domain not already closed 2009-06-15 18:59:40 -03:00
Pablo Hoffman
aeb9734a80 downloader: made log message visible only when debug_mode is on 2009-06-15 18:58:37 -03:00
Pablo Hoffman
ff76f46d5a removed noisy comment and moved import to the top 2009-06-15 18:55:09 -03:00