mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 13:24:21 +00:00

1600 Commits

Author SHA1 Message Date
Pablo Hoffman
400a54bf7c Added reasons when closing domains ('reason' argument to engine close_domain method), replaced old 'status' parameter by new 'reason' parameter in domain_closed signal. updated and improved signals doc 2009-06-21 22:00:16 -03:00
Pablo Hoffman
ab2dd764e0 downloader: some improvements to instantiation of SiteInfo (ex. SiteDetails) objects 2009-06-21 16:27:48 -03:00
Pablo Hoffman
7bf610d0b7 additional simplifications to downloader (several methods removed) and added more info to engine getstatus() method 2009-06-21 16:06:36 -03:00
Pablo Hoffman
fda1fe0eb8 decreased enabled extension/middlewares/pipelines log messages level to DEBUG 2009-06-21 15:38:26 -03:00
Pablo Hoffman
89712a4adf downloader: renamed SiteDetails.downloading to SiteDetails.transferring, for clarity 2009-06-21 14:23:51 -03:00
Pablo Hoffman
abe4f812e9 downloader: added site.closed additional check to domain already closed 2009-06-21 14:16:40 -03:00
Daniel Grana
5944f6590a restore downloader enqueuing after middleware 2009-06-21 03:03:14 -03:00
Daniel Grana
cd1ad337b1 Downloader cleanup
* remove debug messages
* move deactivating of downloads to last callback
* simplify calling of download_any function
* raises RuntimeError while opening/closing twice
2009-06-21 02:54:31 -03:00
Daniel Grana
31d5d519fc remove obsolete lambda_deferred function 2009-06-21 01:37:58 -03:00
Daniel Grana
bd59748de6 simplify chain_deferred implementation
--HG--
extra : rebase_source : e8a3639051c560f8fa2d75fc5469194723fe3ef9
2009-06-21 01:25:31 -03:00
Pablo Hoffman
61b1e67cd3 core: fixed engine getstatus() method for recent changes 2009-06-20 21:57:03 -03:00
Pablo Hoffman
adccd9a04e Sorted out Duplicate Filter API.
--HG--
rename : scrapy/dupefilter/__init__.py => scrapy/contrib/dupefilter.py
2009-06-20 20:29:07 -03:00
Daniel Grana
47970e91bc core: Invert request priority meaning, a higher request.priority value means more priority 2009-06-20 19:23:26 -03:00
Daniel Grana
18b6fecc47 Remove custom redirection priority of request returned by downloadermiddleware 2009-06-20 19:19:07 -03:00
Daniel Grana
00b49752ce Multiple changes to core scheduling and duplicates filtering
* removed starters from engine
* moved schedulermiddleware to scheduler
* raise RuntimeError when trying to open/close a scheduler domain twice
* removed dupesfilter singleton and spidermw dupefilter middleware

--HG--
extra : rebase_source : e4c3ad4b970cbc8f532bc751ba9d8a944ca16be5
2009-06-20 18:15:02 -03:00
Pablo Hoffman
728ec7c5c9 minor adjustment to FifoDomainScheduler and improved documentation of domain scheduler API (remove_pending_domain method removes all occurrences) 2009-06-19 19:41:56 -03:00
Pablo Hoffman
161335fe78 Added domain schedulers (whose functionality was previously mixed with the
Scrapy Scheduler) and removed domain prioritizers whose functionality became
duplicated by the new domain schedulers.
2009-06-19 17:55:54 -03:00
Pablo Hoffman
3cb12e5dd3 Moved init_domain functionality out of the engine (refs #88) and into the
spider level. A new spider (InitSpider, not yet documented) was added to
provide initialization facilities.

Also renamed make_request_from_url to make_requests_from_url and allowed it to
return iterables.
2009-06-18 14:43:56 -03:00
Pablo Hoffman
b040e6a3a7 removed unused GenericSpider 2009-06-18 14:33:05 -03:00
Pablo Hoffman
e716fad03d engine: log error when reactor.listenTCP fails instead of failing 2009-06-18 09:44:02 -03:00
Pablo Hoffman
d23dcbb390 engine: minor code simplification and fixed potential KeyError bug 2009-06-17 09:17:50 -03:00
daniel
9022f84c05 decompressionmw: use a temporary filename because None is failing in some cases 2009-06-17 05:01:04 -03:00
daniel
2c93bc40c6 decompressionmw: relative hardcoded filename raises OSError when process has no write access to CWD 2009-06-17 03:41:42 -03:00
daniel
d175ced4a1 Automated merge with ssh://hg.scrapy.org/scrapy 2009-06-17 02:45:46 -03:00
Pablo Hoffman
74f6bd3a4d removed python2.5 from rpm-install.sh script 2009-06-16 13:14:40 -03:00
pablo
584a844e71 Fixed distutils packaging for all setup.py bdist formats
--HG--
rename : scrapy/bin/scrapy-admin.py => bin/scrapy-admin.py
2009-06-16 17:10:43 +01:00
Pablo Hoffman
ead5466998 changed format of --set SETTING:VALUE to --set SETTING=VALUE 2009-06-16 11:06:42 -03:00
Pablo Hoffman
e5b99a56c4 Several core changes:
Execution Manager:

* added control_reactor argument to delegate external twisted
  reactor control (for example by twistd)
* now it loads spiders (if not already loaded)
* now it starts the log (if not already started)
* removed *args from configure() method
* removed **opts from runonce and start methods

Execution engine:

* added control_reactor argument to delegate external twisted
  reactor control (for example by twistd)
* changed some functions and method names for clarity
* improve handling of exceptions in st() method
* regrouped close_domain, closed_domain, and _close_domain methods
  for legibility

Scheduler:

* replaced pending_domains_count (dict) by pending_domains (set)
* simplified some doc
2009-06-15 19:44:26 -03:00
Pablo Hoffman
3c919f2562 Several core changes:
Execution Manager:

* added control_reactor argument to delegate external twisted
  reactor control (for example by twistd)
* now it loads spiders (if not already loaded)
* now it starts the log (if not already started)
* removed *args from configure() method
* removed **opts from runonce and start methods

Execution engine:

* added control_reactor argument to delegate external twisted
  reactor control (for example by twistd)
* changed some functions and method names for clarity
* improve handling of exceptions in st() method
* regrouped close_domain, closed_domain, and _close_domain methods
  for legibility

Scheduler:

* replaced pending_domains_count (dict) by pending_domains (set)
* simplified some doc
2009-06-15 19:40:56 -03:00
Pablo Hoffman
5e3ef5a2fd item pipeline: added check for domain not already closed 2009-06-15 18:59:40 -03:00
Pablo Hoffman
aeb9734a80 downloader: made log message visible only when debug_mode is on 2009-06-15 18:58:37 -03:00
Pablo Hoffman
ff76f46d5a removed noisy comment and moved import to the top 2009-06-15 18:55:09 -03:00
Pablo Hoffman
1d8cec63d1 scrapy.log: check if twisted log started before 2009-06-15 18:50:47 -03:00
daniel
a8d430b4dd httpcache: add domain to logging message 2009-06-15 12:35:42 -03:00
Pablo Hoffman
fd0e490157 added StatsMailer extension 2009-06-12 15:38:21 -03:00
Pablo Hoffman
7c2476bb25 fixed a couple of bugs caused by adding priority to Requests (thanks Artem for reporting) 2009-06-12 08:31:30 -03:00
Pablo Hoffman
4a1a01354b Added 'priority' attribute to Requests and removed old 'priority' argument passed through engine, scheduler and scheduler middleware calls 2009-06-11 22:25:47 -03:00
Pablo Hoffman
962dbeba88 fixed typo in docstring 2009-06-11 08:33:01 -03:00
Pablo Hoffman
e55158ebdd Merged olveyra's patch 2009-06-10 18:00:32 -03:00
Pablo Hoffman
635ac1ca64 Simplified domain prioritizers, so that they don't receive domains in the
constructor (domain prioritizers will be refactored later anyway) and
simplified Scrapy Manager code thanks to this.

Added make_request_from_url method to BaseSpider, splitting functionality to
create requests from URLs which was previously done all in start_requests.
2009-06-10 14:21:36 -03:00
Pablo Hoffman
a74b0b1764 additional simplification of OffsiteMiddleware 2009-06-09 13:09:35 -03:00
Pablo Hoffman
eca05c9e12 OffsiteMiddleware: removed logging and simplified implementation 2009-06-09 12:37:15 -03:00
molveyra
6524def4b8 don't check guid in RobustScrapedItem.validate. Instead, raise
NotImplemented.
2009-06-04 10:44:40 -03:00
Daniel Grana
87fbc9c58c spidermw: add domain name to warning about missing callbacks in requests 2009-05-28 21:47:41 -03:00
Daniel Grana
727e67af5e spidermw: ignore and warn about requests without callback returned by spiders 2009-05-28 21:41:02 -03:00
Daniel Grana
cfafa01109 spidermw: check for __iter__ instead of calling iter(), which could let a string pass as an iterable 2009-05-28 21:10:30 -03:00
Pablo Hoffman
0f690b03dc added deprecation warning to ErrorPages downloader middleware 2009-05-28 13:57:25 -03:00
Pablo Hoffman
1aac694343 updated settings doc 2009-05-28 13:52:56 -03:00
Pablo Hoffman
04e7f8f5f6 merged with Daniel's HttpException-removal branch 2009-05-28 13:45:26 -03:00
Daniel Grana
abda5edf09 decompressionmw: don't try to decompress empty responses 2009-05-28 09:31:43 -03:00