Rolando Espinoza La fuente
dd477914db
spidermanager refactoring
* Implements find/create methods in the Spider Manager API; removed fromdomain and fromurl
The create method is now in charge of spider resolution: it must return a spider
object for its argument or raise KeyError if no spider is found.
It obsoletes the fromdomain and fromurl methods.
The default implementation of spider resolution only searches against spider.name;
it won't use spider.allowed_domains like the old fromdomain did. This is why
you must supply a spider if you want to crawl a URL.
Find methods return only the names of available spiders, not spider instances.
If no spider is found, they return an empty list.
Affected modules:
* command.models (force_domain)
  * removed spiders.force_domain
  * each command passes a spider to the crawl_* commands
* command.commands.*
  * crawl
    * set spider from opts.spider if arg is a URL
    * group URLs by spider to instantiate each spider just once
  * genspider
    * use spiders.create() to check spider id
  * parse
    * log error if more than one spider is found
* core.manager
  * on crawl_*, log a message if multiple spiders are found for a URL or request
* shell
  * prints "Multiple found" if more than one spider is found for a URL or request
  * populate_vars(): added spider keyword parameter
* contrib.spidermanager:
  * removed fromdomain() & fromurl()
  * new create(spider_id) -> Spider. Raises KeyError if spider not found
  * new find_by_request(request) -> list(spiders)
2010-04-01 17:16:38 -03:00
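The create/find contract described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the classes SketchSpider, SketchRequest and SketchSpiderManager are hypothetical stand-ins, and the rule used to match a request to a spider (hostname equal to, or a subdomain of, the spider name) is an assumption made only for the example.

```python
from urllib.parse import urlparse


class SketchSpider:
    """Hypothetical stand-in for a spider; only the name matters here."""

    def __init__(self, name):
        self.name = name


class SketchRequest:
    """Hypothetical stand-in for a request; only the URL matters here."""

    def __init__(self, url):
        self.url = url


class SketchSpiderManager:
    """Illustrates the contract: create() returns a spider or raises KeyError,
    find_by_request() returns a (possibly empty) list of spider names."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def create(self, spider_id):
        # A plain dict lookup raises KeyError when the id is unknown.
        return self._spiders[spider_id]

    def find_by_request(self, request):
        # Assumed matching rule, for illustration only.
        host = urlparse(request.url).hostname or ""
        return [
            name
            for name in self._spiders
            if host == name or host.endswith("." + name)
        ]


manager = SketchSpiderManager(
    [SketchSpider("example.com"), SketchSpider("example.org")]
)
```

A caller that gets back more than one name from find_by_request() can log a "multiple spiders found" message, as the affected commands are described as doing.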
Rolando Espinoza La fuente
8db67b17a3
scrapy manager refactor
* ExecutionManager
  * deprecated runonce(*args)
  * changed start() to start(keep_alive=Bool)
  * changed crawl(*args) to crawl(requests, spider=None)
    * if no spider given, tries to resolve spider for each request
  * added crawl_url(url, spider=None)
  * added crawl_request(request, spider=None)
  * added crawl_domain(domain)
  * added crawl_spider(spider)
* updated commands: crawl, runspider, start
* updated webconsole
* updated crawler
* updated tests.test_engine
* updated utils.fetch
2010-04-01 17:16:38 -03:00
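The "resolve a spider per request when none is given" behaviour of crawl(requests, spider=None) can be pictured with a small grouping helper. This is only a sketch under stated assumptions: group_urls_by_spider and the resolve callable are hypothetical names, and the real ExecutionManager may organise this differently.

```python
from collections import defaultdict


def group_urls_by_spider(urls, resolve, spider=None):
    """Group URLs by spider name so each spider is set up only once.

    resolve is a hypothetical callable mapping a URL to a spider name;
    it is consulted only when no explicit spider is supplied.
    """
    groups = defaultdict(list)
    for url in urls:
        name = spider if spider is not None else resolve(url)
        groups[name].append(url)
    return dict(groups)


def demo_resolve(url):
    # Toy resolver for the example: picks a name from the URL text.
    return "spider_a" if "a.example" in url else "spider_b"


urls = ["http://a.example/1", "http://b.example/1", "http://a.example/2"]
grouped = group_urls_by_spider(urls, demo_resolve)
forced = group_urls_by_spider(urls, demo_resolve, spider="spider_x")
```

Passing an explicit spider bypasses resolution entirely, which mirrors the optional spider argument on the crawl_* methods listed above.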
Pablo Hoffman
32f9c5fe68
removed old untested (and probably broken) code
2010-04-01 04:05:53 -03:00
Pablo Hoffman
4dc886e319
Improved comment
2010-03-31 18:26:35 -03:00
Pablo Hoffman
83d5eff0b7
More refactoring to encoding handling in TextResponse and subclasses
2010-03-31 18:21:41 -03:00
Pablo Hoffman
de896fa62d
Refactored implementation of Request.replace() and Response.replace()
2010-03-31 16:29:53 -03:00
Pablo Hoffman
2ed8a5bfb5
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-27 13:25:06 -03:00
Pablo Hoffman
2f75839e7a
Ignore noisy Twisted deprecation warnings
2010-03-27 13:23:13 -03:00
Pablo Hoffman
2299deda66
updated wrong link in doc
2010-03-26 14:02:33 -03:00
Pablo Hoffman
7cf2f87e27
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-26 08:29:34 -03:00
Pablo Hoffman
f19c939925
fixed doc typo
2010-03-26 08:28:32 -03:00
Daniel Grana
996a1b3574
fix handling of relative base urls in get_base_url util
--HG--
extra : rebase_source : eb552219e6bf40bc0d2e35968c367105233b6ecc
2010-03-25 15:50:34 -03:00
Pablo Hoffman
1330697c3d
Some improvements to Response encoding support:
* added encoding aliases, configurable through a new ENCODING_ALIASES setting
* Response.encoding now returns the real encoding detected for the body
* simplified TextResponse API by removing body_encoding() and
  headers_encoding() methods
* Response.encoding now tries to infer the encoding from the body always (it
  was done before only on HtmlResponse and TextResponse)
* removed scrapy.utils.encoding.add_encoding_alias() function
* updated implementation of scrapy.utils.response function to reflect these API
  changes
* updated documentation to reflect API changes
2010-03-25 15:47:10 -03:00
Daniel Grana
173e94386b
Support relative url used in base tag. closes #148
--HG--
extra : rebase_source : 1bff87c127a7e9d8d12c772b3068feb11eb5d97f
2010-03-25 12:38:37 -03:00
Pablo Hoffman
9ddcd1095d
sort setting alphabetically
2010-03-25 11:45:06 -03:00
Pablo Hoffman
cb49567ca6
Removed wrong line added in previous commit
2010-03-24 12:15:18 -03:00
Pablo Hoffman
45411926b5
Improved encoding support by explicitly passing encoding to all str_to_unicode() and unicode_to_str() calls
2010-03-24 12:14:07 -03:00
Pablo Hoffman
4fa833c849
Added LOG_ENCODING setting
2010-03-24 12:13:38 -03:00
Pablo Hoffman
87e68e7438
Made MailSender non IO-blocking, and improved MailSender documentation
2010-03-22 13:37:37 -03:00
Pablo Hoffman
1dfc79b5d0
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-20 20:48:11 -03:00
Pablo Hoffman
99a876754c
Improved "What else?" section of "Scrapy at a glance" overview
2010-03-20 20:24:18 -03:00
Pablo Hoffman
264cd2e035
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-19 10:32:42 -03:00
Pablo Hoffman
234fd709ad
fixed doc typo (thanks Victor)
2010-03-19 10:32:17 -03:00
Daniel Grana
184cf6684f
Remove HttpException references from docs. Since 0.7, Scrapy returns non-200 responses as Response objects and no longer raises HttpException
2010-03-18 10:05:33 -03:00
Pablo Hoffman
403a21ec74
removed obsolete scrapy.crawler module
2010-03-12 17:28:33 -02:00
Daniel Grana
17091902f3
Explicitly say where to save the item class in the "Defining our item" section of the tutorial
2010-03-12 14:12:49 -02:00
Pablo Hoffman
54ae2c36d0
better implementation of open_in_browser() tests
2010-03-12 10:19:50 -02:00
Pablo Hoffman
38a296aa2c
Added tests for the open_in_browser() function
2010-03-12 09:52:39 -02:00
Pablo Hoffman
2ab94d75e2
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-12 09:32:35 -02:00
Pablo Hoffman
c5cd8b9d3d
Fixed bug in open_in_browser() function with Python 2.5 (closes #145).
2010-03-12 09:31:05 -02:00
Pablo Hoffman
39e4df0cff
removed unmaintained (and untested) contrib_exp ShoveItemPipeline
2010-03-10 00:10:36 -02:00
Pablo Hoffman
a505a9d490
minor code refactoring on scrapy.command.cmdline module
2010-03-04 11:09:16 -02:00
Pablo Hoffman
4c1ec0c97e
replaced hacky command_executed dict by standard signal
2010-03-04 10:58:18 -02:00
Pablo Hoffman
861f9691c7
removed partly-obsolete module scrapy.contrib.groupsettings
2010-03-04 10:40:41 -02:00
Pablo Hoffman
d12cd22d5e
switched default scheduler order to DFO, which consumes less memory by default
2010-03-04 10:15:58 -02:00
Daniel Grana
700be3202b
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
2010-02-24 15:44:41 -02:00
Daniel Grana
2322322ee6
Add missing priority and errback arguments to Request.replace method signature
2010-02-24 15:43:09 -02:00
Pablo Hoffman
180c091fb2
Fixed encoding issue (reported in #135) when the encoding declared in the HTTP header is unknown. This is the patch proposed by Rolando, with an update to the Request/Response documentation.
2010-02-24 14:01:29 -02:00
Pablo Hoffman
bbef0fe870
Automated merge with http://hg.scrapy.org/users/rolando/scrapy/
2010-02-20 11:12:37 -02:00
Rolando Espinoza La fuente
7b1ad321e3
examples/experimental: added imdb top movies spider
2010-02-19 21:31:17 -04:00
Pablo Hoffman
cb99edd153
simplified and improved AUTHORS file
2010-02-19 23:16:55 -02:00
Pablo Hoffman
a3d22c7240
Automated merge with http://hg.scrapy.org/scrapy-0.8/
2010-02-19 23:11:24 -02:00
Pablo Hoffman
60961e5499
minor documentation fix (refs #135)
2010-02-19 23:09:48 -02:00
Pablo Hoffman
c1f8198639
Added RANDOMIZE_DOWNLOAD_DELAY setting
2010-02-19 21:53:18 -02:00
Rolando Espinoza La fuente
4a053a762f
examples/experimental: added googledir crawler
2010-02-19 18:28:16 -04:00
Rolando Espinoza La fuente
a6a3f085a7
docs: added crawlspider v2 outline documentation
Sign-Off: Rolando Espinoza La fuente
2010-02-19 18:22:38 -04:00
Rolando Espinoza La fuente
17d1543929
contrib_exp: added crawlspider v2 package + tests
Sign-Off: Rolando Espinoza La fuente
2010-02-19 18:19:01 -04:00
Rolando Espinoza La fuente
7ddd4441e3
utils.python: added equal_attributes() to compare arbitrary attributes of two objects
Sign-Off: Rolando Espinoza La fuente
2010-02-19 17:57:48 -04:00
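As a rough idea of what comparing arbitrary attributes of two objects can look like, here is a minimal sketch. It is not the code added by the commit: the name equal_attributes_sketch, the choice to return False for an empty attribute list, and the Point class are all assumptions made for illustration.

```python
def equal_attributes_sketch(obj1, obj2, attributes):
    """Return True if both objects agree on every named attribute.

    Assumption for this sketch: an empty attribute list compares as False,
    and a missing attribute raises AttributeError.
    """
    if not attributes:
        return False
    for name in attributes:
        if getattr(obj1, name) != getattr(obj2, name):
            return False
    return True


class Point:
    """Hypothetical example class with two attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


p1 = Point(1, 2)
p2 = Point(1, 3)
```

With these two points, comparing only "x" reports equality, while comparing both "x" and "y" does not.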
Rolando Espinoza La fuente
7235040936
merged upstream
2010-02-19 17:41:45 -04:00
Pablo Hoffman
23fcf48a89
Automated merge with http://hg.scrapy.org/scrapy-0.8/
2010-02-19 16:34:01 -02:00