= SEP-015: !ScrapyManager and !SpiderManager API refactoring =
[[PageOutline(2-5,Contents)]]
||'''SEP:'''||15||
||'''Title:'''||!ScrapyManager and !SpiderManager API refactoring||
||'''Author:'''||Insophia Team||
||'''Created:'''||2010-03-10||
||'''Status:'''||Final||
== Introduction ==
This SEP proposes a refactoring of the !ScrapyManager and !SpiderManager APIs.
== !SpiderManager ==
* get(spider_name) -> Spider instance
* find_by_request(request) -> list of spider names
* list() -> list of spider names
* remove fromdomain(), fromurl()
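The three methods above can be sketched as follows. This is only an illustration of the proposed interface, not the actual implementation: the Spider and Request classes are minimal hypothetical stand-ins, and matching a request to spiders by comparing its host against each spider's `allowed_domains` is an assumption, since this SEP does not specify how find_by_request() decides.

```python
from urllib.parse import urlparse


class Spider:
    """Hypothetical stand-in for a spider: a name plus the domains it handles."""

    def __init__(self, name, allowed_domains):
        self.name = name
        self.allowed_domains = list(allowed_domains)


class Request:
    """Hypothetical stand-in for a request; only the URL matters here."""

    def __init__(self, url):
        self.url = url


class SpiderManager:
    def __init__(self, spiders):
        # Index spiders by name so get() is a plain dictionary lookup.
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        """Return the Spider instance registered under spider_name."""
        return self._spiders[spider_name]  # raises KeyError if unknown

    def find_by_request(self, request):
        """Return the names of all spiders that could handle the request."""
        # Assumption: a spider handles a request when the request host equals
        # one of its allowed domains or is a subdomain of one.
        host = urlparse(request.url).hostname or ""
        return [
            name
            for name, spider in self._spiders.items()
            if any(host == d or host.endswith("." + d) for d in spider.allowed_domains)
        ]

    def list(self):
        """Return the names of all registered spiders, sorted for stable output."""
        return sorted(self._spiders)
```

Note that find_by_request() returns names rather than instances, so callers still go through get() to obtain the spider.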
== !ScrapyManager ==
* crawl_request(request, spider=None)
  * calls !SpiderManager.find_by_request(request) if spider is None
  * fails if len(spiders returned) != 1
* crawl_spider(spider)
  * calls spider.start_requests()
* crawl_spider_name(spider_name)
  * calls !SpiderManager.get(spider_name)
  * calls spider.start_requests()
* crawl_url(url)
  * calls spider.make_requests_from_url()
* remove crawl(), runonce()
Instead of calling runonce(), commands (such as crawl and parse) would call one of the crawl_* methods and then start().
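A minimal sketch of how the crawl_* methods in this section could fit together is shown below. Several parts are assumptions made for illustration: Request, Spider and !StubSpiderManager are hypothetical stand-ins, the `scheduled` queue and the body of start() are placeholders for the real engine, and resolving the spider for crawl_url() through find_by_request() is a guess, because the SEP only states that crawl_url() calls spider.make_requests_from_url().

```python
class Request:
    """Hypothetical stand-in for a request; only the URL matters here."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical stand-in for a spider with a name and start URLs."""

    def __init__(self, name, start_urls=()):
        self.name = name
        self.start_urls = list(start_urls)

    def make_requests_from_url(self, url):
        return Request(url)

    def start_requests(self):
        return [self.make_requests_from_url(url) for url in self.start_urls]


class StubSpiderManager:
    """Hypothetical registry; matching by URL prefix is an assumption."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        return self._spiders[spider_name]

    def find_by_request(self, request):
        return [
            name
            for name, spider in self._spiders.items()
            if any(request.url.startswith(url) for url in spider.start_urls)
        ]


class ScrapyManager:
    def __init__(self, spider_manager):
        self.spider_manager = spider_manager
        self.scheduled = []  # (spider, request) pairs waiting for start()

    def _resolve_spider(self, request):
        # Fail unless exactly one spider claims the request.
        names = self.spider_manager.find_by_request(request)
        if len(names) != 1:
            raise ValueError(
                f"expected exactly one spider for {request.url}, found {len(names)}"
            )
        return self.spider_manager.get(names[0])

    def crawl_request(self, request, spider=None):
        if spider is None:
            spider = self._resolve_spider(request)
        self.scheduled.append((spider, request))

    def crawl_spider(self, spider):
        for request in spider.start_requests():
            self.crawl_request(request, spider)

    def crawl_spider_name(self, spider_name):
        self.crawl_spider(self.spider_manager.get(spider_name))

    def crawl_url(self, url):
        # Assumption: the spider is resolved from the URL the same way
        # crawl_request() resolves it; the SEP does not say how.
        spider = self._resolve_spider(Request(url))
        self.crawl_request(spider.make_requests_from_url(url), spider)

    def start(self):
        # Placeholder for the real engine: drain the queue and report it.
        processed = [(spider.name, request.url) for spider, request in self.scheduled]
        self.scheduled = []
        return processed
```

In this sketch every crawl_* method only schedules work, and nothing runs until start() is called, which mirrors the "call crawl_* and then start()" flow described above.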
== Changes to Commands ==
* if is_url(arg):
  * calls !ScrapyManager.crawl_url(arg)
* else:
  * calls !ScrapyManager.crawl_spider_name(arg)
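The dispatch rule above could look roughly like the following in a command. This is a sketch under stated assumptions: the SEP does not define is_url(), so the scheme check used here is a guess, `run_crawl_command` is a hypothetical name, and !RecordingManager is a stand-in that only records which !ScrapyManager methods were called.

```python
from urllib.parse import urlparse


def is_url(arg):
    """Assumption: an argument is a URL when it has an http, https or file scheme."""
    return urlparse(arg).scheme in ("http", "https", "file")


class RecordingManager:
    """Hypothetical stand-in for ScrapyManager that records the calls it receives."""

    def __init__(self):
        self.calls = []

    def crawl_url(self, url):
        self.calls.append(("crawl_url", url))

    def crawl_spider_name(self, spider_name):
        self.calls.append(("crawl_spider_name", spider_name))

    def start(self):
        self.calls.append(("start",))


def run_crawl_command(manager, args):
    """Schedule each argument as a URL or a spider name, then start the crawl."""
    for arg in args:
        if is_url(arg):
            manager.crawl_url(arg)
        else:
            manager.crawl_spider_name(arg)
    # start() is called once, after everything has been scheduled.
    manager.start()
```

For example, `run_crawl_command(manager, ["http://example.com/", "example_spider"])` would schedule the first argument through crawl_url() and the second through crawl_spider_name() before calling start().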
== Pending issues ==
* should we rename !ScrapyManager.crawl_* to schedule_* or add_* ?
* !SpiderManager.find_by_request or !SpiderManager.search(request=request) ?