=======  ===============================================
SEP      15
Title    ScrapyManager and SpiderManager API refactoring
Author   Insophia Team
Created  2010-03-10
Status   Final
=======  ===============================================

========================================================
SEP-015: ScrapyManager and SpiderManager API refactoring
========================================================

This SEP proposes a refactoring of the ``ScrapyManager`` and ``SpiderManager``
APIs.

SpiderManager
=============

- ``get(spider_name)`` -> ``Spider`` instance
- ``find_by_request(request)`` -> list of spider names
- ``list()`` -> list of spider names

- remove ``fromdomain()``, ``fromurl()``

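The proposed ``SpiderManager`` interface can be sketched as follows. This is
only an illustration, not the actual implementation: the ``Request`` and
``Spider`` classes are hypothetical stand-ins, and matching a request to a
spider by host name is an assumption, since this SEP does not say how
``find_by_request()`` decides which spiders apply.

```python
from urllib.parse import urlparse


class Request:
    """Hypothetical stand-in for a request; only ``url`` is used here."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical stand-in for a spider with a name and handled domains."""

    def __init__(self, name, domains):
        self.name = name
        self.domains = set(domains)

    def handles(self, request):
        # Assumption: a spider claims a request when the request host equals
        # one of its domains or is a subdomain of one of them.
        host = urlparse(request.url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in self.domains)


class SpiderManager:
    """Sketch of the three methods listed in this section."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        # Returns a Spider instance. Raising KeyError for an unknown name is
        # an assumption; the SEP does not specify the failure mode.
        return self._spiders[spider_name]

    def find_by_request(self, request):
        # Returns a list of spider names, possibly empty or with several names.
        return sorted(n for n, s in self._spiders.items() if s.handles(request))

    def list(self):
        # Returns the names of all known spiders.
        return sorted(self._spiders)
```

Returning names rather than instances from ``find_by_request()`` and
``list()`` follows the signatures above; callers that need an instance pass
the name to ``get()``.
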
ScrapyManager
=============

- ``crawl_request(request, spider=None)``

  - calls ``SpiderManager.find_by_request(request)`` if ``spider`` is ``None``
  - fails if the number of spiders returned is not exactly 1

- ``crawl_spider(spider)``

  - calls ``spider.start_requests()``

- ``crawl_spider_name(spider_name)``

  - calls ``SpiderManager.get(spider_name)``
  - calls ``spider.start_requests()``

- ``crawl_url(url)``

  - calls ``spider.make_requests_from_url()``

- remove ``crawl()``, ``runonce()``

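A minimal sketch of how the four ``crawl_*`` methods could fit together is
shown below. Everything apart from the method names listed above is an
assumption made for illustration: the stub ``Request``, ``Spider`` and
``SpiderManager`` classes, the ``scheduled`` list, the prefix-based matching,
the use of ``LookupError`` to signal failure, and the choice to resolve the
spider for ``crawl_url()`` in the same way as for ``crawl_request()``.

```python
class Request:
    """Hypothetical stand-in for a request; only ``url`` is used here."""

    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical spider that handles every URL starting with ``prefix``."""

    def __init__(self, name, prefix, start_urls=()):
        self.name = name
        self.prefix = prefix
        self.start_urls = list(start_urls)

    def start_requests(self):
        return [Request(url) for url in self.start_urls]

    def make_requests_from_url(self, url):
        # Assumption: returns a list of requests.
        return [Request(url)]


class SpiderManager:
    """Stub exposing only the methods this sketch needs."""

    def __init__(self, spiders):
        self._spiders = {spider.name: spider for spider in spiders}

    def get(self, spider_name):
        return self._spiders[spider_name]

    def find_by_request(self, request):
        return [name for name, spider in self._spiders.items()
                if request.url.startswith(spider.prefix)]


class ScrapyManager:
    def __init__(self, spider_manager):
        self.spiders = spider_manager
        self.scheduled = []  # (spider name, url) pairs, in scheduling order

    def _resolve(self, request):
        # Fail unless exactly one spider claims the request.
        names = self.spiders.find_by_request(request)
        if len(names) != 1:
            raise LookupError(
                f"expected exactly one spider for {request.url}, found {len(names)}"
            )
        return self.spiders.get(names[0])

    def _schedule(self, requests, spider):
        for request in requests:
            self.scheduled.append((spider.name, request.url))

    def crawl_request(self, request, spider=None):
        if spider is None:
            spider = self._resolve(request)
        self._schedule([request], spider)

    def crawl_spider(self, spider):
        self._schedule(spider.start_requests(), spider)

    def crawl_spider_name(self, spider_name):
        self.crawl_spider(self.spiders.get(spider_name))

    def crawl_url(self, url):
        # Assumption: the spider is looked up from the URL via a request.
        spider = self._resolve(Request(url))
        self._schedule(spider.make_requests_from_url(url), spider)
```

In this sketch the ``crawl_*`` methods only queue work, which matches the
idea that a separate call starts the actual crawl.
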
Instead of using ``runonce()``, commands (such as ``crawl`` and ``parse``)
would call one of the ``crawl_*`` methods and then ``start()``.

Changes to Commands
===================

- ``if is_url(arg):``

  - calls ``ScrapyManager.crawl_url(arg)``

- ``else:``

  - calls ``ScrapyManager.crawl_spider_name(arg)``

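The dispatch described above, followed by a single ``start()`` call in place
of ``runonce()``, could look roughly like this. The ``is_url()`` check shown
here (an ``http`` or ``https`` scheme plus a host), the ``RecordingManager``
stand-in and the ``run_crawl_command()`` function are all hypothetical and
exist only to make the sketch runnable.

```python
from urllib.parse import urlparse


def is_url(arg):
    # Assumption: an argument is a URL when it has an http(s) scheme and a host.
    parsed = urlparse(arg)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


class RecordingManager:
    """Hypothetical stand-in for ScrapyManager that records the calls it gets."""

    def __init__(self):
        self.calls = []

    def crawl_url(self, url):
        self.calls.append(("crawl_url", url))

    def crawl_spider_name(self, spider_name):
        self.calls.append(("crawl_spider_name", spider_name))

    def start(self):
        self.calls.append(("start",))


def run_crawl_command(manager, args):
    # Schedule every argument first, then start the crawl once.
    for arg in args:
        if is_url(arg):
            manager.crawl_url(arg)
        else:
            manager.crawl_spider_name(arg)
    manager.start()


manager = RecordingManager()
run_crawl_command(manager, ["http://example.com/page", "example"])
print(manager.calls)
```
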
Pending issues
==============

- Should we rename ``ScrapyManager.crawl_*`` to ``schedule_*`` or ``add_*``?
- ``SpiderManager.find_by_request`` or
  ``SpiderManager.search(request=request)``?