1
0
mirror of https://github.com/scrapy/scrapy.git synced 2025-02-06 11:00:46 +00:00

- sep 15 for #629

This commit is contained in:
Edwin O Marshall 2014-03-06 17:18:51 -05:00
parent c0a72e9c24
commit 27ff010a41

57
sep/sep-015.rst Normal file
View File

@ -0,0 +1,57 @@
======= ==============================================
SEP 15
Title ScrapyManager and SpiderManager API refactoring
Author Insophia Team
Created 2010-03-10
Status Final
======= ==============================================
========================================================
SEP-015: ScrapyManager and SpiderManager API refactoring
========================================================
This SEP proposes a refactoring of ``ScrapyManager`` and ``SpiderManager``
APIs.
SpiderManager
=============
- ``get(spider_name)`` -> ``Spider`` instance
- ``find_by_request(request)`` -> list of spider names
- ``list()`` -> list of spider names
- remove ``fromdomain()``, ``fromurl()``
ScrapyManager
=============
- ``crawl_request(request, spider=None)``
- calls ``SpiderManager.find_by_request(request)`` if spider is ``None``
- fails if ``len(spiders returned)`` != 1
- ``crawl_spider(spider)``
- calls ``spider.start_requests()``
- ``crawl_spider_name(spider_name)``
- calls ``SpiderManager.get(spider_name)``
- calls ``spider.start_requests()``
- ``crawl_url(url)``
- calls ``spider.make_requests_from_url()``
- remove ``crawl()``, ``runonce()``
Instead of using ``runonce()``, commands (such as crawl/parse) would call
``crawl_*`` and then ``start()``.
Changes to Commands
===================
- ``if is_url(arg):``
- calls ``ScrapyManager.crawl_url(arg)``
- ``else:``
- calls ``ScrapyManager.crawl_spider_name(arg)``
Pending issues
==============
- should we rename ``ScrapyManager.crawl_*`` to ``schedule_*`` or ``add_*`` ?
- ``SpiderManager.find_by_request`` or
``SpiderManager.search(request=request)`` ?