Mirror of https://github.com/scrapy/scrapy.git
Synced 2025-02-24 19:44:33 +00:00
* Implements find/create methods in the Spider Manager API; removes fromdomain and fromurl

The create method is now in charge of spider resolution: it must return a spider object for its argument, or raise KeyError if no spider is found. It obsoletes the from_domain and from_url methods.

The default implementation of resolve searches only against spider.name; it does not use spider.allowed_domains as the old fromdomain did. This is why you must supply a spider if you want to crawl a URL.

Find methods return only the names of available spiders, not spider instances. If no spider is found, they return an empty list.

Affected modules:

* command.models (force_domain)
  * removed spiders.force_domain
  * each command passes the spider to the crawl_* commands
* command.commands.*
  * crawl
    * set spider from opts.spider if the argument is a URL
    * group URLs by spider so that each spider is instantiated just once
  * genspider
    * use spiders.create() to check the spider id
  * parse
    * log an error if more than one spider is found
* core.manager
  * on crawl_*, log a message if multiple spiders are found for a URL or request
* shell
  * prints "Multiple found" if more than one spider is found for a URL or request
  * populate_vars(): added spider keyword parameter
* contrib.spidermanager:
  * removed fromdomain() & fromurl()
  * new create(spider_id) -> Spider. Raises KeyError if the spider is not found
  * new find_by_request(request) -> list(spiders)
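The contract described above can be illustrated with a minimal, self-contained sketch. This is not Scrapy's actual implementation: the `Spider`, `Request`, and `SpiderManager` classes below are hypothetical stand-ins, and the matching rule in `find_by_request` (comparing the request's hostname with the spider name) is an assumption chosen only to show the return types. What it takes from the commit message is that `create(spider_id)` returns a spider or raises KeyError, and that find methods return a list of spider names, which is empty when nothing matches.

```python
from urllib.parse import urlparse


class Spider:
    """Hypothetical stand-in for a spider; only `name` matters here."""

    def __init__(self, name):
        self.name = name


class Request:
    """Hypothetical stand-in for a request; only `url` matters here."""

    def __init__(self, url):
        self.url = url


class SpiderManager:
    """Sketch of the create/find contract (not Scrapy's real code)."""

    def __init__(self, spider_classes):
        # Maps spider name -> spider class.
        self._spiders = dict(spider_classes)

    def create(self, spider_id):
        # The dict lookup raises KeyError when the spider is unknown,
        # which is the behaviour the commit message requires.
        spider_class = self._spiders[spider_id]
        return spider_class(spider_id)

    def find_by_request(self, request):
        # Assumption: a spider matches when the request's hostname equals
        # its name or is a subdomain of it. Returns names, not instances.
        host = urlparse(request.url).hostname or ""
        return [
            name
            for name in self._spiders
            if host == name or host.endswith("." + name)
        ]


manager = SpiderManager({"example.com": Spider, "example.org": Spider})
spider = manager.create("example.com")
names = manager.find_by_request(Request("http://www.example.com/page"))
```

A caller that receives more than one name from `find_by_request` would be expected to log a message or ask for an explicit spider, as the crawl, parse, and shell entries above describe.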
This is Scrapy, an open-source screen scraping framework written in Python. For more information, visit the project home page at http://scrapy.org
Languages: Python 99.8%, HTML 0.1%