scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 19:44:33 +00:00

Rolando Espinoza La fuente dd477914db spidermanager refactoring

* Implements find/create methods in the Spider Manager API; removes fromdomain and fromurl

    The create method is now in charge of spider resolution: it must return a spider
    object for its argument, or raise KeyError if no spider is found.

    This method obsoletes the fromdomain and fromurl methods.

    The default implementation of resolve only matches against spider.name; it
    does not use spider.allowed_domains like the old fromdomain did. This is why
    you must supply a spider if you want to crawl a URL.

    The find methods return only the names of available spiders, not spider instances.
    If no spider is found, they return an empty list.
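
The contract described above (create returns a spider or raises KeyError; find methods return a list of names, possibly empty) can be illustrated with a rough sketch. This is not the code from the commit: the Request and Spider stand-ins, the SketchSpiderManager class, and the host-matching rule inside find_by_request are assumptions made only for illustration.

```python
from urllib.parse import urlparse


class Request:
    """Hypothetical stand-in for a request; only carries a URL."""
    def __init__(self, url):
        self.url = url


class Spider:
    """Hypothetical stand-in for a spider; only the attributes used below."""
    def __init__(self, name, allowed_domains=()):
        self.name = name
        self.allowed_domains = list(allowed_domains)


class SketchSpiderManager:
    """Illustrative manager, not the contrib.spidermanager implementation."""

    def __init__(self, spiders):
        # Index spiders by name: resolution only looks at spider.name.
        self._spiders = {spider.name: spider for spider in spiders}

    def create(self, spider_id):
        # A plain dict lookup raises KeyError when the name is unknown.
        return self._spiders[spider_id]

    def find_by_request(self, request):
        # Assumption: a spider matches when the request host equals, or is a
        # subdomain of, one of its allowed_domains. Returns names only.
        host = urlparse(request.url).hostname or ""
        return [
            name
            for name, spider in self._spiders.items()
            if any(host == d or host.endswith("." + d)
                   for d in spider.allowed_domains)
        ]


manager = SketchSpiderManager([
    Spider("example", ["example.com"]),
    Spider("example_blog", ["blog.example.com"]),
])
```

In this sketch a caller that gets more than one name back from find_by_request has to pick one explicitly and pass it to create, which mirrors the "supply a spider" requirement described above.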

Affected modules:
    * command.models (force_domain)
        * removed spiders.force_domain
    * each command passes a spider to the crawl_* commands
    * command.commands.*
        * crawl
            * set spider from opts.spider if arg is a url
            * group urls by spider to instantiate each spider just once
        * genspider
            * use spiders.create() to check spider id
        * parse
            * log error if more than one spider found
    * core.manager
        * on crawl_* log message if multiple spiders found for url or request
    * shell
        * prints "Multiple found" if more than one spider found for url or request
        * populate_vars(): added spider keyword parameter

    * contrib.spidermanager:
        * removed fromdomain() & fromurl()
        * new create(spider_id) -> Spider. Raises KeyError if spider not found
        * new find_by_request(request) -> list(spiders)
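
The crawl change listed above ("group urls by spider") could look roughly like the following. This is a minimal sketch under stated assumptions, not the command's actual code: group_urls_by_spider and the spider_name_for_url callback are hypothetical names introduced here.

```python
from collections import defaultdict


def group_urls_by_spider(urls, spider_name_for_url):
    """Group URLs under the spider name chosen for each of them.

    spider_name_for_url is a hypothetical callback that maps a URL to a
    spider name (for example, the name given on the command line).
    """
    grouped = defaultdict(list)
    for url in urls:
        grouped[spider_name_for_url(url)].append(url)
    # Convert to a plain dict so missing keys raise KeyError for callers.
    return dict(grouped)


urls = [
    "http://example.com/a",
    "http://example.org/b",
    "http://example.com/c",
]
# Toy resolver for the sketch: pick a name from the URL's domain suffix.
groups = group_urls_by_spider(
    urls, lambda url: "com_spider" if ".com/" in url else "org_spider"
)
```

With the URLs grouped this way, each spider name needs to be resolved to an instance only once, and all of its URLs can be scheduled together.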

2010-04-01 17:16:38 -03:00

bin

removed scrapy-admin.py command, leaving scrapy-ctl as the only scrapy command

2009-08-24 15:43:36 -03:00

docs

updated wrong link in doc

2010-03-26 14:02:33 -03:00

examples

removed obsolete scrapy.crawler module

2010-03-12 17:28:33 -02:00

extras

removed old untested (and probably broken) code

2010-04-01 04:05:53 -03:00

profiling/priorityqueue

mv scrapy/trunk to root as part of svn2hg migration

2009-05-06 15:55:17 -03:00

scrapy

spidermanager refactoring

2010-04-01 17:16:38 -03:00

scripts

removed python2.5 from rpm-install.sh script

2009-06-16 13:14:40 -03:00

.hgignore

ignore docs/build

2009-07-25 15:21:22 -03:00

.hgtags

Added tag 0.8 for changeset eef0b17d8752

2009-12-12 18:02:42 -02:00

AUTHORS

simplified and improved AUTHORS file

2010-02-19 23:16:55 -02:00

INSTALL

mv scrapy/trunk to root as part of svn2hg migration

2009-05-06 15:55:17 -03:00

LICENSE

mv scrapy/trunk to root as part of svn2hg migration

2009-05-06 15:55:17 -03:00

MANIFEST.in

added some missing files to MANIFEST.in

2009-09-28 23:55:00 -03:00

README

mv scrapy/trunk to root as part of svn2hg migration

2009-05-06 15:55:17 -03:00

setup.cfg

added bitmap for windows installer

2009-09-17 02:01:40 -03:00

setup.py

title-cased project name in setup.py

2009-12-12 15:48:02 -02:00

README

This is Scrapy, an open source screen scraping framework written in Python.

For more information, visit the project home page at http://scrapy.org