crawls.
* requests are serialized (using marshal by default) and stored on disk, using
one queue per priority
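A rough sketch of the idea (the class and on-disk record format below are
invented for illustration, not Scrapy's actual code): one append-only file
per integer priority holds marshal-serialized records, and pops come from
the highest-priority non-empty queue::

    import marshal
    import os
    import struct

    class DiskPriorityQueue(object):
        """Illustrative only: one append-only file per integer priority,
        records stored as length-prefixed marshal blobs."""

        def __init__(self, path):
            self.path = path      # assumed to be an existing directory
            self.queues = {}      # priority -> {'file': f, 'offset': n}

        def push(self, obj, priority=0):
            q = self.queues.get(priority)
            if q is None:
                f = open(os.path.join(self.path, 'p%d.queue' % priority), 'w+b')
                q = self.queues[priority] = {'file': f, 'offset': 0}
            data = marshal.dumps(obj)
            q['file'].seek(0, os.SEEK_END)
            q['file'].write(struct.pack('<I', len(data)))
            q['file'].write(data)

        def pop(self):
            # serve the highest-priority non-empty queue first
            for priority in sorted(self.queues, reverse=True):
                q = self.queues[priority]
                q['file'].seek(q['offset'])
                header = q['file'].read(4)
                if len(header) < 4:
                    continue  # this priority's queue is exhausted
                size, = struct.unpack('<I', header)
                data = q['file'].read(size)
                q['offset'] = q['file'].tell()
                return marshal.loads(data)
            return None

This also suggests why priorities need to be integers: they name the queue
files and give a total order for popping.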
* request priorities must be integers now
* breadth-first and depth-first crawling orders can now be configured
through a new DEPTH_PRIORITY setting (see docs and the settings sketch
below). Backward compatibility with SCHEDULER_ORDER was kept.
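A settings.py fragment along these lines selects the order (the values
shown are illustrative; see the DEPTH_PRIORITY documentation for the
exact semantics)::

    # positive values lower the priority of deeper requests, which
    # favours breadth-first order
    DEPTH_PRIORITY = 1

    # negative values do the opposite and favour depth-first order
    #DEPTH_PRIORITY = -1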
* requests that can't be serialized (for example, those with
non-serializable callbacks) are always kept in memory queues
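For example, in this hypothetical spider (import paths as of this era) the
first request can go to a disk queue while the second cannot::

    from scrapy.http import Request
    from scrapy.spider import BaseSpider

    class ExampleSpider(BaseSpider):
        name = 'example'

        def parse(self, response):
            # serializable: the callback is a plain spider method, so
            # the request can be stored on disk
            yield Request('http://example.com/a', callback=self.parse_item)

            # not serializable: a lambda can't be stored by name, so
            # this request stays in the in-memory queue
            yield Request('http://example.com/b',
                          callback=lambda r: self.parse_item(r))

        def parse_item(self, response):
            pass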
* adapted the crawl spider to work with the persistent scheduler
* Rename BaseSpider.domain_name to BaseSpider.name
This patch implements the domain_name to name change in the BaseSpider class
and changes all spider instantiations to use the new attribute.
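In spider code the change looks like this (hypothetical spider)::

    from scrapy.spider import BaseSpider

    # before the rename
    class OldStyleSpider(BaseSpider):
        domain_name = 'example.com'

    # after the rename
    class NewStyleSpider(BaseSpider):
        name = 'example.com'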
* Add allowed_domains to spider
This patch merges spider.domain_name and spider.extra_domain_names into
spider.allowed_domains for offsite checking purposes (see the sketch
below).
Note that spider.domain_name is not removed by this patch; it is simply no
longer used.
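A hypothetical spider after the merge::

    from scrapy.spider import BaseSpider

    class ExampleSpider(BaseSpider):
        name = 'example'
        # previously: domain_name = 'example.com' plus
        # extra_domain_names = ['www.example.com', 'static.example.com']
        allowed_domains = ['example.com', 'www.example.com',
                           'static.example.com']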
* Remove spider.domain_name references from scrapy.stats
* Rename domain_stats to spider_stats in MemoryStatsCollector
* Use ``spider`` instead of ``domain`` in SimpledbStatsCollector
* Rename domain_stats_history table to spider_data_history and rename domain
field to spider in MysqlStatsCollector
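The effect on the in-memory collector can be sketched as follows (the
attribute names come from this changelog; the class body is illustrative,
not the real implementation)::

    class MemoryStatsCollector(object):
        def __init__(self):
            # was: self.domain_stats = {}
            self.spider_stats = {}  # spider name -> stats dict

        def close_spider(self, spider, reason):
            # stats are now keyed by spider.name, not by domain
            self.spider_stats[spider.name] = {'finish_reason': reason}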
* Refactor genspider command
The new signature for genspider is: genspider [options] <domain_name>.
Genspider uses the domain_name both for the spider name and for the module
name.
* Remove spider.domain_name references
* Update crawl command signature <spider|url>
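Assuming the scrapy-ctl.py entry point of this era (the exact script name
is an assumption here), usage looks roughly like::

    scrapy-ctl.py crawl example                # by spider name
    scrapy-ctl.py crawl http://example.com/    # by URL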
* docs: updated references to domain_name
* examples/experimental: use spider.name
* genspider: require <name> <domain>
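A hypothetical invocation under the new signature (same entry-point
assumption as above)::

    scrapy-ctl.py genspider example example.com

which presumably creates a spider module named after <name> with <domain>
as its allowed domain.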
* spidermanager: renamed crawl_domain to crawl_spider_name
* spiderctl: updated *domain* references to *spider*
* added backward compatibility with the legacy spider attributes
'domain_name' and 'extra_domain_names' (see the shim sketch below)
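One way such a shim can work (illustrative; not necessarily the exact code
used)::

    import warnings

    class BaseSpider(object):

        name = None
        allowed_domains = ()

        def __init__(self):
            # old spiders define domain_name / extra_domain_names as
            # class attributes; map them onto the new attributes
            if getattr(self, 'domain_name', None):
                warnings.warn("domain_name is deprecated, use name",
                              DeprecationWarning)
                self.name = self.domain_name
                extra = getattr(self, 'extra_domain_names', ())
                self.allowed_domains = (self.domain_name,) + tuple(extra)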