.. _topics-debug:

=================
Debugging Spiders
=================

This document explains the most common techniques for debugging spiders.
Consider the following Scrapy spider::

    class MySpider(BaseSpider):
        name = 'myspider'
        start_urls = (
            'http://example.com/page1',
            'http://example.com/page2',
        )

        def parse(self, response):
            # collect `item_urls`
            for item_url in item_urls:
                yield Request(url=item_url, callback=self.parse_item)

        def parse_item(self, response):
            item = MyItem()
            # populate `item` fields
            yield Request(url=item_details_url, meta={'item': item},
                callback=self.parse_details)

        def parse_details(self, response):
            item = response.meta['item']
            # populate more `item` fields
            return item

Basically, this is a simple spider which parses two pages of items (the
start_urls). Items also have a details page with additional information, so we
use the ``meta`` functionality of :class:`~scrapy.http.Request` to pass a
partially populated item.


Parse Command
=============

The most basic way of checking the output of your spider is to use the
:command:`parse` command. It allows you to check the behaviour of different
parts of the spider at the method level. It has the advantage of being
flexible and simple to use, but does not allow debugging code inside a method.

In order to see the item scraped from a specific url::

    $ scrapy parse --spider=myspider -c parse_item -d 2 <item_url>
    [ ... scrapy log lines crawling example.com spider ... ]

    >>> STATUS DEPTH LEVEL 2 <<<
    # Scraped Items  ------------------------------------------------------------
    [{'url': <item_url>}]

    # Requests  -----------------------------------------------------------------
    []


Using the ``--verbose`` or ``-v`` option, we can see the status at each depth
level::

    $ scrapy parse --spider=myspider -c parse_item -d 2 -v <item_url>
    [ ... scrapy log lines crawling example.com spider ... ]

    >>> DEPTH LEVEL: 1 <<<
    # Scraped Items  ------------------------------------------------------------
    []

    # Requests  -----------------------------------------------------------------
    [<GET item_details_url>]


    >>> DEPTH LEVEL: 2 <<<
    # Scraped Items  ------------------------------------------------------------
    [{'url': <item_url>}]

    # Requests  -----------------------------------------------------------------
    []

Checking items scraped from a single start_url can also be easily achieved
using::

    $ scrapy parse --spider=myspider -d 3 'http://example.com/page1'


Scrapy Shell
============

While the :command:`parse` command is very useful for checking the behaviour of
a spider, it is of little help for checking what happens inside a callback,
besides showing the response received and the output. How do you debug the
situation when ``parse_details`` sometimes receives no item?

Fortunately, the :command:`shell` is your bread and butter in this case (see
:ref:`topics-shell-inspect-response`)::

    from scrapy.shell import inspect_response

    def parse_details(self, response):
        item = response.meta.get('item', None)
        if item:
            # populate more `item` fields
            return item
        else:
            inspect_response(response, self)


Logging
=======

Logging is another useful option for getting information about your spider run.
Although not as convenient, it comes with the advantage that the logs will be
available in all future runs should they be necessary again::

    from scrapy import log

    def parse_details(self, response):
        item = response.meta.get('item', None)
        if item:
            # populate more `item` fields
            return item
        else:
            self.log('No item received for %s' % response.url,
                level=log.WARNING)

For more information, check the :ref:`topics-logging` section.
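
Since the point of logging here is that the messages remain available for later
runs, it helps to write them to a file. As a minimal sketch (the log file name
``myspider.log`` is just an example), the global ``--logfile`` and
``--loglevel`` command-line options can be used when running the spider::

    $ scrapy crawl myspider --logfile=myspider.log --loglevel=WARNING

With the level set to ``WARNING``, messages such as the "No item received"
warning above stand out instead of being buried in debug output.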