diff --git a/docs/intro/overview.rst b/docs/intro/overview.rst
index d10c20138..15b4efd45 100644
--- a/docs/intro/overview.rst
+++ b/docs/intro/overview.rst
@@ -138,7 +138,7 @@ Finally, here's the spider code::
             torrent['name'] = x.select("//h1/text()").extract()
             torrent['description'] = x.select("//div[@id='description']").extract()
             torrent['size'] = x.select("//div[@id='info-left']/p[2]/text()[2]").extract()
-            return [torrent]
+            return torrent
 
 For brevity sake, we intentionally left out the import statements and the
diff --git a/docs/intro/tutorial.rst b/docs/intro/tutorial.rst
index 7d63cf029..5f30cc911 100644
--- a/docs/intro/tutorial.rst
+++ b/docs/intro/tutorial.rst
@@ -137,7 +137,6 @@ This is the code for our first Spider, save it in a file named
         def parse(self, response):
             filename = response.url.split("/")[-2]
             open(filename, 'wb').write(response.body)
-            return []
 
     SPIDER = DmozSpider()
@@ -369,7 +368,6 @@ Let's add this code to our spider::
                 link = site.select('a/@href').extract()
                 desc = site.select('text()').extract()
                 print title, link, desc
-            return []
 
     SPIDER = DmozSpider()
diff --git a/docs/topics/spiders.rst b/docs/topics/spiders.rst
index 197e67c24..a658f7e4f 100644
--- a/docs/topics/spiders.rst
+++ b/docs/topics/spiders.rst
@@ -22,11 +22,11 @@ For spiders, the scraping cycle goes through something like this:
    :attr:`~scrapy.spider.BaseSpider.parse` method as callback function for the
    Requests.
 
-2. In the callback function you parse the response (web page) and return an
-   iterable containing either :class:`~scrapy.item.Item` objects,
-   :class:`~scrapy.http.Request` objects, or both. Those Requests will also
-   contain a callback (maybe the same) and will then be followed by downloaded
-   by Scrapy and then their response handled to the specified callback.
+2. In the callback function you parse the response (web page) and return either
+   :class:`~scrapy.item.Item` objects, :class:`~scrapy.http.Request` objects,
+   or an iterable of both. Those Requests will also contain a callback (maybe
+   the same one), will then be downloaded by Scrapy, and their responses
+   handled by the specified callback.
 
 3. In callback functions you parse the page contants, typically using
    :ref:`topics-selectors` (but you can also use BeautifuSoup, lxml or whatever
@@ -138,9 +138,8 @@ BaseSpider
    will be used to parse the first pages crawled by the spider.
 
    The ``parse`` method is in charge of processing the response and returning
-   scraped data and/or more URLs to follow, because of this, the method must
-   always return a list or at least an empty one. Other Requests callbacks
-   have the same requirements as the BaseSpider class.
+   scraped data and/or more URLs to follow. Other Requests callbacks have
+   the same requirements as the BaseSpider class.
 
 .. method:: log(message, [level, component])
@@ -167,7 +166,6 @@ Let's see an example::
 
         def parse(self, response):
             self.log('A response from %s just arrived!' % response.url)
-            return []
 
     SPIDER = MySpider()
@@ -251,7 +249,7 @@ Let's now take a look at an example CrawlSpider with rules::
             item['id'] = hxs.select('//td[@id="item_id"]/text()').re(r'ID: (\d+)')
             item['name'] = hxs.select('//td[@id="item_name"]/text()').extract()
             item['description'] = hxs.select('//td[@id="item_description"]/text()').extract()
-            return [item]
+            return item
 
     SPIDER = MySpider()
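
As a rough illustration of the convention these doc changes describe, here is a minimal, hypothetical spider sketched against the same pre-1.0 Scrapy API the patched pages use (``BaseSpider``, ``HtmlXPathSelector``, a module-level ``SPIDER`` instance). The ``ExampleSpider``/``PageItem`` names, fields and XPath expressions are invented for illustration; the point is only that a callback may return a single :class:`~scrapy.item.Item` (or :class:`~scrapy.http.Request`) directly, or an iterable when it has several things to return::

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import Request
    from scrapy.item import Item, Field

    class PageItem(Item):
        # Hypothetical item with two fields, purely for illustration.
        title = Field()
        body = Field()

    class ExampleSpider(BaseSpider):
        domain_name = "example.com"
        start_urls = ["http://www.example.com/page/1"]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            item = PageItem()
            item['title'] = hxs.select('//h1/text()').extract()
            item['body'] = hxs.select('//div[@id="content"]').extract()

            # Assumes the "next" link is an absolute URL, to keep the sketch short.
            next_page = hxs.select('//a[@rel="next"]/@href').extract()
            if next_page:
                # An iterable is still fine when mixing items and follow-up requests.
                return [item, Request(next_page[0], callback=self.parse)]

            # A single item can be returned directly; no enclosing list is needed.
            return item

    SPIDER = ExampleSpider()

Nothing here requires wrapping a lone item in a list any more, though existing spiders that return lists keep working unchanged.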