mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 08:03:54 +00:00

2856 Commits

Author SHA1 Message Date
Daniel Grana
bdd627fe1d allow overriding store_uri by extending ImagePipeline
--HG--
extra : rebase_source : 5c561b8282f733ab0f26607059dd96d858154426
2011-07-15 15:17:38 -03:00
Pablo Hoffman
84f518fc5e More core changes:
* removed execution queue (replaced by newer spider queues)
* added real support for returning iterators in Spider.start_requests()
* removed support for passing urls to 'scrapy crawl' command
2011-07-15 15:18:39 -03:00
Daniel Grana
4dadeb7ccb fix issue with responses preventing spiders to be idle in engine counts 2011-07-15 13:57:24 -03:00
Pablo Hoffman
d207c0afe4 fixed bug in engine.download() method 2011-07-15 12:55:07 -03:00
Pablo Hoffman
830255eea3 removed deprecated commands: queue, runserver 2011-07-14 01:41:24 -03:00
Pablo Hoffman
359129adf9 fixed python pass handling in cmdline/commands tests so that it works with new w3lib library 2011-07-14 01:40:31 -03:00
Pablo Hoffman
dbad1373f1 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-07-13 18:44:54 -03:00
Pablo Hoffman
18cb4ff1d8 added natty to list of supported ubuntu distros 2011-07-13 18:43:52 -03:00
Pablo Hoffman
39a2ea97c8 redirect mw: added REDIRECT_ENABLED setting and documented the other settings 2011-07-13 14:18:15 -03:00
Pablo Hoffman
0b6c7ce9b8 improved download errors propagation to the spiders, and removed no longer needed code to simplify 2011-07-13 14:10:05 -03:00
Pablo Hoffman
804c0279ec setup.py: only add lxml requirement if libxml2 is not available 2011-07-13 13:04:42 -03:00
Pablo Hoffman
541ed3913b retry middleware: added RETRY_ENABLED setting and documented the other settings more properly, also improved messages when no longer retrying requests 2011-07-13 11:55:05 -03:00
Pablo Hoffman
763f3dc628 minor update to doc 2011-07-12 19:56:39 -03:00
Pablo Hoffman
bfda9ec319 added clarification about scrapy versioning including the recently adopted odd/even versioning scheme
--HG--
rename : docs/api-stability.rst => docs/versioning.rst
2011-07-12 19:53:23 -03:00
Pablo Hoffman
4fde1ef94d added CloseSpider exception, to manually close spiders 2011-07-12 14:24:10 -03:00
Pablo Hoffman
4bb409923c improved encoding detection by adding support for HTML5 meta charset 2011-07-12 09:52:50 -03:00
Pablo Hoffman
67213ce673 logformatter: support non-ascii characters in custom implementations of Item.__str__() 2011-07-12 01:16:06 -03:00
Pablo Hoffman
31a375bde7 Close the scheduler after closing the scraper and downloader. This shouldn't have any real effect in practice, but it feels more appropriate to close the components in this order 2011-07-10 04:18:50 -03:00
Pablo Hoffman
90b1ae694c get_engine_status(): preserve test order defined in code 2011-07-10 04:10:20 -03:00
Pablo Hoffman
409aaade0b Refactored close spider behaviour so that the engine now waits for all
downloading (and enqueued for download) requests to finish and their responses
to be processed in the scraper/spiders, before closing the spider.

This will be required in the future to avoid losing requests when we add
scheduler persistence and it's also a more correct behaviour overall.

The closing process has also been refactored to remove unneeded closing state
from downloader and leave it only in the engine.

Finally, some unused methods have been removed too, like spider_is_open() for
engine and scheduler.
2011-07-08 11:40:19 -03:00
Pablo Hoffman
574b070bb4 fixed minor bug in sitemap parser 2011-07-08 09:33:56 -03:00
Pablo Hoffman
ab9b786791 Updated CAMELCASE_EXCLUDE_CHARS to also exclude digits (patch by Adam Wentz) 2011-07-06 20:11:11 -03:00
Pablo Hoffman
7abc4b4c5a fixed typo 2011-07-06 01:35:21 -03:00
Pablo Hoffman
949e11ee31 SitemapSpider: added support for parsing gzipped sitemaps (patch contributed by Rolando Espinoza) 2011-07-06 01:33:46 -03:00
Pablo Hoffman
5707051352 fixed httpcompression middleware tests 2011-07-04 21:31:05 -03:00
Pablo Hoffman
81fbe8c9a4 added x-gzip to supported encoding declarations in httpcompression middleware 2011-07-04 21:27:24 -03:00
Pablo Hoffman
a5223881ee removed debugging code 2011-06-30 02:28:53 -03:00
Pablo Hoffman
5275343fa1 use handle_httpstatus_all=True in scrapy shell 2011-06-28 17:27:40 -03:00
Pablo Hoffman
7cd559eca5 SitemapSpider: ignore non-xml responses. fixes #331 2011-06-27 10:02:16 -03:00
Pablo Hoffman
db5cae7c03 SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330 2011-06-23 18:18:29 -03:00
Pablo Hoffman
d97a9d8731 improved errors of ItemLoader.load_item() so that it shows the field name and value of the output processor that failed 2011-06-23 12:39:51 -03:00
Pablo Hoffman
fbafb295e8 removed DEFAULT_ITEM_CLASS setting from settings in new project template 2011-06-23 11:34:28 -03:00
Pablo Hoffman
d197895d8f removed deprecated code 2011-06-21 18:06:04 -03:00
Pablo Hoffman
d8775a7575 removed old deprecated FileExportPipeline 2011-06-21 18:01:05 -03:00
Pablo Hoffman
0305ffdd6c sitemaps: support trailing spaces in <loc> elements 2011-06-20 21:22:16 -03:00
Pablo Hoffman
2e74ccaa7e dropped InitSpider super class from CrawlSpider and Feed spiders, to avoid potentially confusing code, as it's also not needed 2011-06-20 13:10:13 -03:00
Pablo Hoffman
03bc218987 fixed bug in get_engine_status() function 2011-06-20 11:09:01 -03:00
Pablo Hoffman
03a92a8b03 slightly improved version of scrapyd script 2011-06-20 11:04:38 -03:00
Pablo Hoffman
5de5cac43e added quick script to launch scrapyd 2011-06-20 10:48:34 -03:00
Pablo Hoffman
841007b5c5 added envvar SCRAPY_VERSION_FROM_HG=1 to extras/makedeb.py script 2011-06-18 03:31:47 -03:00
Pablo Hoffman
7e5e00cea5 Added public engine.download() method to use the downloader bypassing the scheduler. Changed media pipeline to use engine.download() to prevent deadlocks. 2011-06-18 02:52:21 -03:00
Pablo Hoffman
dd90e83eae get_engine_status(): also look up open spiders in scraper component 2011-06-18 02:48:01 -03:00
Pablo Hoffman
e575e015c1 LogStats extension: fixed KeyError bug caused by spiders that don't scrape any items 2011-06-17 16:50:02 -03:00
Pablo Hoffman
cfc93ba9db added SitemapSpider to basic spider assertion tests 2011-06-16 10:20:28 -03:00
Pablo Hoffman
25b0ca3125 minor imports sort out 2011-06-16 10:19:27 -03:00
Pablo Hoffman
59acb129e5 scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set 2011-06-15 19:35:03 -03:00
Pablo Hoffman
cd52a7c83b removed debugging print 2011-06-15 12:35:54 -03:00
Pablo Hoffman
57c43fdce6 added SitemapSpider, with tests and doc 2011-06-15 11:54:34 -03:00
Pablo Hoffman
91dc46539f added LogStats extension for periodically logging basic stats (like crawled pages and scraped items) 2011-06-14 00:50:05 -03:00
Pablo Hoffman
d2a9c0fdcd issue deprecation warning when using CLOSESPIDER_ITEMPASSED setting 2011-06-13 22:34:01 -03:00