scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 00:59:05 +00:00

Author	SHA1	Message	Date
Mikhail Korobov	64399d18d8	Stop reactor on Ctrl-C regardless of 'stop_after_crawl'. Fixes GH-1279.	2015-06-06 02:53:36 +05:00
Mikhail Korobov	33d145e2f5	CrawlerProcess cleanup * remove unneeded lambda; * extract _get_dns_resolver method and format code to pep8.	2015-06-06 02:49:39 +05:00
Julia Medina	24d8a85269	Update release notes for 1.0.0rc2 (cherry picked from commit 6e61d54168cf471363be3e7e54d75ad544b9f6e1)	2015-06-05 17:11:40 -03:00
Chris Nilsson	eae25a04d9	Added MEMUSAGE_CHECK_INTERVAL_SECONDS to Memory usage extension options. Kept the default as it was, at 60.0 seconds. But added a setting to allow this to be changed as desired.	2015-06-06 00:39:14 +10:00
Daniel Graña	d9bcd48606	Merge pull request #1278 from Curita/remove-tz-aware-logformat Remove deprecated %z formatting from the default LOG_DATEFORMAT	2015-06-04 13:39:01 -03:00
Julia Medina	367ea81e71	Remove deprecated %z formatting from the default LOG_DATEFORMAT	2015-06-04 04:11:23 +08:00
Mikhail Korobov	f312ffcb54	Merge pull request #1276 from scrapy/fix-spider-settings Fix Spider.custom_settings	2015-06-03 22:14:04 +05:00
Mikhail Korobov	d42c420a6d	fixed spider custom_settings https://github.com/scrapy/scrapy/pull/1128 moved spidercls.update_settings call to a later stage; this commit moves it back.	2015-06-03 04:29:10 +05:00
Mikhail Korobov	cc2f3e1b46	TST a test case to show custom_settings doesn't always work	2015-06-03 04:26:20 +05:00
Daniel Graña	d52cf8bb03	Merge pull request #1267 from Curita/fix-1265 Fix #1265	2015-06-01 20:31:46 -03:00
Julia Medina	ffc7b7fd6c	Add helper to update deprecated class paths	2015-06-01 17:01:33 -03:00
Ally Weir	bd2fe996aa	Spelling correction incorrect use of "too" instead of "to"	2015-06-01 20:47:22 +05:00
Julia Medina	9d1cf230ed	Merge pull request #1268 from scrapy/crawlerprocess-dict-settings fixed CrawlerProcess when settings are passed as dicts	2015-06-01 12:35:54 -03:00
Marven Sanchez	8771d1f79b	Update HTTPCache middleware docs	2015-06-01 18:20:59 +08:00
Marven Sanchez	bb3ebf13f9	Add tests for RFC2616 policy enhancements Add `scrapy/downloadermiddlewares/httpcache.py` to `tests/py3-ignores.txt`	2015-06-01 18:20:12 +08:00
Jamey Sharp	1991550442	Allow client to bound max-age for revalidation. Unlike specifying "Cache-Control: no-cache", if the request specifies "max-age=0", then the cached validators will be used if possible to avoid re-fetching unchanged pages. That said, it's still useful to be able to specify "no-cache" on the request, in cases where the origin server may have changed page contents without changing validators.	2015-06-01 18:06:36 +08:00
Jamey Sharp	c3b2cabf6c	Allow setting RFC2616Policy to cache unconditionally. A spider may wish to have all responses available in the cache, for future use with "Cache-Control: max-stale", for instance. The DummyPolicy caches all responses but never revalidates them, and sometimes a more nuanced policy is desirable. This setting still respects "Cache-Control: no-store" directives in responses. If you don't want that, filter "no-store" out of the Cache-Control headers in responses you feed to the cache middleware.	2015-06-01 18:06:35 +08:00
Jamey Sharp	e23a381337	Let spiders ignore bogus Cache-Control headers. Sites often set "no-store", "no-cache", "must-revalidate", etc., but get upset at the traffic a spider can generate if it respects those directives. Allow the spider's author to selectively ignore Cache-Control directives that are known to be unimportant for the sites being crawled. We assume that the spider will not issue Cache-Control directives in requests unless it actually needs them, so directives in requests are not filtered.	2015-06-01 18:06:35 +08:00
Jamey Sharp	dd3a46295c	Support "Cache-Control: max-stale" in requests. This allows spiders to be configured with the full RFC2616 cache policy, but avoid revalidation on a request-by-request basis, while remaining conformant with the HTTP spec.	2015-06-01 18:06:35 +08:00
Jamey Sharp	4446baae33	Use cached responses if revalidation errors out.	2015-06-01 18:06:35 +08:00
Mikhail Korobov	aa6a72707d	fixed CrawlerProcess when settings are passed as dicts See https://github.com/scrapy/scrapy/pull/1156	2015-05-30 06:59:15 +05:00
Mikhail Korobov	342cb622f1	DOC fix non-working link (by removing it). See https://github.com/scrapy/scrapy/pull/1260	2015-05-27 23:04:58 +05:00
Julia Medina	343d20d791	Update 1.0 release notes	2015-05-27 11:53:54 -03:00
Julia Medina	62a6eff218	Merge pull request #1259 from chekunkov/log-counter-handler-is-never-removed [MRG +1] LogCounterHandler is never removed from root handlers list, fix that	2015-05-27 11:42:19 -03:00
Julia Medina	26f50d3f43	Extend regex for tags that deploy to PyPI to support new release cycle	2015-05-27 09:17:18 -03:00
Alexander Chekunkov	b2765aabd8	LogCounterHandler is never removed from root handlers list, fix that. The lambda is garbage collected, and because the receiver is added as a weak reference by default, logging.root.removeHandler is not executed when signals.engine_stopped is fired. Fixed that by assigning the lambda to a private argument and not by using connect(..., weak=False), because I believe this lambda function should be collected with the crawler object	2015-05-27 13:52:47 +07:00
Daniel Graña	5ee08865d6	Merge pull request #1258 from chekunkov/crawler-process-stopping-is-no-more [MRG+1] Remove CrawlerProcess.stopping as it isn't used any more	2015-05-26 15:32:24 -03:00
Alexander Chekunkov	b0ea3e38d1	remove CrawlerProcess.stopping as it isn't used any more	2015-05-26 17:37:16 +07:00
Pablo Hoffman	545c4224f9	update old crawlera link	2015-05-25 16:01:54 -03:00
Daniel Graña	ebe889a663	Unquote request path before passing to FTPClient, it already escapes paths	2015-05-23 20:50:30 -03:00
Daniel Graña	3545468389	Merge branch 'deferdelay'	2015-05-23 18:09:20 -03:00
Daniel Graña	d439c26d76	update docstring and release notes	2015-05-22 20:00:58 -03:00
Alexey Vishnevsky	27ce3225bd	Makes scrapy more async by letting the reactor spend another couple of cycles to accomplish its needs.	2015-05-22 17:05:19 -03:00
Julia Medina	4b2763c6f9	Bump version: 1.0.0rc1 → 1.1.0dev1	2015-05-22 13:24:50 -03:00
Julia Medina	de6d232a02	Bump version: 0.25.1 → 1.0.0rc1 1.0.0rc1	2015-05-22 13:24:27 -03:00
Julia Medina	29529e5e8e	Merge pull request #1244 from Curita/1.0-release-notes 1.0 release notes	2015-05-22 13:21:17 -03:00
Julia Medina	600164594c	New release cycle in .bumpversion.cfg 1.0.0dev1 -> 1.0.0rc1 -> 1.0.0 -> 1.1.0dev1 -> ...	2015-05-22 12:59:21 -03:00
Julia Medina	afcf70cdc6	Add 1.0 release notes	2015-05-22 12:53:11 -03:00
Mikhail Korobov	cc2258b2bb	Merge pull request #1145 from bosnj/master [MRG+1] default return value for extract_first	2015-05-21 22:03:54 +05:00
Daniel Graña	58717472f7	Merge pull request #1250 from chekunkov/scrapy-log-fix-incompatible-change [MRG+1] Keep level_names in scrapy.log for backwards compatibility	2015-05-21 10:46:39 -03:00
Alexander Chekunkov	795ca3945f	keep level_names in scrapy.log for backwards compatibility	2015-05-21 08:56:44 +00:00
Daniel Graña	ee59112480	Merge pull request #1224 from scrapy/fix-empty-feed-export-fields [MRG] fixed FEED_EXPORT_FIELDS handling (see #1223)	2015-05-19 16:36:05 -03:00
Daniel Graña	5beb9d251c	Merge pull request #1243 from scrapy/remove-contrib-from-py3-ignores remove unnecessary lines from py3-ignores	2015-05-19 12:20:51 -03:00
Mikhail Korobov	7a5b5ec4d6	TST remove unnecessary lines from py3-ignores scrapy/contrib is already skipped - see https://github.com/scrapy/scrapy/pull/1165	2015-05-19 00:57:39 +05:00
Mikhail Korobov	21b17734c5	Merge pull request #1242 from Curita/exporters-single-module Move exporters/__init__.py to exporters.py	2015-05-19 00:10:30 +05:00
Julia Medina	044f31cb8d	Merge pull request #1240 from scrapy/fix-feedexport-logging MRG+1 fixed FeedExporter shutdown log messages	2015-05-18 14:54:11 -03:00
Julia Medina	af0c8f82f4	Move exporters/__init__.py to exporters.py	2015-05-18 14:46:23 -03:00
Mikhail Korobov	60e79db3ee	fixed FeedExporter shutdown log messages	2015-05-18 19:28:37 +05:00
Mikhail Korobov	9b0ca1b7a0	drop support for FEED_EXPORT_FIELD=[] meaning "no fields"	2015-05-18 17:13:25 +05:00
Mikhail Korobov	9fb318338b	support FEED_EXPORT_FIELDS=[]	2015-05-18 16:44:02 +05:00
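
Commit b2765aabd8 above describes a receiver that is dropped because the signal holds it only by weak reference. The following is a minimal sketch of that failure mode and of the fix the message describes (keeping a reference on the owning object), using only the standard library. The `WeakSignal` and `Owner` classes are hypothetical stand-ins written for illustration; they are not Scrapy's signal API or its LogCounterHandler code.

```python
import gc
import weakref


class WeakSignal:
    """Hypothetical signal that stores receivers as weak references only."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        # Only a weak reference is kept, so the signal alone does not
        # keep the receiver alive.
        self._receivers.append(weakref.ref(receiver))

    def send(self):
        """Call every receiver that is still alive; return how many ran."""
        called = 0
        for ref in self._receivers:
            receiver = ref()
            if receiver is not None:
                receiver()
                called += 1
        return called


class Owner:
    """Hypothetical object that wants a cleanup callback run on a signal."""

    def __init__(self, signal, keep_reference):
        self.cleaned_up = False
        callback = lambda: setattr(self, "cleaned_up", True)
        if keep_reference:
            # Storing the lambda on the owner ties its lifetime to the owner.
            self._callback = callback
        signal.connect(callback)


# Without a stored reference the lambda dies when __init__ returns.
lost_signal = WeakSignal()
lost_owner = Owner(lost_signal, keep_reference=False)
gc.collect()  # make collection explicit on non-refcounting interpreters
lost_calls = lost_signal.send()

# With a stored reference the lambda lives as long as the owner does.
kept_signal = WeakSignal()
kept_owner = Owner(kept_signal, keep_reference=True)
gc.collect()
kept_calls = kept_signal.send()
```

In this sketch the alternative mentioned in the commit message, connecting with a strong reference, would correspond to storing `receiver` itself in `WeakSignal.connect`; the receiver would then outlive its owner for as long as the signal exists.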

5153 Commits