scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 16:27:44 +00:00

Author	SHA1	Message	Date
Mikhail Korobov	1740fcf1a6	DOC SignalManager docstrings. See GH-713. This change is not 100% backwards compatible because of *args changes. Their usage was not documented, so we're not breaking public interface.	2015-06-08 21:05:58 +05:00
Mikhail Korobov	9a787893e3	(backwards-incompatible) allow to pass settings=None to configure_logging * use explicit argument for disabling root handler; * handle LOG_STDOUT even if install_root_handler is False	2015-06-08 19:54:18 +05:00
Mikhail Korobov	3cbf8a0b2b	extract CrawlerRunner._crawl method which always expects Crawler It provides an extension point where crawler instance is available; it should make it easier to write alternative CrawlerRunner.crawl implementations. See also: https://github.com/scrapy/scrapy/pull/1256	2015-06-08 18:35:44 +05:00
Pawel Miech	e575f44446	[settings/default_settings.py] don't retry 400 As in HTTP specs: "10.4.1 400 Bad Request The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications." Scrapy should not retry 400 by default.	2015-06-08 10:52:42 +02:00
Daniel Graña	87293965db	Merge pull request #1285 from scrapy/optional-settings-arguments make it easier to use default settings	2015-06-07 20:29:24 -03:00
Chris Nilsson	61dec83f70	Moved default value of MEMUSAGE_CHECK_INTERVAL_SECONDS to default_settings	2015-06-06 11:19:29 +10:00
Chris Nilsson	0c532baf4c	Removed typo, and clarified time unit of setting	2015-06-06 11:18:13 +10:00
Mikhail Korobov	d047665c02	make "settings" argument optional for Crawler, CrawlerRunner and CrawlerProcess	2015-06-06 03:23:13 +05:00
Mikhail Korobov	64399d18d8	Stop reactor on Ctrl-C regardless of 'stop_after_crawl'. Fixes GH-1279.	2015-06-06 02:53:36 +05:00
Mikhail Korobov	33d145e2f5	CrawlerProcess cleanup * remove unneeded lambda; * extract _get_dns_resolver method and format code to pep8.	2015-06-06 02:49:39 +05:00
Julia Medina	24d8a85269	Update release notes for 1.0.0rc2 (cherry picked from commit 6e61d54168cf471363be3e7e54d75ad544b9f6e1)	2015-06-05 17:11:40 -03:00
Chris Nilsson	eae25a04d9	Added MEMUSAGE_CHECK_INTERVAL_SECONDS to Memory usage extension options. Kept the default as it was, at 60.0 seconds. But added a setting to allow this to be changed as desired.	2015-06-06 00:39:14 +10:00
Daniel Graña	d9bcd48606	Merge pull request #1278 from Curita/remove-tz-aware-logformat Remove deprecated %z formatting from the default LOG_DATEFORMAT	2015-06-04 13:39:01 -03:00
Julia Medina	367ea81e71	Remove deprecated %z formatting from the default LOG_DATEFORMAT	2015-06-04 04:11:23 +08:00
Mikhail Korobov	f312ffcb54	Merge pull request #1276 from scrapy/fix-spider-settings Fix Spider.custom_settings	2015-06-03 22:14:04 +05:00
Mikhail Korobov	d42c420a6d	fixed spider custom_settings https://github.com/scrapy/scrapy/pull/1128 moved spidercls.update_settings call to a later stage; this commit moves it back.	2015-06-03 04:29:10 +05:00
Mikhail Korobov	cc2f3e1b46	TST a test case to show custom_settings doesn't always work	2015-06-03 04:26:20 +05:00
Daniel Graña	d52cf8bb03	Merge pull request #1267 from Curita/fix-1265 Fix #1265	2015-06-01 20:31:46 -03:00
Julia Medina	ffc7b7fd6c	Add helper to update deprecated class paths	2015-06-01 17:01:33 -03:00
Ally Weir	bd2fe996aa	Spelling correction incorrect use of "too" instead of "to"	2015-06-01 20:47:22 +05:00
Julia Medina	9d1cf230ed	Merge pull request #1268 from scrapy/crawlerprocess-dict-settings fixed CrawlerProcess when settings are passed as dicts	2015-06-01 12:35:54 -03:00
Marven Sanchez	8771d1f79b	Update HTTPCache middleware docs	2015-06-01 18:20:59 +08:00
Marven Sanchez	bb3ebf13f9	Add tests for RFC2616 policy enhancements Add `scrapy/downloadermiddlewares/httpcache.py` to `tests/py3-ignores.txt`	2015-06-01 18:20:12 +08:00
Jamey Sharp	1991550442	Allow client to bound max-age for revalidation. Unlike specifying "Cache-Control: no-cache", if the request specifies "max-age=0", then the cached validators will be used if possible to avoid re-fetching unchanged pages. That said, it's still useful to be able to specify "no-cache" on the request, in cases where the origin server may have changed page contents without changing validators.	2015-06-01 18:06:36 +08:00
Jamey Sharp	c3b2cabf6c	Allow setting RFC2616Policy to cache unconditionally. A spider may wish to have all responses available in the cache, for future use with "Cache-Control: max-stale", for instance. The DummyPolicy caches all responses but never revalidates them, and sometimes a more nuanced policy is desirable. This setting still respects "Cache-Control: no-store" directives in responses. If you don't want that, filter "no-store" out of the Cache-Control headers in responses you feed to the cache middleware.	2015-06-01 18:06:35 +08:00
Jamey Sharp	e23a381337	Let spiders ignore bogus Cache-Control headers. Sites often set "no-store", "no-cache", "must-revalidate", etc., but get upset at the traffic a spider can generate if it respects those directives. Allow the spider's author to selectively ignore Cache-Control directives that are known to be unimportant for the sites being crawled. We assume that the spider will not issue Cache-Control directives in requests unless it actually needs them, so directives in requests are not filtered.	2015-06-01 18:06:35 +08:00
Jamey Sharp	dd3a46295c	Support "Cache-Control: max-stale" in requests. This allows spiders to be configured with the full RFC2616 cache policy, but avoid revalidation on a request-by-request basis, while remaining conformant with the HTTP spec.	2015-06-01 18:06:35 +08:00
Jamey Sharp	4446baae33	Use cached responses if revalidation errors out.	2015-06-01 18:06:35 +08:00
Mikhail Korobov	aa6a72707d	fixed CrawlerProcess when settings are passed as dicts See https://github.com/scrapy/scrapy/pull/1156	2015-05-30 06:59:15 +05:00
Mikhail Korobov	342cb622f1	DOC fix non-working link (by removing it). See https://github.com/scrapy/scrapy/pull/1260	2015-05-27 23:04:58 +05:00
Julia Medina	343d20d791	Update 1.0 release notes	2015-05-27 11:53:54 -03:00
Julia Medina	62a6eff218	Merge pull request #1259 from chekunkov/log-counter-handler-is-never-removed [MRG +1] LogCounterHandler is never removed from root handlers list, fix that	2015-05-27 11:42:19 -03:00
Julia Medina	26f50d3f43	Extend regex for tags that deploy to PyPI to support new release cycle	2015-05-27 09:17:18 -03:00
Alexander Chekunkov	b2765aabd8	LogCounterHandler is never removed from root handlers list, fix that lambda is garbage collected and because receiver is added as weak reference by default - when signals.engine_stopped is fired logging.root.removeHandler is not executed. Fixed that by assigning lambda to a private argument and not by using connect(..., weak=False) because I believe this lambda function should be collected with crawler object	2015-05-27 13:52:47 +07:00
Daniel Graña	5ee08865d6	Merge pull request #1258 from chekunkov/crawler-process-stopping-is-no-more [MRG+1] Remove CrawlerProcess.stopping as it isn't used any more	2015-05-26 15:32:24 -03:00
Alexander Chekunkov	b0ea3e38d1	remove CrawlerProcess.stopping as it isn't used any more	2015-05-26 17:37:16 +07:00
Pablo Hoffman	545c4224f9	update old crawlera link	2015-05-25 16:01:54 -03:00
Daniel Graña	ebe889a663	Unquote request path before passing to FTPClient, it already escapes paths	2015-05-23 20:50:30 -03:00
Daniel Graña	3545468389	Merge branch 'deferdelay'	2015-05-23 18:09:20 -03:00
Daniel Graña	d439c26d76	update docstring and release notes	2015-05-22 20:00:58 -03:00
Alexey Vishnevsky	27ce3225bd	Makes scrapy more async by letting the reactor spend another couple of cycles to accomplish its needs.	2015-05-22 17:05:19 -03:00
Julia Medina	4b2763c6f9	Bump version: 1.0.0rc1 → 1.1.0dev1	2015-05-22 13:24:50 -03:00
Julia Medina	de6d232a02	Bump version: 0.25.1 → 1.0.0rc1 1.0.0rc1	2015-05-22 13:24:27 -03:00
Julia Medina	29529e5e8e	Merge pull request #1244 from Curita/1.0-release-notes 1.0 release notes	2015-05-22 13:21:17 -03:00
Julia Medina	600164594c	New release cycle in .bumpversion.cfg 1.0.0dev1 -> 1.0.0rc1 -> 1.0.0 -> 1.1.0dev1 -> ...	2015-05-22 12:59:21 -03:00
Julia Medina	afcf70cdc6	Add 1.0 release notes	2015-05-22 12:53:11 -03:00
Mikhail Korobov	cc2258b2bb	Merge pull request #1145 from bosnj/master [MRG+1] default return value for extract_first	2015-05-21 22:03:54 +05:00
Daniel Graña	58717472f7	Merge pull request #1250 from chekunkov/scrapy-log-fix-incompatible-change [MRG+1] Keep level_names in scrapy.log for backwards compatibility	2015-05-21 10:46:39 -03:00
Alexander Chekunkov	795ca3945f	keep level_names in scrapy.log for backwards compatibility	2015-05-21 08:56:44 +00:00
Daniel Graña	ee59112480	Merge pull request #1224 from scrapy/fix-empty-feed-export-fields [MRG] fixed FEED_EXPORT_FIELDS handling (see #1223)	2015-05-19 16:36:05 -03:00
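
Commit b2765aabd8 explains why LogCounterHandler was never removed. Signal receivers are connected by weak reference by default, so a lambda that nothing else holds is garbage-collected before engine_stopped fires. The mechanism can be reproduced with only the standard library. In the minimal sketch below, `Dispatcher`, `Crawler` and `run` are hypothetical stand-ins, not Scrapy's SignalManager API; only the weak-reference behaviour is taken from the commit message.

```python
import gc
import weakref


class Dispatcher:
    """Hypothetical minimal dispatcher that holds receivers by weak
    reference, like the weak-by-default connect() the commit describes."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        # Only a weak reference is stored, so the dispatcher alone does
        # not keep the receiver alive.
        self._receivers.append(weakref.ref(receiver))

    def send(self):
        for ref in self._receivers:
            receiver = ref()
            if receiver is not None:  # skip receivers already collected
                receiver()


class Crawler:
    """Hypothetical object that registers a cleanup callback on shutdown."""

    def __init__(self, dispatcher, keep_reference):
        self.cleaned_up = False
        on_stop = lambda: setattr(self, "cleaned_up", True)
        if keep_reference:
            # The fix: a private attribute ties the lambda's lifetime to
            # this object instead of to the local variable.
            self._on_stop = on_stop
        dispatcher.connect(on_stop)


def run(keep_reference):
    dispatcher = Dispatcher()
    crawler = Crawler(dispatcher, keep_reference)
    gc.collect()  # collect anything no longer strongly referenced
    dispatcher.send()  # plays the role of the engine_stopped signal
    return crawler.cleaned_up


print(run(keep_reference=False))  # False: the lambda was already collected
print(run(keep_reference=True))   # True: the lambda survived and ran
```

The commit gives a reason for storing the lambda on the owning object instead of connecting it with `weak=False`: the handler is then collected together with the crawler.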

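Commits e23a381337 and dd3a46295c add two Cache-Control behaviours to the RFC2616 cache policy. The first lets a spider ignore selected directives in responses, while directives in requests are left unfiltered. The second honours `max-stale` in requests, so a stale cached response can still be served. The sketch below shows both ideas in plain Python. It assumes ages are in seconds. `parse_cache_control` and `is_usable_without_revalidation` are hypothetical helpers, not Scrapy's httpcache API.

```python
def parse_cache_control(header, ignored=()):
    """Hypothetical helper: parse a Cache-Control header value into a
    dict mapping lower-cased directive names to their value (None for
    directives without one), skipping any name listed in `ignored`."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        name = name.strip().lower()
        if not name or name in ignored:
            continue
        directives[name] = value.strip() or None
    return directives


def is_usable_without_revalidation(age, freshness_lifetime, request_cc):
    """Hypothetical check: may a cached response of `age` seconds with the
    given freshness lifetime be served without revalidation, given the
    request's parsed Cache-Control directives?"""
    if age < freshness_lifetime:
        return True  # still fresh
    if "max-stale" not in request_cc:
        return False  # stale, and the request does not tolerate staleness
    max_stale = request_cc["max-stale"]
    if max_stale is None:
        return True  # bare "max-stale": a stale response of any age is fine
    try:
        # "max-stale=N": accept at most N seconds past expiration.
        return age - freshness_lifetime <= int(max_stale)
    except ValueError:
        return False  # malformed value: fall back to revalidation


# Response directives the spider author has decided to ignore.
response_cc = parse_cache_control(
    "no-store, max-age=60, must-revalidate",
    ignored={"no-store", "must-revalidate"},
)
print(response_cc)  # {'max-age': '60'}

request_cc = parse_cache_control("max-stale=30")
print(is_usable_without_revalidation(80, 60, request_cc))   # True (20 s stale)
print(is_usable_without_revalidation(100, 60, request_cc))  # False (40 s stale)
```

Only the response header passes through `ignored`. This matches the commit's assumption that a spider sends Cache-Control directives in requests only when it actually needs them.
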
5111 Commits