scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 13:33:28 +00:00

Author	SHA1	Message	Date
Marven Sanchez	bb3ebf13f9	Add tests for RFC2616 policy enhancements Add `scrapy/downloadermiddlewares/httpcache.py` to `tests/py3-ignores.txt`	2015-06-01 18:20:12 +08:00
Jamey Sharp	1991550442	Allow client to bound max-age for revalidation. Unlike specifying "Cache-Control: no-cache", if the request specifies "max-age=0", then the cached validators will be used if possible to avoid re-fetching unchanged pages. That said, it's still useful to be able to specify "no-cache" on the request, in cases where the origin server may have changed page contents without changing validators.	2015-06-01 18:06:36 +08:00
Jamey Sharp	c3b2cabf6c	Allow setting RFC2616Policy to cache unconditionally. A spider may wish to have all responses available in the cache, for future use with "Cache-Control: max-stale", for instance. The DummyPolicy caches all responses but never revalidates them, and sometimes a more nuanced policy is desirable. This setting still respects "Cache-Control: no-store" directives in responses. If you don't want that, filter "no-store" out of the Cache-Control headers in responses you feed to the cache middleware.	2015-06-01 18:06:35 +08:00
Jamey Sharp	e23a381337	Let spiders ignore bogus Cache-Control headers. Sites often set "no-store", "no-cache", "must-revalidate", etc., but get upset at the traffic a spider can generate if it respects those directives. Allow the spider's author to selectively ignore Cache-Control directives that are known to be unimportant for the sites being crawled. We assume that the spider will not issue Cache-Control directives in requests unless it actually needs them, so directives in requests are not filtered.	2015-06-01 18:06:35 +08:00
Jamey Sharp	dd3a46295c	Support "Cache-Control: max-stale" in requests. This allows spiders to be configured with the full RFC2616 cache policy, but avoid revalidation on a request-by-request basis, while remaining conformant with the HTTP spec.	2015-06-01 18:06:35 +08:00
Jamey Sharp	4446baae33	Use cached responses if revalidation errors out.	2015-06-01 18:06:35 +08:00
Julia Medina	9a3e3ba505	Move scrapy/contrib remaining top-level files to scrapy/extensions	2015-04-29 21:27:19 -03:00
Julia Medina	e262c5b8d5	scrapy/contrib/spiders shims	2015-04-29 21:27:19 -03:00
Julia Medina	fc346cba4d	Move scrapy/contrib/spiders to scrapy/spiders	2015-04-29 21:27:19 -03:00
Julia Medina	b2a15ddbf3	scrapy/contrib/spidermiddleware shims	2015-04-29 21:26:35 -03:00
Julia Medina	180272c092	Move scrapy/contrib/spidermiddleware to scrapy/spidermiddlewares	2015-04-29 21:26:35 -03:00
Julia Medina	c97a69c907	scrapy/contrib/pipeline shims	2015-04-29 21:26:35 -03:00
Julia Medina	8021df18d4	Move scrapy/contrib/pipeline to scrapy/pipelines	2015-04-29 21:26:35 -03:00
Julia Medina	d7e60f3c71	scrapy/contrib/loader shims	2015-04-29 21:24:30 -03:00
Julia Medina	b47228ada8	Move scrapy/contrib/loader to scrapy/loader	2015-04-29 21:24:30 -03:00
Julia Medina	569156be19	scrapy/contrib/linkextractors shims	2015-04-29 21:24:30 -03:00
Julia Medina	cf064b1437	Move scrapy/contrib/linkextractors to scrapy/linkextractors	2015-04-29 21:24:30 -03:00
Julia Medina	152594ce99	scrapy/contrib/exporter shims	2015-04-29 21:24:30 -03:00
Julia Medina	7804b3d778	Move scrapy/contrib/exporter to scrapy/exporters	2015-04-29 21:24:30 -03:00
Julia Medina	6b4c00cc9b	scrapy/contrib/downloadermiddleware shims	2015-04-29 21:24:30 -03:00
Julia Medina	d7c444fefb	Move scrapy/contrib/downloadermiddleware to scrapy/downloadermiddlewares	2015-04-29 21:24:30 -03:00
Pablo Hoffman	9441761aef	Merge pull request #1196 from mineo/patch-1 Remove a duplicate word	2015-04-29 17:48:19 -03:00
Wieland Hoffmann	de6501ed1b	Remove a duplicate word	2015-04-29 22:31:48 +02:00
Pablo Hoffman	3d2b74a6ff	Merge pull request #1188 from eliasdorneles/favoring_web_scraping_over_screen_scraping [MRG+1] Favoring web scraping over screen scraping in the descriptions	2015-04-29 16:49:43 -03:00
Mikhail Korobov	fbb1078f58	Merge pull request #1060 from Curita/python-logging [MRG+1] Python logging	2015-04-29 23:20:34 +05:00
Daniel Graña	5eb098a939	Merge pull request #1168 from scrapy/service-identity install service_identity package in tests to prevent warnings	2015-04-28 23:48:58 -03:00
Elias Dorneles	3d3633f3d2	favoring web scraping over screen scraping in the descriptions	2015-04-25 11:20:20 -03:00
Mikhail Korobov	fa1039f5b2	Merge pull request #1187 from Curita/relax-spiderloader-check Relax SpiderLoader interface check scrapy-0.25.1-sc	2015-04-24 03:40:10 +05:00
Julia Medina	cc4c31e426	Relax SpiderLoader interface check	2015-04-23 15:08:04 -03:00
Julia Medina	1d8f8221e6	Add backward compatibility to LogFormatter	2015-04-22 17:27:24 -03:00
Julia Medina	4858af4e94	Fix backward compatible functions in scrapy.log	2015-04-22 17:27:24 -03:00
Julia Medina	7a92dae4c8	Change Scrapy log output through docs	2015-04-22 17:27:24 -03:00
Julia Medina	6d1205063c	Add a filter to replace '__name__' loggers with 'scrapy'	2015-04-22 17:24:41 -03:00
Julia Medina	4f54ca3294	Change 'scrapy' logger for '__name__' on every module	2015-04-22 17:24:41 -03:00
Julia Medina	69a3d58110	Basic example on manually configuring log handlers	2015-04-22 17:24:41 -03:00
Julia Medina	bd0b639b21	Fix logging usage across docs	2015-04-22 17:24:41 -03:00
Julia Medina	4811d16f1d	Update `logger` attr and `log` method in the Spiders topic on docs	2015-04-22 17:24:41 -03:00
Julia Medina	d47a7edc65	Update Logging topic on docs	2015-04-22 17:24:40 -03:00
Julia Medina	ccdd8bfbcc	Parametrize log formatting strings	2015-04-22 17:24:40 -03:00
Julia Medina	21b9f377d6	Deprecate more frequently used functions from scrapy/log.py	2015-04-22 17:24:40 -03:00
Julia Medina	c174d78f12	Deprecate scrapy/log.py	2015-04-22 17:24:40 -03:00
Julia Medina	6acb3848fb	Stdout redirect in configure_logging	2015-04-22 17:24:40 -03:00
Julia Medina	ffd97f2f1b	Set root handlers based on settings in configure_logging	2015-04-22 17:24:40 -03:00
Julia Medina	1c8708eb82	Create a logger for every Spider and adapt Spider.log to log through it	2015-04-22 17:24:40 -03:00
Julia Medina	ac40ef611a	Custom handler to count log level occurrences in a crawler	2015-04-22 17:24:40 -03:00
Julia Medina	b75556ef79	Add a logging filter to mimic Twisted's log.err formatting for Failures	2015-04-22 17:24:40 -03:00
Julia Medina	8baad55267	New scrapy/utils/log.py file with basic log helpers There are two functions, `configure_logging` and `log_scrapy_info`, which are intended to replace scrapy.log.start and scrapy.log.scrapy_info respectively. Creating new functions makes evident the backward incompatible change of using another logging system, and since the Python logging module is a standard builtin, it makes sense for additional helpers to live in a scrapy/utils file.	2015-04-22 17:24:40 -03:00
Julia Medina	6f9b423215	Restructure LogFormatter to comply with std logging calls	2015-04-22 17:24:40 -03:00
Julia Medina	c2d716807a	Use LogCapture in testfixtures package for tests This allows removing the `get_testlog` helper, `flushLoggedErrors` from twisted.trial.unittest.TestCase, and the Twisted log observers created for each test in conftest.py.	2015-04-22 17:24:40 -03:00
Julia Medina	7a958f90be	Replace scrapy.log calls for their equivalents in the logging std module Changes: - Each module takes 'scrapy' logger and logs through it - Lazy string evaluation in all log messages - Added missing log messages in scrapy/core/engine.py - Contextual data such as crawler or spider instances, and failures	2015-04-22 17:24:39 -03:00
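
The logging commits above mention per-module loggers and lazy string evaluation in log messages. A minimal sketch of that pattern, using only the standard `logging` module, might look like the following. The logger name `example.module`, the `Expensive` class, and `run_demo` are hypothetical names made up for illustration; none of this is taken from the Scrapy code base.

```python
import io
import logging

# Hypothetical module-level logger (in a real module this would typically be
# logging.getLogger(__name__)).
logger = logging.getLogger("example.module")


class Expensive:
    """Illustrative object that counts how often it is rendered to a string."""

    def __init__(self):
        self.renders = 0

    def __str__(self):
        self.renders += 1
        return "expensive-repr"


def run_demo():
    # Capture output in memory so the result can be inspected.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo isolated from root handlers

    obj = Expensive()
    # Arguments are passed separately instead of being formatted up front,
    # so a message below the active level is never rendered.
    logger.debug("Skipped: %s", obj)
    logger.info("Crawled: %s", obj)

    logger.removeHandler(handler)
    return obj.renders, stream.getvalue()
```

Calling `run_demo()` returns the number of times the argument was rendered together with the captured log text; passing `obj` as an argument rather than writing `"Crawled: %s" % obj` is what defers the formatting until a handler actually emits the record.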

4677 Commits