Unlike "Cache-Control: no-cache", a request that specifies "max-age=0"
still uses the cached validators, where available, to avoid re-fetching
unchanged pages.
That said, it's still useful to be able to specify "no-cache" on the
request, in cases where the origin server may have changed page contents
without changing validators.
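
The difference between the two request directives can be sketched
roughly as follows. This is a simplified, standalone model and not the
cache policy's actual code: `parse_cache_control` and `plan_request`
are hypothetical names, freshness checks are left out, and quoted
directive arguments containing commas are not handled.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives


def plan_request(request_cache_control, cached_response_headers):
    """Return (action, extra_request_headers) for an already-cached URL."""
    cc = parse_cache_control(request_cache_control)
    if "no-cache" in cc:
        # Unconditional re-fetch: the cached validators are not used.
        return "refetch", {}
    if cc.get("max-age") == "0":
        # The cached copy is treated as stale, but it can still be
        # revalidated cheaply if the cached response carried validators.
        conditional = {}
        if "ETag" in cached_response_headers:
            conditional["If-None-Match"] = cached_response_headers["ETag"]
        if "Last-Modified" in cached_response_headers:
            conditional["If-Modified-Since"] = cached_response_headers["Last-Modified"]
        if conditional:
            return "revalidate", conditional
        return "refetch", {}
    # Simplification: a real policy would check freshness here.
    return "use-cache", {}
```

With a cached ETag, `plan_request("max-age=0", ...)` yields a
conditional request, while `plan_request("no-cache", ...)` always
re-fetches.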
A spider may wish to have all responses available in the cache, for
future use with "Cache-Control: max-stale", for instance. The
DummyPolicy caches all responses but never revalidates them; sometimes
a more nuanced policy is desirable.
This setting still respects "Cache-Control: no-store" directives in
responses. If you don't want that, filter "no-store" out of the
Cache-Control headers in responses you feed to the cache middleware.
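
One possible way to do that filtering is sketched below. The helper
name `strip_directives` is hypothetical and not part of Scrapy. It works
on a plain header string and assumes directive arguments contain no
commas.

```python
def strip_directives(header_value, ignored):
    """Return a Cache-Control value with the given directives removed."""
    ignored = {name.lower() for name in ignored}
    kept = []
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        # Compare only the directive name, not its optional "=argument".
        name = part.partition("=")[0].strip().lower()
        if name not in ignored:
            kept.append(part)
    return ", ".join(kept)
```

For example, `strip_directives("no-store, max-age=3600", ["no-store"])`
leaves only the "max-age" directive for the cache to act on.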
Sites often set "no-store", "no-cache", "must-revalidate", etc., but get
upset at the traffic a spider can generate if it respects those
directives.
Allow the spider's author to selectively ignore Cache-Control directives
that are known to be unimportant for the sites being crawled.
We assume that the spider will not issue Cache-Control directives in
requests unless it actually needs them, so directives in requests are
not filtered.
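
A project might use this along the following lines. The fragment is a
sketch: the setting names and the policy path are taken from current
Scrapy documentation and are an assumption with respect to the version
this change targets. The listed directives are illustrative.

```python
# settings.py (illustrative values)
HTTPCACHE_ENABLED = True
HTTPCACHE_POLICY = "scrapy.extensions.httpcache.RFC2616Policy"

# Response Cache-Control directives the cache should disregard for the
# sites being crawled. Directives in requests are left untouched.
HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS = ["no-cache", "no-store"]
```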
This allows spiders to be configured with the full RFC2616 cache policy
and still avoid revalidation on a request-by-request basis, while
remaining conformant with the HTTP spec.
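
As a rough illustration of how a per-request "max-stale" directive can
avoid revalidation, the sketch below decides whether a cached response
may be served as-is. The function names are hypothetical, ages are
plain seconds, and response directives such as "must-revalidate" are
deliberately ignored.

```python
import math


def max_stale_allowance(request_cache_control):
    """Seconds of staleness the request tolerates, or None if absent."""
    for part in request_cache_control.split(","):
        name, sep, arg = part.strip().partition("=")
        if name.strip().lower() == "max-stale":
            # A bare "max-stale" accepts a stale response of any age.
            return int(arg) if sep and arg.strip() else math.inf
    return None


def can_serve_cached(age, freshness_lifetime, request_cache_control=""):
    """True if the cached response may be used without revalidation."""
    if age < freshness_lifetime:
        return True  # still fresh, max-stale is irrelevant
    allowance = max_stale_allowance(request_cache_control)
    staleness = age - freshness_lifetime
    return allowance is not None and staleness <= allowance
```

A response that expired 100 seconds ago is served for a request with
"max-stale=200" but revalidated for one with "max-stale=50" or with no
"max-stale" at all.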
There are two functions, `configure_logging` and `log_scrapy_info`,
which are intended to replace scrapy.log.start and
scrapy.log.scrapy_info respectively.
Creating new functions makes the backward-incompatible switch to another
logging system evident, and since the Python logging module is part of
the standard library, it makes sense for the additional helpers to live
in a scrapy/utils file.
This allows removing the `get_testlog` helper, `flushLoggedErrors` from
twisted.trial.unittest.TestCase, and the Twisted log observers created
for each test in conftest.py.
Changes:
- Each module gets the 'scrapy' logger and logs through it
- Lazy string evaluation in all log messages
- Added missing log messages in scrapy/core/engine.py
- Contextual data such as crawler or spider instances, and failures
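
The logging pattern described in this list can be sketched with the
standard library alone. The function `log_crawled` and the use of a
"spider" key in `extra` are assumptions made for illustration, not
necessarily what the changed modules do.

```python
import logging

# Every module obtains the same named logger and logs through it.
logger = logging.getLogger("scrapy")


def log_crawled(url, status, spider):
    # The message template and its arguments are passed separately, so
    # the string is only formatted if a handler emits the record.
    logger.debug(
        "Crawled (%(status)d) <%(url)s>",
        {"status": status, "url": url},
        # Contextual objects travel on the record instead of in the text.
        extra={"spider": spider},
    )
```

Handlers and filters can then read `record.spider` without parsing the
message text.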