mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 13:33:28 +00:00

4677 Commits

Author SHA1 Message Date
Marven Sanchez
bb3ebf13f9 Add tests for RFC2616 policy enhancements
Add `scrapy/downloadermiddlewares/httpcache.py` to `tests/py3-ignores.txt`
2015-06-01 18:20:12 +08:00
Jamey Sharp
1991550442 Allow client to bound max-age for revalidation.
Unlike specifying "Cache-Control: no-cache", if the request specifies
"max-age=0", then the cached validators will be used if possible to
avoid re-fetching unchanged pages.

That said, it's still useful to be able to specify "no-cache" on the
request, in cases where the origin server may have changed page contents
without changing validators.
2015-06-01 18:06:36 +08:00
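
The difference between the two request directives described in the commit above can be sketched in plain Python. This is only an illustration of the behaviour the message describes, not the code Scrapy ships; `parse_cache_control` and `plan_fetch` are hypothetical helper names introduced here.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def plan_fetch(request_cache_control, cached_headers):
    """Decide how to treat a cached page (hypothetical helper, not Scrapy API).

    Returns (action, extra_request_headers).
    """
    cc = parse_cache_control(request_cache_control)
    if "no-cache" in cc:
        # Ignore the cached validators and download the page again.
        return "refetch", {}
    if cc.get("max-age") == "0":
        # Treat the cached copy as stale, but reuse its validators so the
        # server can answer "304 Not Modified" for unchanged pages.
        conditional = {}
        if "ETag" in cached_headers:
            conditional["If-None-Match"] = cached_headers["ETag"]
        if "Last-Modified" in cached_headers:
            conditional["If-Modified-Since"] = cached_headers["Last-Modified"]
        if conditional:
            return "revalidate", conditional
        return "refetch", {}
    return "use-cache-if-fresh", {}
```

With a cached `ETag`, a request carrying `max-age=0` results in a conditional request, while `no-cache` forces a full download even though validators are available.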
Jamey Sharp
c3b2cabf6c Allow setting RFC2616Policy to cache unconditionally.
A spider may wish to have all responses available in the cache, for
future use with "Cache-Control: max-stale", for instance. The
DummyPolicy caches all responses but never revalidates them, and
sometimes a more nuanced policy is desirable.

This setting still respects "Cache-Control: no-store" directives in
responses. If you don't want that, filter "no-store" out of the
Cache-Control headers in responses you feed to the cache middleware.
2015-06-01 18:06:35 +08:00
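
A project enabling this behaviour might use a settings fragment like the one below. The commit message does not spell out the setting name; `HTTPCACHE_ALWAYS_STORE` is the name used in released Scrapy documentation, and the policy path assumes the `scrapy/extensions` layout introduced further down this log, so treat both as assumptions for this exact revision.

```python
# settings.py (sketch; setting names taken from released Scrapy docs,
# not from the commit message itself)
HTTPCACHE_ENABLED = True
HTTPCACHE_POLICY = "scrapy.extensions.httpcache.RFC2616Policy"

# Store every response, even ones the RFC 2616 rules would skip.
# Responses marked "Cache-Control: no-store" are still not stored.
HTTPCACHE_ALWAYS_STORE = True
```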
Jamey Sharp
e23a381337 Let spiders ignore bogus Cache-Control headers.
Sites often set "no-store", "no-cache", "must-revalidate", etc., but get
upset at the traffic a spider can generate if it respects those
directives.

Allow the spider's author to selectively ignore Cache-Control directives
that are known to be unimportant for the sites being crawled.

We assume that the spider will not issue Cache-Control directives in
requests unless it actually needs them, so directives in requests are
not filtered.
2015-06-01 18:06:35 +08:00
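
The filtering idea in the commit above can be illustrated with a small standalone function. It is a sketch of the concept only: `filter_cache_control` is a hypothetical helper, and the real middleware is configured through a setting (documented in released Scrapy as `HTTPCACHE_IGNORE_RESPONSE_CACHE_CONTROLS`) rather than by calling anything like this.

```python
def filter_cache_control(header_value, ignored):
    """Drop the listed directives from a response Cache-Control value.

    Hypothetical helper for illustration; directive names are matched
    case-insensitively and arguments (e.g. "max-age=3600") are preserved.
    """
    ignored = {name.lower() for name in ignored}
    kept = []
    for part in header_value.split(","):
        part = part.strip()
        name = part.partition("=")[0].strip().lower()
        if part and name not in ignored:
            kept.append(part)
    return ", ".join(kept)
```

For example, ignoring `no-store` and `must-revalidate` on a response that sends `no-store, max-age=3600, must-revalidate` leaves only `max-age=3600` for the cache policy to act on. Request headers are left alone, as the commit message explains.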
Jamey Sharp
dd3a46295c Support "Cache-Control: max-stale" in requests.
This allows spiders to be configured with the full RFC2616 cache policy,
but avoid revalidation on a request-by-request basis, while remaining
conformant with the HTTP spec.
2015-06-01 18:06:35 +08:00
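
A minimal sketch of the `max-stale` rule from RFC 2616, assuming the cache already knows the response's current age and freshness lifetime in seconds. `stale_response_acceptable` is a hypothetical name, and the sketch deliberately ignores response-side directives such as `must-revalidate`, which a full policy would also have to check.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def stale_response_acceptable(request_cache_control, current_age, freshness_lifetime):
    """Return True if the cached response may be served without revalidation."""
    if current_age < freshness_lifetime:
        return True  # still fresh, max-stale is irrelevant
    cc = parse_cache_control(request_cache_control)
    if "max-stale" not in cc:
        return False  # stale and the client did not opt in
    limit = cc["max-stale"]
    if limit is None:
        return True  # bare "max-stale": any staleness is acceptable
    staleness = current_age - freshness_lifetime
    return staleness <= int(limit)
```

A spider could therefore send `Cache-Control: max-stale` on requests where an old copy is good enough, and omit it where fresh data matters.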
Jamey Sharp
4446baae33 Use cached responses if revalidation errors out. 2015-06-01 18:06:35 +08:00
Julia Medina
9a3e3ba505 Move scrapy/contrib remaining top-level files to scrapy/extensions 2015-04-29 21:27:19 -03:00
Julia Medina
e262c5b8d5 scrapy/contrib/spiders shims 2015-04-29 21:27:19 -03:00
Julia Medina
fc346cba4d Move scrapy/contrib/spiders to scrapy/spiders 2015-04-29 21:27:19 -03:00
Julia Medina
b2a15ddbf3 scrapy/contrib/spidermiddleware shims 2015-04-29 21:26:35 -03:00
Julia Medina
180272c092 Move scrapy/contrib/spidermiddleware to scrapy/spidermiddlewares 2015-04-29 21:26:35 -03:00
Julia Medina
c97a69c907 scrapy/contrib/pipeline shims 2015-04-29 21:26:35 -03:00
Julia Medina
8021df18d4 Move scrapy/contrib/pipeline to scrapy/pipelines 2015-04-29 21:26:35 -03:00
Julia Medina
d7e60f3c71 scrapy/contrib/loader shims 2015-04-29 21:24:30 -03:00
Julia Medina
b47228ada8 Move scrapy/contrib/loader to scrapy/loader 2015-04-29 21:24:30 -03:00
Julia Medina
569156be19 scrapy/contrib/linkextractors shims 2015-04-29 21:24:30 -03:00
Julia Medina
cf064b1437 Move scrapy/contrib/linkextractors to scrapy/linkextractors 2015-04-29 21:24:30 -03:00
Julia Medina
152594ce99 scrapy/contrib/exporter shims 2015-04-29 21:24:30 -03:00
Julia Medina
7804b3d778 Move scrapy/contrib/exporter to scrapy/exporters 2015-04-29 21:24:30 -03:00
Julia Medina
6b4c00cc9b scrapy/contrib/downloadermiddleware shims 2015-04-29 21:24:30 -03:00
Julia Medina
d7c444fefb Move scrapy/contrib/downloadermiddleware to scrapy/downloadermiddlewares 2015-04-29 21:24:30 -03:00
Pablo Hoffman
9441761aef Merge pull request #1196 from mineo/patch-1
Remove a duplicate word
2015-04-29 17:48:19 -03:00
Wieland Hoffmann
de6501ed1b Remove a duplicate word 2015-04-29 22:31:48 +02:00
Pablo Hoffman
3d2b74a6ff Merge pull request #1188 from eliasdorneles/favoring_web_scraping_over_screen_scraping
[MRG+1] Favoring web scraping over screen scraping in the descriptions
2015-04-29 16:49:43 -03:00
Mikhail Korobov
fbb1078f58 Merge pull request #1060 from Curita/python-logging
[MRG+1] Python logging
2015-04-29 23:20:34 +05:00
Daniel Graña
5eb098a939 Merge pull request #1168 from scrapy/service-identity
install service_identity package in tests to prevent warnings
2015-04-28 23:48:58 -03:00
Elias Dorneles
3d3633f3d2 favoring web scraping over screen scraping in the descriptions 2015-04-25 11:20:20 -03:00
Mikhail Korobov
fa1039f5b2 Merge pull request #1187 from Curita/relax-spiderloader-check
Relax SpiderLoader interface check
scrapy-0.25.1-sc
2015-04-24 03:40:10 +05:00
Julia Medina
cc4c31e426 Relax SpiderLoader interface check 2015-04-23 15:08:04 -03:00
Julia Medina
1d8f8221e6 Add backward compatibility to LogFormatter 2015-04-22 17:27:24 -03:00
Julia Medina
4858af4e94 Fix backward compatible functions in scrapy.log 2015-04-22 17:27:24 -03:00
Julia Medina
7a92dae4c8 Change Scrapy log output through docs 2015-04-22 17:27:24 -03:00
Julia Medina
6d1205063c Add a filter to replace '__name__' loggers with 'scrapy' 2015-04-22 17:24:41 -03:00
Julia Medina
4f54ca3294 Change 'scrapy' logger for '__name__' on every module 2015-04-22 17:24:41 -03:00
Julia Medina
69a3d58110 Basic example on manually configuring log handlers 2015-04-22 17:24:41 -03:00
Julia Medina
bd0b639b21 Fix logging usage across docs 2015-04-22 17:24:41 -03:00
Julia Medina
4811d16f1d Update logger attr and log method in the Spiders topic on docs 2015-04-22 17:24:41 -03:00
Julia Medina
d47a7edc65 Update Logging topic on docs 2015-04-22 17:24:40 -03:00
Julia Medina
ccdd8bfbcc Parametrize log formatting strings 2015-04-22 17:24:40 -03:00
Julia Medina
21b9f377d6 Deprecate more frequently used functions from scrapy/log.py 2015-04-22 17:24:40 -03:00
Julia Medina
c174d78f12 Deprecate scrapy/log.py 2015-04-22 17:24:40 -03:00
Julia Medina
6acb3848fb Stdout redirect in configure_logging 2015-04-22 17:24:40 -03:00
Julia Medina
ffd97f2f1b Set root handlers based on settings in configure_logging 2015-04-22 17:24:40 -03:00
Julia Medina
1c8708eb82 Create a logger for every Spider and adapt Spider.log to log through it 2015-04-22 17:24:40 -03:00
Julia Medina
ac40ef611a Custom handler to count log level occurrences in a crawler 2015-04-22 17:24:40 -03:00
Julia Medina
b75556ef79 Add a logging filter to mimic Twisted's log.err formatting for Failures 2015-04-22 17:24:40 -03:00
Julia Medina
8baad55267 New scrapy/utils/log.py file with basic log helpers
There are two functions, `configure_logging` and `log_scrapy_info` which
intend to replace scrapy.log.start and scrapy.log.scrapy_info
respectively.

Creating new functions makes evident the backward incompatible change of
using another logging system, and since the Python logging module is a
standard builtin, additional helpers make sense to be on a scrapy/utils
file.
2015-04-22 17:24:40 -03:00
Julia Medina
6f9b423215 Restructure LogFormatter to comply with std logging calls 2015-04-22 17:24:40 -03:00
Julia Medina
c2d716807a Use LogCapture in testfixtures package for tests
This allows removing the `get_testlog` helper, `flushLoggedErrors` from
twisted.trial.unittest.TestCase and the Twisted log observers created for
each test in conftest.py.
2015-04-22 17:24:40 -03:00
Julia Medina
7a958f90be Replace scrapy.log calls for their equivalents in the logging std module
Changes:
 - Each module takes 'scrapy' logger and logs through it
 - Lazy string evaluation in all log messages
 - Added missing log messages in scrapy/core/engine.py
 - Contextual data such as crawler or spider instances, and failures
2015-04-22 17:24:39 -03:00
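
Two of the changes listed in the commit above, lazy string evaluation and contextual data, can be shown with the standard `logging` module alone. The sketch below does not use Scrapy; the logger name, the `CountingRepr` class, the `demo` function and the `spider` key in `extra` are illustrative assumptions.

```python
import io
import logging


class CountingRepr:
    """Wraps a value and counts how often it is rendered to a string."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return str(self.value)


def demo():
    logger = logging.getLogger("scrapy.example")  # illustrative logger name
    logger.propagate = False
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s: %(message)s [spider=%(spider)s]")
    )
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)

    url = CountingRepr("http://example.com")
    # Arguments are passed separately instead of being formatted up front,
    # so the DEBUG call below never renders `url` (the level is INFO).
    logger.debug("Crawled %s", url, extra={"spider": "demo"})
    # `extra` attaches contextual data to the record for handlers and filters.
    logger.info("Crawled %s", url, extra={"spider": "demo"})
    return url.calls, stream.getvalue()
```

Calling `demo()` renders the URL once, for the INFO record only. Formatting the message eagerly with `%` or an f-string before the call would render it for both.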