Mikhail Korobov
64399d18d8
Stop reactor on Ctrl-C regardless of 'stop_after_crawl'. Fixes GH-1279.
2015-06-06 02:53:36 +05:00
Mikhail Korobov
33d145e2f5
CrawlerProcess cleanup
...
* remove unneeded lambda;
* extract _get_dns_resolver method and format code to pep8.
2015-06-06 02:49:39 +05:00
Julia Medina
24d8a85269
Update release notes for 1.0.0rc2
...
(cherry picked from commit 6e61d54168cf471363be3e7e54d75ad544b9f6e1)
2015-06-05 17:11:40 -03:00
Chris Nilsson
eae25a04d9
Added MEMUSAGE_CHECK_INTERVAL_SECONDS to Memory usage extension options.
...
Kept the default as it was, at 60.0 seconds. But added a setting to
allow this to be changed as desired.
2015-06-06 00:39:14 +10:00
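A minimal sketch of how the new option from the commit above might appear in a project's settings module. The 30.0 value is an arbitrary illustration, and enabling the extension through MEMUSAGE_ENABLED is assumed here; the commit message itself only states that the default stays at 60.0 seconds.

```python
# settings.py fragment (illustrative values, not taken from the commit)

# Assumption: the memory usage extension is switched on for this project.
MEMUSAGE_ENABLED = True

# Check memory usage every 30 seconds instead of the default 60.0.
MEMUSAGE_CHECK_INTERVAL_SECONDS = 30.0
```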
Daniel Graña
d9bcd48606
Merge pull request #1278 from Curita/remove-tz-aware-logformat
...
Remove deprecated %z formatting from the default LOG_DATEFORMAT
2015-06-04 13:39:01 -03:00
Julia Medina
367ea81e71
Remove deprecated %z formatting from the default LOG_DATEFORMAT
2015-06-04 04:11:23 +08:00
Mikhail Korobov
f312ffcb54
Merge pull request #1276 from scrapy/fix-spider-settings
...
Fix Spider.custom_settings
2015-06-03 22:14:04 +05:00
Mikhail Korobov
d42c420a6d
fixed spider custom_settings
...
https://github.com/scrapy/scrapy/pull/1128 moved spidercls.update_settings
call to a later stage; this commit moves it back.
2015-06-03 04:29:10 +05:00
Mikhail Korobov
cc2f3e1b46
TST a test case to show custom_settings doesn't always work
2015-06-03 04:26:20 +05:00
Daniel Graña
d52cf8bb03
Merge pull request #1267 from Curita/fix-1265
...
Fix #1265
2015-06-01 20:31:46 -03:00
Julia Medina
ffc7b7fd6c
Add helper to update deprecated class paths
2015-06-01 17:01:33 -03:00
Ally Weir
bd2fe996aa
Spelling correction
...
incorrect use of "too" instead of "to"
2015-06-01 20:47:22 +05:00
Julia Medina
9d1cf230ed
Merge pull request #1268 from scrapy/crawlerprocess-dict-settings
...
fixed CrawlerProcess when settings are passed as dicts
2015-06-01 12:35:54 -03:00
Marven Sanchez
8771d1f79b
Update HTTPCache middleware docs
2015-06-01 18:20:59 +08:00
Marven Sanchez
bb3ebf13f9
Add tests for RFC2616 policy enhancements
...
Add `scrapy/downloadermiddlewares/httpcache.py` to `tests/py3-ignores.txt`
2015-06-01 18:20:12 +08:00
Jamey Sharp
1991550442
Allow client to bound max-age for revalidation.
...
Unlike specifying "Cache-Control: no-cache", if the request specifies
"max-age=0", then the cached validators will be used if possible to
avoid re-fetching unchanged pages.
That said, it's still useful to be able to specify "no-cache" on the
request, in cases where the origin server may have changed page contents
without changing validators.
2015-06-01 18:06:36 +08:00
Jamey Sharp
c3b2cabf6c
Allow setting RFC2616Policy to cache unconditionally.
...
A spider may wish to have all responses available in the cache, for
future use with "Cache-Control: max-stale", for instance. The
DummyPolicy caches all responses but never revalidates them, and
sometimes a more nuanced policy is desirable.
This setting still respects "Cache-Control: no-store" directives in
responses. If you don't want that, filter "no-store" out of the
Cache-Control headers in responses you feed to the cache middleware.
2015-06-01 18:06:35 +08:00
Jamey Sharp
e23a381337
Let spiders ignore bogus Cache-Control headers.
...
Sites often set "no-store", "no-cache", "must-revalidate", etc., but get
upset at the traffic a spider can generate if it respects those
directives.
Allow the spider's author to selectively ignore Cache-Control directives
that are known to be unimportant for the sites being crawled.
We assume that the spider will not issue Cache-Control directives in
requests unless it actually needs them, so directives in requests are
not filtered.
2015-06-01 18:06:35 +08:00
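The idea in the commit above (dropping selected Cache-Control directives from responses before the cache policy sees them) can be sketched in plain Python. The function name `drop_ignored_directives` and its arguments are hypothetical and are not the middleware's actual code; this only shows the filtering step, applied to a response header value.

```python
def drop_ignored_directives(header, ignored):
    """Return a Cache-Control header value without the ignored directives.

    `header` is the raw header value from a response; `ignored` is an
    iterable of directive names chosen by the spider's author.
    (Hypothetical helper, for illustration only.)
    """
    ignored = {name.lower() for name in ignored}
    kept = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        # The directive name is whatever precedes an optional "=value".
        name = part.partition("=")[0].strip().lower()
        if name not in ignored:
            kept.append(part)
    return ", ".join(kept)


filtered = drop_ignored_directives(
    "no-store, max-age=3600, must-revalidate",
    ["no-store", "must-revalidate"],
)
```

Directives in requests are left alone in this sketch, matching the assumption stated in the commit message that a spider only sends the directives it needs.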
Jamey Sharp
dd3a46295c
Support "Cache-Control: max-stale" in requests.
...
This allows spiders to be configured with the full RFC2616 cache policy,
but avoid revalidation on a request-by-request basis, while remaining
conformant with the HTTP spec.
2015-06-01 18:06:35 +08:00
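As a rough illustration of the "max-stale" semantics referred to in the commit above, the sketch below decides whether a cached response may be served without revalidation. The helper names are hypothetical, only the "no-cache" and "max-stale" request directives are considered, and the response's age and freshness lifetime are assumed to have been computed elsewhere; it is not the policy's actual implementation.

```python
def parse_cache_control(header):
    """Parse 'a=1, b' into {'a': '1', 'b': None} (hypothetical helper)."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip() if sep else None
    return directives


def can_serve_without_revalidation(age, freshness_lifetime, request_header):
    """Return True if the cached response may be used as-is.

    `age` and `freshness_lifetime` are in seconds; `request_header` is the
    request's Cache-Control value.
    """
    cc = parse_cache_control(request_header)
    if "no-cache" in cc:
        return False              # the client insists on going to the origin
    if age < freshness_lifetime:
        return True               # still fresh
    if "max-stale" not in cc:
        return False              # stale, and the client did not allow that
    limit = cc["max-stale"]
    if limit is None:
        return True               # bare max-stale: any staleness is accepted
    return age - freshness_lifetime <= int(limit)
```

With a response that is 100 seconds old and has a 60 second freshness lifetime, "max-stale=60" allows reuse (40 seconds stale), while "max-stale=30" does not.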
Jamey Sharp
4446baae33
Use cached responses if revalidation errors out.
2015-06-01 18:06:35 +08:00
Mikhail Korobov
aa6a72707d
fixed CrawlerProcess when settings are passed as dicts
...
See https://github.com/scrapy/scrapy/pull/1156
2015-05-30 06:59:15 +05:00
Mikhail Korobov
342cb622f1
DOC fix non-working link (by removing it).
...
See https://github.com/scrapy/scrapy/pull/1260
2015-05-27 23:04:58 +05:00
Julia Medina
343d20d791
Update 1.0 release notes
2015-05-27 11:53:54 -03:00
Julia Medina
62a6eff218
Merge pull request #1259 from chekunkov/log-counter-handler-is-never-removed
...
[MRG +1] LogCounterHandler is never removed from root handlers list, fix that
2015-05-27 11:42:19 -03:00
Julia Medina
26f50d3f43
Extend regex for tags that deploy to PyPI to support new release cycle
2015-05-27 09:17:18 -03:00
Alexander Chekunkov
b2765aabd8
LogCounterHandler is never removed from root handlers list, fix that
...
The lambda is garbage collected because the receiver is added as a weak reference by default, so when signals.engine_stopped is fired, logging.root.removeHandler is not executed. Fixed by assigning the lambda to a private attribute rather than by using connect(..., weak=False), because I believe this lambda function should be collected with the crawler object.
2015-05-27 13:52:47 +07:00
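The weak-reference pitfall described in the LogCounterHandler commit above can be reproduced with the standard library alone. `WeakSignal` and `Owner` are toy stand-ins invented for this sketch (they are not Scrapy's signal machinery or its Crawler class); the point is only that a receiver held weakly disappears unless something else keeps a strong reference to it.

```python
import gc
import weakref


class WeakSignal:
    """Toy signal that stores its receivers as weak references."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(weakref.ref(receiver))

    def send(self):
        for ref in self._receivers:
            receiver = ref()
            if receiver is not None:  # skip receivers that were collected
                receiver()


class Owner:
    """Toy object that registers a cleanup callback on a signal."""

    def __init__(self, signal, log, keep_reference):
        callback = lambda: log.append("handler removed")
        if keep_reference:
            # A strong reference on the owner keeps the lambda alive
            # for exactly as long as the owner itself lives.
            self._callback = callback
        signal.connect(callback)


def run(keep_reference):
    signal, log = WeakSignal(), []
    owner = Owner(signal, log, keep_reference)
    gc.collect()  # make collection explicit on non-refcounting interpreters
    signal.send()
    return log
```

Calling `run(False)` leaves the log empty because the lambda is gone before the signal fires, while `run(True)` records the cleanup. Keeping the reference on the owner, rather than making the connection strong, means the callback is released together with the owner.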
Daniel Graña
5ee08865d6
Merge pull request #1258 from chekunkov/crawler-process-stopping-is-no-more
...
[MRG+1] Remove CrawlerProcess.stopping as it isn't used any more
2015-05-26 15:32:24 -03:00
Alexander Chekunkov
b0ea3e38d1
remove CrawlerProcess.stopping as it isn't used any more
2015-05-26 17:37:16 +07:00
Pablo Hoffman
545c4224f9
update old crawlera link
2015-05-25 16:01:54 -03:00
Daniel Graña
ebe889a663
Unquote request path before passing to FTPClient, it already escapes paths
2015-05-23 20:50:30 -03:00
Daniel Graña
3545468389
Merge branch 'deferdelay'
2015-05-23 18:09:20 -03:00
Daniel Graña
d439c26d76
update docstring and release notes
2015-05-22 20:00:58 -03:00
Alexey Vishnevsky
27ce3225bd
Makes scrapy more async by letting the reactor spend another couple of cycles to accomplish its needs.
2015-05-22 17:05:19 -03:00
Julia Medina
4b2763c6f9
Bump version: 1.0.0rc1 → 1.1.0dev1
2015-05-22 13:24:50 -03:00
Julia Medina
de6d232a02
Bump version: 0.25.1 → 1.0.0rc1
1.0.0rc1
2015-05-22 13:24:27 -03:00
Julia Medina
29529e5e8e
Merge pull request #1244 from Curita/1.0-release-notes
...
1.0 release notes
2015-05-22 13:21:17 -03:00
Julia Medina
600164594c
New release cycle in .bumpversion.cfg
...
1.0.0dev1 -> 1.0.0rc1 -> 1.0.0 -> 1.1.0dev1 -> ...
2015-05-22 12:59:21 -03:00
Julia Medina
afcf70cdc6
Add 1.0 release notes
2015-05-22 12:53:11 -03:00
Mikhail Korobov
cc2258b2bb
Merge pull request #1145 from bosnj/master
...
[MRG+1] default return value for extract_first
2015-05-21 22:03:54 +05:00
Daniel Graña
58717472f7
Merge pull request #1250 from chekunkov/scrapy-log-fix-incompatible-change
...
[MRG+1] Keep level_names in scrapy.log for backwards compatibility
2015-05-21 10:46:39 -03:00
Alexander Chekunkov
795ca3945f
keep level_names in scrapy.log for backwards compatibility
2015-05-21 08:56:44 +00:00
Daniel Graña
ee59112480
Merge pull request #1224 from scrapy/fix-empty-feed-export-fields
...
[MRG] fixed FEED_EXPORT_FIELDS handling (see #1223)
2015-05-19 16:36:05 -03:00
Daniel Graña
5beb9d251c
Merge pull request #1243 from scrapy/remove-contrib-from-py3-ignores
...
remove unnecessary lines from py3-ignores
2015-05-19 12:20:51 -03:00
Mikhail Korobov
7a5b5ec4d6
TST remove unnecessary lines from py3-ignores
...
scrapy/contrib is already skipped - see https://github.com/scrapy/scrapy/pull/1165
2015-05-19 00:57:39 +05:00
Mikhail Korobov
21b17734c5
Merge pull request #1242 from Curita/exporters-single-module
...
Move exporters/__init__.py to exporters.py
2015-05-19 00:10:30 +05:00
Julia Medina
044f31cb8d
Merge pull request #1240 from scrapy/fix-feedexport-logging
...
MRG+1 fixed FeedExporter shutdown log messages
2015-05-18 14:54:11 -03:00
Julia Medina
af0c8f82f4
Move exporters/__init__.py to exporters.py
2015-05-18 14:46:23 -03:00
Mikhail Korobov
60e79db3ee
fixed FeedExporter shutdown log messages
2015-05-18 19:28:37 +05:00
Mikhail Korobov
9b0ca1b7a0
drop support for FEED_EXPORT_FIELDS=[] meaning "no fields"
2015-05-18 17:13:25 +05:00
Mikhail Korobov
9fb318338b
support FEED_EXPORT_FIELDS=[]
2015-05-18 16:44:02 +05:00