Mikhail Korobov
1740fcf1a6
DOC SignalManager docstrings. See GH-713.
...
This change is not 100% backwards compatible because of *args changes.
Their usage was not documented, so we're not breaking public interface.
2015-06-08 21:05:58 +05:00
Mikhail Korobov
9a787893e3
(backwards-incompatible) allow passing settings=None to configure_logging
...
* use explicit argument for disabling root handler;
* handle LOG_STDOUT even if install_root_handler is False
2015-06-08 19:54:18 +05:00
Mikhail Korobov
3cbf8a0b2b
extract CrawlerRunner._crawl method which always expects Crawler
...
It provides an extension point where crawler instance is available;
it should make it easier to write alternative CrawlerRunner.crawl
implementations.
See also: https://github.com/scrapy/scrapy/pull/1256
2015-06-08 18:35:44 +05:00
Pawel Miech
e575f44446
[settings/default_settings.py] don't retry 400
...
As in HTTP specs:
"10.4.1 400 Bad Request
The request could not be understood by the server due to malformed
syntax. The client SHOULD NOT repeat the request without
modifications."
Scrapy should not retry 400 by default.
2015-06-08 10:52:42 +02:00
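A minimal sketch of what the change above means for a retry decision. `RETRY_HTTP_CODES` is the Scrapy setting name; the list contents and the `should_retry` helper are illustrative assumptions, not code copied from the commit.

```python
# Illustrative retry list with 400 left out, as the commit message argues.
# The exact contents are an assumption made for this example.
RETRY_HTTP_CODES = [500, 502, 503, 504, 408]


def should_retry(status, retry_codes=RETRY_HTTP_CODES):
    """Hypothetical helper: retry only statuses listed in retry_codes."""
    return status in retry_codes


# A project that still wants the old behaviour can add 400 back explicitly.
LEGACY_RETRY_HTTP_CODES = RETRY_HTTP_CODES + [400]
```

Leaving 400 out by default follows the quoted spec text: repeating an unmodified malformed request is not expected to succeed.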
Daniel Graña
87293965db
Merge pull request #1285 from scrapy/optional-settings-arguments
...
make it easier to use default settings
2015-06-07 20:29:24 -03:00
Chris Nilsson
61dec83f70
Moved default value of MEMUSAGE_CHECK_INTERVAL_SECONDS to default_settings
2015-06-06 11:19:29 +10:00
Chris Nilsson
0c532baf4c
Removed typo, and clarified time unit of setting
2015-06-06 11:18:13 +10:00
Mikhail Korobov
d047665c02
make "settings" argument optional for Crawler, CrawlerRunner and CrawlerProcess
2015-06-06 03:23:13 +05:00
Mikhail Korobov
64399d18d8
Stop reactor on Ctrl-C regardless of 'stop_after_crawl'. Fixes GH-1279.
2015-06-06 02:53:36 +05:00
Mikhail Korobov
33d145e2f5
CrawlerProcess cleanup
...
* remove unneeded lambda;
* extract _get_dns_resolver method and format code to pep8.
2015-06-06 02:49:39 +05:00
Julia Medina
24d8a85269
Update release notes for 1.0.0rc2
...
(cherry picked from commit 6e61d54168cf471363be3e7e54d75ad544b9f6e1)
2015-06-05 17:11:40 -03:00
Chris Nilsson
eae25a04d9
Added MEMUSAGE_CHECK_INTERVAL_SECONDS to Memory usage extension options.
...
Kept the default as it was, at 60.0 seconds. But added a setting to
allow this to be changed as desired.
2015-06-06 00:39:14 +10:00
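A hedged sketch of how a project might override the new setting in its settings module. The 30-second value is an arbitrary example; the default stays at 60.0 seconds according to the commit message.

```python
# settings.py fragment (sketch; values are examples only)
MEMUSAGE_ENABLED = True                  # turn the memory usage extension on
MEMUSAGE_CHECK_INTERVAL_SECONDS = 30.0   # check twice as often as the 60.0 default
```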
Daniel Graña
d9bcd48606
Merge pull request #1278 from Curita/remove-tz-aware-logformat
...
Remove deprecated %z formatting from the default LOG_DATEFORMAT
2015-06-04 13:39:01 -03:00
Julia Medina
367ea81e71
Remove deprecated %z formatting from the default LOG_DATEFORMAT
2015-06-04 04:11:23 +08:00
Mikhail Korobov
f312ffcb54
Merge pull request #1276 from scrapy/fix-spider-settings
...
Fix Spider.custom_settings
2015-06-03 22:14:04 +05:00
Mikhail Korobov
d42c420a6d
fixed spider custom_settings
...
https://github.com/scrapy/scrapy/pull/1128 moved spidercls.update_settings
call to a later stage; this commit moves it back.
2015-06-03 04:29:10 +05:00
Mikhail Korobov
cc2f3e1b46
TST a test case to show custom_settings doesn't always work
2015-06-03 04:26:20 +05:00
Daniel Graña
d52cf8bb03
Merge pull request #1267 from Curita/fix-1265
...
Fix #1265
2015-06-01 20:31:46 -03:00
Julia Medina
ffc7b7fd6c
Add helper to update deprecated class paths
2015-06-01 17:01:33 -03:00
Ally Weir
bd2fe996aa
Spelling correction
...
incorrect use of "too" instead of "to"
2015-06-01 20:47:22 +05:00
Julia Medina
9d1cf230ed
Merge pull request #1268 from scrapy/crawlerprocess-dict-settings
...
fixed CrawlerProcess when settings are passed as dicts
2015-06-01 12:35:54 -03:00
Marven Sanchez
8771d1f79b
Update HTTPCache middleware docs
2015-06-01 18:20:59 +08:00
Marven Sanchez
bb3ebf13f9
Add tests for RFC2616 policy enhancements
...
Add `scrapy/downloadermiddlewares/httpcache.py` to `tests/py3-ignores.txt`
2015-06-01 18:20:12 +08:00
Jamey Sharp
1991550442
Allow client to bound max-age for revalidation.
...
Unlike specifying "Cache-Control: no-cache", if the request specifies
"max-age=0", then the cached validators will be used if possible to
avoid re-fetching unchanged pages.
That said, it's still useful to be able to specify "no-cache" on the
request, in cases where the origin server may have changed page contents
without changing validators.
2015-06-01 18:06:36 +08:00
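The distinction drawn above between "no-cache" and "max-age=0" in a request can be sketched as follows. This is a simplified illustration, not the middleware's code: `parse_cache_control` and `request_action` are hypothetical helpers, and the sketch assumes the cached entry is otherwise usable.

```python
def parse_cache_control(header):
    """Split a Cache-Control header value into a {directive: value} dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip() or None
    return directives


def request_action(header, cached_age):
    """Decide what to do with a cached response that is cached_age seconds old.

    "fetch"      -> ignore the cache and download again
    "revalidate" -> send a conditional request using the cached validators
    "use-cache"  -> serve the cached response as-is
    """
    cc = parse_cache_control(header)
    if "no-cache" in cc:
        return "fetch"
    max_age = cc.get("max-age")
    if max_age is not None and cached_age >= int(max_age):
        return "revalidate"
    return "use-cache"
```

With "max-age=0" every cached entry counts as too old, so the validators are reused and an unchanged page costs only a conditional request.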
Jamey Sharp
c3b2cabf6c
Allow setting RFC2616Policy to cache unconditionally.
...
A spider may wish to have all responses available in the cache, for
future use with "Cache-Control: max-stale", for instance. The
DummyPolicy caches all responses but never revalidates them, and
sometimes a more nuanced policy is desirable.
This setting still respects "Cache-Control: no-store" directives in
responses. If you don't want that, filter "no-store" out of the
Cache-Control headers in responses you feed to the cache middleware.
2015-06-01 18:06:35 +08:00
Jamey Sharp
e23a381337
Let spiders ignore bogus Cache-Control headers.
...
Sites often set "no-store", "no-cache", "must-revalidate", etc., but get
upset at the traffic a spider can generate if it respects those
directives.
Allow the spider's author to selectively ignore Cache-Control directives
that are known to be unimportant for the sites being crawled.
We assume that the spider will not issue Cache-Control directives in
requests unless it actually needs them, so directives in requests are
not filtered.
2015-06-01 18:06:35 +08:00
Jamey Sharp
dd3a46295c
Support "Cache-Control: max-stale" in requests.
...
This allows spiders to be configured with the full RFC2616 cache policy,
but avoid revalidation on a request-by-request basis, while remaining
conformant with the HTTP spec.
2015-06-01 18:06:35 +08:00
Jamey Sharp
4446baae33
Use cached responses if revalidation errors out.
2015-06-01 18:06:35 +08:00
Mikhail Korobov
aa6a72707d
fixed CrawlerProcess when settings are passed as dicts
...
See https://github.com/scrapy/scrapy/pull/1156
2015-05-30 06:59:15 +05:00
Mikhail Korobov
342cb622f1
DOC fix non-working link (by removing it).
...
See https://github.com/scrapy/scrapy/pull/1260
2015-05-27 23:04:58 +05:00
Julia Medina
343d20d791
Update 1.0 release notes
2015-05-27 11:53:54 -03:00
Julia Medina
62a6eff218
Merge pull request #1259 from chekunkov/log-counter-handler-is-never-removed
...
[MRG +1] LogCounterHandler is never removed from root handlers list, fix that
2015-05-27 11:42:19 -03:00
Julia Medina
26f50d3f43
Extend regex for tags that deploy to PyPI to support new release cycle
2015-05-27 09:17:18 -03:00
Alexander Chekunkov
b2765aabd8
LogCounterHandler is never removed from root handlers list, fix that
...
The lambda is garbage collected because the receiver is added as a weak reference by default, so when signals.engine_stopped is fired, logging.root.removeHandler is not executed. Fixed by assigning the lambda to a private attribute rather than using connect(..., weak=False), because I believe this lambda should be collected together with the crawler object.
2015-05-27 13:52:47 +07:00
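The weak-reference pitfall described in this commit can be reproduced with the standard library alone. In the sketch below, `Signal` and `Owner` are hypothetical stand-ins (not Scrapy classes) for a dispatcher that stores receivers weakly and for the object that connects a lambda to it.

```python
import gc
import weakref


class Signal:
    """Hypothetical dispatcher that keeps only weak references to receivers."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(weakref.ref(receiver))

    def send(self):
        """Call every receiver that is still alive; return how many ran."""
        called = 0
        for ref in self._receivers:
            receiver = ref()
            if receiver is not None:
                receiver()
                called += 1
        return called


class Owner:
    """Hypothetical object that connects a lambda to the signal."""

    def __init__(self, signal, keep_reference):
        self.calls = 0
        handler = lambda: self._on_stop()
        if keep_reference:
            # Storing the lambda on the instance keeps it alive for exactly
            # as long as the owner itself lives.
            self._handler = handler
        signal.connect(handler)

    def _on_stop(self):
        self.calls += 1


lost_signal = Signal()
lost = Owner(lost_signal, keep_reference=False)
gc.collect()
lost_count = lost_signal.send()   # the lambda is already gone

kept_signal = Signal()
kept = Owner(kept_signal, keep_reference=True)
gc.collect()
kept_count = kept_signal.send()   # the lambda is still reachable via the owner
```

Keeping the reference on the owner, instead of making the connection strong, means the handler still disappears once the owner is collected.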
Daniel Graña
5ee08865d6
Merge pull request #1258 from chekunkov/crawler-process-stopping-is-no-more
...
[MRG+1] Remove CrawlerProcess.stopping as it isn't used any more
2015-05-26 15:32:24 -03:00
Alexander Chekunkov
b0ea3e38d1
remove CrawlerProcess.stopping as it isn't used any more
2015-05-26 17:37:16 +07:00
Pablo Hoffman
545c4224f9
update old crawlera link
2015-05-25 16:01:54 -03:00
Daniel Graña
ebe889a663
Unquote request path before passing to FTPClient, which already escapes paths
2015-05-23 20:50:30 -03:00
Daniel Graña
3545468389
Merge branch 'deferdelay'
2015-05-23 18:09:20 -03:00
Daniel Graña
d439c26d76
update docstring and release notes
2015-05-22 20:00:58 -03:00
Alexey Vishnevsky
27ce3225bd
Makes Scrapy more async by letting the reactor spend another couple of cycles to accomplish its needs.
2015-05-22 17:05:19 -03:00
Julia Medina
4b2763c6f9
Bump version: 1.0.0rc1 → 1.1.0dev1
2015-05-22 13:24:50 -03:00
Julia Medina
de6d232a02
Bump version: 0.25.1 → 1.0.0rc1
1.0.0rc1
2015-05-22 13:24:27 -03:00
Julia Medina
29529e5e8e
Merge pull request #1244 from Curita/1.0-release-notes
...
1.0 release notes
2015-05-22 13:21:17 -03:00
Julia Medina
600164594c
New release cycle in .bumpversion.cfg
...
1.0.0dev1 -> 1.0.0rc1 -> 1.0.0 -> 1.1.0dev1 -> ...
2015-05-22 12:59:21 -03:00
Julia Medina
afcf70cdc6
Add 1.0 release notes
2015-05-22 12:53:11 -03:00
Mikhail Korobov
cc2258b2bb
Merge pull request #1145 from bosnj/master
...
[MRG+1] default return value for extract_first
2015-05-21 22:03:54 +05:00
Daniel Graña
58717472f7
Merge pull request #1250 from chekunkov/scrapy-log-fix-incompatible-change
...
[MRG+1] Keep level_names in scrapy.log for backwards compatibility
2015-05-21 10:46:39 -03:00
Alexander Chekunkov
795ca3945f
keep level_names in scrapy.log for backwards compatibility
2015-05-21 08:56:44 +00:00
Daniel Graña
ee59112480
Merge pull request #1224 from scrapy/fix-empty-feed-export-fields
...
[MRG] fixed FEED_EXPORT_FIELDS handling (see #1223)
2015-05-19 16:36:05 -03:00