logwatchers: add new kmsg-based kernel log watcher #41

euank · 2016-11-14T06:55:50Z

All the other loggers (afaik), like rsyslog and journald and what have you, simply read /dev/kmsg.

It's probably easier for everyone if we just read that directly too!

I put together a bit of code doing so over here, and this commit leverages that code for the node-problem-detector.

I plan to replace cadvisor's similar hacky log parsing with the same code as well.

So, what are the implications?
Well, one thing this can't do as well is read far into the past. This can't read further back than the ring buffer extends, so on startup it's quite possible it won't get as many old messages. That means really large lookback values will behave differently now because they'll read less.

That could have already happened depending on logrotation policy, but now it's a lot harder for the user to control (kernel dmesg size is much harder to change :).

However, I think the significant simplicity and cross-distro gains by doing this are worth it.

I also preemptively removed the old integrations as I think this makes them fully redundant, as well as the translators thingy since why would you want to translate when you're reading the one and only true source?

Edit: I added this as an additional plugin, but didn't change any defaults or such.

Fixes #14 #39

cc @Random-Liu @adohe @derekwaynecarr


euank · 2016-11-14T07:00:00Z

Testing done:

I manually built and published an image (available at euank/node-problem-detector:v0.6). I then edited the daemonset to reference that image, deployed it, and ssh'd into a node.

I ran `echo "BUG: unable to handle kernel NULL pointer dereference at some-location" > /dev/kmsg` as root, verified it showed up in `dmesg`, and then verified `kubectl describe no` had an event of `1m 1m 1 {kernel-monitor k8s1} Warning KernelPanic BUG: unable to handle kernel NULL pointer dereference at some-location`.

Random-Liu · 2016-11-14T17:51:18Z

> It's probably easier for everyone if we just read that directly too!

Agreed! Thanks a lot for helping on this! @euank

> Well, one thing this can't do as well is read far into the past.

Yeah, this is the reason why I didn't use it before. We want to know whether the node panicked last time and the panic reason.

For example, for kernel panic (assuming the node is configured as reboot on kernel panic), we have to look back to figure out what happened. I think this is a useful feature that we want to keep, given that we already encountered some kernel panics:

Since it supports more OS distros but has less lookback support, what about making the /dev/kmsg reader the fallback log reader in #39, so that the kernel monitor uses it when the other ones don't work?

@euank Are you ok with this? :)

euank · 2016-11-14T21:27:11Z

@Random-Liu That makes some sense, but an alternate solution would be checkpointing. In general, we don't care about lookbehind on the initial boot/setup, only on reboots where the daemonset starts up again.
In those cases, if we checkpoint then we can be our own super limited logdaemon for the last $lookBehind duration of logs.

That might actually be simpler than integrating journald and syslog parsers.

The only benefit those loggers have over us checkpointing is that journald was at some point going to integrate with pstore (https://www.kernel.org/doc/Documentation/ABI/testing/pstore) and, because it plans to delete entries after consuming them, it would be the only source of truth for it. That's not being done yet, though, so it's a future theoretical issue/feature, not a realistic current concern.

Really though, integrating with journald/syslog's kern.log is almost certainly the right way to go now that I think about the lookback bit more. Should I wait for your journald code to go in and rebase this on top of it as a fallback if that's the path we're taking?

Random-Liu · 2017-01-06T19:47:24Z

> For example, for kernel panic (assuming the node is configured as reboot on kernel panic), we have to look back to figure out what happened. I think this is a useful feature that we want to keep, given that we already encountered some kernel panics:

I recently found out that there is no easy way to get kernel panics into the logs because, mostly for safety, the kernel won't write anything once it has already panicked. So my argument was wrong.

However, I still think lookback is needed when NPD is first started or restarted. :)

dchen1107 · 2017-02-01T21:08:44Z

@euank, #39 was merged. Can you please rebase this one, so that we can continue reviewing? Thanks!


euank · 2017-03-10T05:36:04Z

Rebased, manually tested that it worked as I expected.

Random-Liu · 2017-03-10T05:46:55Z

@euank Thanks a lot! Will try to review it soon.

Random-Liu · 2017-05-22T22:05:52Z

@euank Will review this PR this week. Sorry for the delay, and thanks a lot for helping!

Random-Liu · 2017-05-30T22:11:00Z

pkg/systemlogmonitor/README.md

 arbitrary file based log.
-* [journald](https://github.com/kubernetes/node-problem-detector/blob/master/pkg/systemlogmonitor/logwatchers/journald): Log watcher for
-journald.
+* [journald](.//logwatchers/journald): Log watcher for


for journald

Random-Liu · 2017-05-30T22:11:48Z

pkg/systemlogmonitor/README.md

@@ -66,6 +67,7 @@ Log watcher specific configurations are configured in `pluginConfig`.
  * timestampFormat: The format of the timestamp. The format string is the time
    `2006-01-02T15:04:05Z07:00` in the expected format. (See
    [golang timestamp format](https://golang.org/pkg/time/#pkg-constants))
+* **kmsg**


kmsg: no configurations for now.

Random-Liu · 2017-05-30T22:32:06Z

pkg/systemlogmonitor/logwatchers/kmsg/log_watcher.go

+	cfg    types.WatcherConfig
+	logCh  chan *logtypes.Log
+	tomb   *util.Tomb
+	reader *bufio.Reader


Random-Liu · 2017-05-30T22:35:42Z

pkg/systemlogmonitor/logwatchers/kmsg/log_watcher.go

+
+// NewKmsgWatcher creates a watcher which will read messages from /dev/kmsg
+func NewKmsgWatcher(cfg types.WatcherConfig) types.LogWatcher {
+	kmsgparser.NewParser()


Unused code, remove it.

Random-Liu · 2017-05-30T22:42:54Z

pkg/systemlogmonitor/logwatchers/kmsg/log_watcher.go

+
+			// Discard too old messages
+			if k.clock.Since(msg.Timestamp) > lookback {
+				glog.V(5).Infof("throwing away msg %v for being too old: %v > %v", msg.Message, msg.Timestamp.String(), lookback.String())


nit: s/throwing/Throwing

Random-Liu · 2017-05-30T22:45:35Z

pkg/systemlogmonitor/logwatchers/kmsg/log_watcher.go

+		select {
+		case <-k.tomb.Stopping():
+			glog.Infof("Stop watching kernel log")
+			k.kmsgParser.Close()


nit: log error of Close.

Random-Liu · 2017-05-30T22:48:49Z

pkg/systemlogmonitor/logwatchers/kmsg/log_watcher_test.go

+	}
+}
+
+type fakeKmsgReader struct {


Random-Liu · 2017-05-30T22:50:31Z

@euank Do you have time to address the comments in this PR?

Random-Liu · 2017-05-30T22:52:58Z

The PR LGTM overall, only several small cleanups are needed.

Discussed offline with @dchen1107: since we want this PR badly for 1.7, I'll merge it first and send another PR to address the comments. :)

@euank Thanks for working on this, this is very useful for us!

euank mentioned this pull request Nov 14, 2016

Journald support #39

Merged

euank force-pushed the kmsg-parser branch from 449a79c to 2779414 Compare November 14, 2016 06:59

euank force-pushed the kmsg-parser branch from 2779414 to d834639 Compare November 14, 2016 07:15

Random-Liu self-assigned this Nov 14, 2016

This was referenced Nov 16, 2016

cAdvisor leaking journalctl processes kubernetes/kubernetes#34965

Closed

oomparser: update to use kmsg based parser google/cadvisor#1544

Merged

Random-Liu mentioned this pull request Nov 22, 2016

Generalize kernel monitor. #44

Closed

euank added 2 commits March 9, 2017 20:40

logwatchers/kmsg: add initial kmsg watcher impl

9c23921

This adds a logwatcher which is able to parse kernel messages directly from the /dev/kmsg interface. This supports any modern linux distro, while also avoiding any dependency on libraries (e.g. as journald needs).

vendor: include kmsgparser

7364867

euank force-pushed the kmsg-parser branch from d834639 to 8d1158d Compare March 10, 2017 05:35

k8s-ci-robot added the cncf-cla: yes label Mar 10, 2017

kmsg: update the docs to reference kmsg parser too

73cba49

euank force-pushed the kmsg-parser branch from 8d1158d to 73cba49 Compare March 10, 2017 05:38

euank changed the title ~~update kernel log parser to a kmsg based parser~~ logwatchers: add new kmsg-based kernel log watcher Mar 10, 2017

Random-Liu reviewed May 30, 2017


Random-Liu closed this May 30, 2017

Random-Liu reopened this May 30, 2017

Random-Liu merged commit be6c516 into kubernetes:master May 30, 2017

Random-Liu mentioned this pull request May 30, 2017

Cleanup kmsg log watcher #112

Merged

euank deleted the kmsg-parser branch May 30, 2017 23:56
