
Excessive logging from shim into containerd logs (with containerd 1.6.8+) including all ENV vars #1532

Open
ibabou opened this issue Sep 29, 2022 · 7 comments



ibabou commented Sep 29, 2022

Issue:

With the fix in containerd for containerd/containerd#6990, where shim logs were being incorrectly redirected to panic.log, all shim logs on 1.6.8+ are now correctly redirected into containerd.log. However, at just the info level, every shim call is logged with its operation and a detailed trace of its parameters. This causes containerd.log to grow significantly, and some operations, like HcsCreateProcess, print all ENV vars, which might include secrets.

Test (containerd 1.6.8 + shim v0.9.4):

Use the normal log level for containerd, and test with sample specs like the following:

cat sample_secret.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
stringData:
    SECRET_KEY: "THE_ACTUAL_SECRET"

cat test-one-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: test
    env:
      - name: THE_SECRET_VALUE
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: SECRET_KEY
    command:
      - powershell.exe
      - -command
      - "while ($true) { Start-Sleep -Seconds 10; Get-Date; }"
    image: mcr.microsoft.com/windows/servercore:ltsc2019
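
Apply both manifests and wait for the pod to start (this assumes kubectl is already configured against the cluster):

kubectl apply -f sample_secret.yaml
kubectl apply -f test-one-pod.yaml

Then search containerd.log on the Windows node for the secret value: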
PS C:\etc\kubernetes\logs> Get-Content -Last 1000 .\containerd.log | Select-String "ACTUAL"

time="2022-09-29T22:49:57.167533600Z" level=info msg=Span duration=312.406ms endTime="2022-09-29 22:49:57.4799396
+0000 GMT m=+79.298800901" name=HcsCreateProcess parentSpanID=0000000000000000
processParameters="{\"CommandLine\":\"powershell.exe -command \\\"while ($true) { Start-Sleep -Seconds 10; Get-Date; }\
\\"\",\"WorkingDirectory\":\"C:\\\\\",\"Environment\":{\"KUBERNETES_PORT\":\"tcp://10.28.16.1:443\",\"KUBERNETES_PORT_4
43_TCP\":\"tcp://10.28.16.1:443\",\"KUBERNETES_PORT_443_TCP_ADDR\":\"10.28.16.1\",\"KUBERNETES_PORT_443_TCP_PORT\":\"44
3\",\"KUBERNETES_PORT_443_TCP_PROTO\":\"tcp\",\"KUBERNETES_SERVICE_HOST\":\"10.28.16.1\",\"KUBERNETES_SERVICE_PORT\":\"
443\",\"KUBERNETES_SERVICE_PORT_HTTPS\":\"443\",\"THE_SECRET_VALUE\":\"THE_ACTUAL_SECRET\"},\"CreateStdOutPipe\":true,\
"CreateStdErrPipe\":true}" spanID=7f329befb4853b75 startTime="2022-09-29 22:49:57.1675336 +0000 GMT m=+78.986394901"
traceID=bf394e65462626d4cdfdc7e92d914bf2

Also, many repetitive debug logs are coming from the shim, like the following:

{"level":"debug","msg":"Substituting RuntimeConfig DNS Nameservers: [10.28.16.10]","time":"2022-09-29T21:47:33Z"}
{"level":"debug","msg":"Substituting RuntimeConfig DNS Search: [kube-system.svc.cluster.local svc.cluster.local cluster.local c.ibrahimab-gke-dev.internal google.internal]","time":"2022-09-29T21:47:33Z"}
{"level":"debug","msg":"Substituting RuntimeConfig DNS Options: [ndots:5]","time":"2022-09-29T21:47:33Z"}
{"level":"debug","msg":"Parsing port mappings from []","time":"2022-09-29T21:47:33Z"}
{"level":"debug","msg":"[cni-net] Found network 17a89c98-c4a3-47b0-a96d-75eb3920ebe7 with subnet [{{10.32.1.0 ffffff00} 10.32.1.1 []}].","time":"2022-09-29T21:47:33Z"}
  • Is there a way to control the amount of logging, as debug info seems to be written by default?
  • Also, for operations like HcsCreateProcess where a span for the method is logged, can we control what is included, or keep it at a higher level with less detail by default?

ibabou commented Sep 29, 2022

/cc @dcantah Hey Danny, I discussed the issue with Mark last week. Can you please take a look and advise on it?


dcantah commented Sep 29, 2022

@ibabou Luckily, the log level is configurable for our shim; we accept a string value denoting the logrus log level, much like containerd does. This would go under our shim's configuration in containerd's toml config file. Now, some of the info level logs may provide real benefit, but for the ones that are simply too numerous and don't provide value, we should take a look at how to cut down on them or move them to debug level, etc., which will be a fairly manual process.

A couple of things on the logs posted, however: all of the below are logs from the CNI plugin being invoked, not from the shim:

{"level":"debug","msg":"Substituting RuntimeConfig DNS Nameservers: [10.28.16.10]","time":"2022-09-29T21:47:33Z"}
{"level":"debug","msg":"Substituting RuntimeConfig DNS Search: [kube-system.svc.cluster.local svc.cluster.local cluster.local c.ibrahimab-gke-dev.internal google.internal]","time":"2022-09-29T21:47:33Z"}
{"level":"debug","msg":"Substituting RuntimeConfig DNS Options: [ndots:5]","time":"2022-09-29T21:47:33Z"}
{"level":"debug","msg":"Parsing port mappings from []","time":"2022-09-29T21:47:33Z"}
{"level":"debug","msg":"[cni-net] Found network 17a89c98-c4a3-47b0-a96d-75eb3920ebe7 with subnet [{{10.32.1.0 ffffff00} 10.32.1.1 []}].","time":"2022-09-29T21:47:33Z"}

For the secrets being logged, I'll semi-proxy to @helsaawy since he set this up, but we now expose an option on the shim to scrub sensitive info from our logs. This just wasn't in the 0.9.x line of tags, but it is present in main at the moment: https://github.com/microsoft/hcsshim/blob/main/cmd/containerd-shim-runhcs-v1/options/runhcs.proto#L110.
Given that it's already in main, come containerd 1.7 this should be a configurable option once we cut a 0.10.0 tag, but truthfully we should backport it given the nature of the issue.
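
For reference, once a tag containing that option ships, it would presumably be set alongside LogLevel in the shim options; note that the ScrubLogs key below is an assumption based on the proto field name, not a confirmed toml option:

[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
  # Hypothetical: assumes the scrub option from runhcs.proto surfaces in the
  # shim options the same way log_level surfaces as LogLevel.
  ScrubLogs = true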

A sample containerd.toml setting a log level for our shim is below:

[plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "runhcs-wcow-process"
      disable_snapshot_annotations = false
      discard_unpacked_layers = false
      ignore_rdt_not_enabled_errors = false
      no_pivot = false
      snapshotter = "windows"
      [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
        base_runtime_spec = ""
        cni_conf_dir = ""
        cni_max_conf_num = 0
        container_annotations = []
        pod_annotations = []
        privileged_without_host_devices = false
        runtime_engine = ""
        runtime_path = ""
        runtime_root = ""
        runtime_type = ""
        [plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
          LogLevel = "warn"
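
After editing the config, restart the containerd service so the shim options take effect, e.g.:

PS C:\> Restart-Service containerd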


ibabou commented Sep 30, 2022

Thanks a lot @dcantah for the information! Just to make sure I understand correctly: is the default higher than "info" (I mean, does it log more than info-level)? Or are you suggesting we set it to "warn" so that none of the "info" logs, including the ones shown above, get logged at all?

It's great that there is a change in 0.10.0 to scrub the logs, though!

Re CNI, is everything logged by default?


dcantah commented Sep 30, 2022

@ibabou Right, it'll only log Warning level or above in the example given. We just call logrus.SetLevel with whatever was passed: https://pkg.go.dev/github.com/sirupsen/logrus#SetLevel
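
For illustration, that pattern amounts to something like the minimal sketch below; the function name setShimLogLevel is made up for the example, and the shim's actual wiring may differ:

package main

import (
	"fmt"

	"github.com/sirupsen/logrus"
)

// setShimLogLevel is a hypothetical helper showing the described pattern:
// parse the string from the shim options and hand it to logrus.SetLevel.
func setShimLogLevel(levelStr string) error {
	lvl, err := logrus.ParseLevel(levelStr) // e.g. "warn" -> logrus.WarnLevel
	if err != nil {
		return fmt.Errorf("invalid LogLevel %q: %w", levelStr, err)
	}
	logrus.SetLevel(lvl) // entries below this severity are dropped
	return nil
}

func main() {
	_ = setShimLogLevel("warn")
	logrus.Info("suppressed at warn level") // not printed
	logrus.Warn("still printed")            // printed
}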

For the CNI bit, I'm not sure, so I'd have to look into it more. I assume from how the plugins are invoked that containerd just takes any stderr/stdout output from the plugin and prints it.


ibabou commented Sep 30, 2022

To summarize:

  • We can change the logging level for the shim - but in that case we'll opt out of all of these info logs.
  • We may need, in the future, to change some of those on a case-by-case basis to have less tracing at the info level.
  • Regarding the secrets, hopefully this will be handled by the upcoming change, and you're planning to backport it to 0.9.X for containerd 1.6.X in the future.

Thanks again Danny for all the inputs!


ibabou commented Oct 1, 2022

I just tried out the "warn" level configuration, but it doesn't seem to take effect for the case above (filtering the info level logs from the shim).

Updating the configuration, for example, to:

[plugins.cri.containerd]
  snapshotter = 'windows'
  default_runtime_name = 'runhcs-wcow-process'
  [plugins.cri.containerd.default_runtime]
    [plugins.cri.containerd.default_runtime.options]
      LogLevel = "warn"
....

Then, after restarting containerd (stopping and then starting the service):

time="2022-10-01T00:30:03.995917000Z" level=info msg="Start cri plugin with config
{PluginConfig:{ContainerdConfig:{Snapshotter:windows DefaultRuntimeName:runhcs-wcow-process DefaultRuntime:{Type:
Path: Engine: PodAnnotations:[] ContainerAnnotations:[] Root: Options:map[LogLevel:warn
default_runtime_name:runhcs-wcow-process disable_snapshot_annotations:true discard_unpacked_layers:true
snapshotter:windows]
PrivilegedWithoutHostDevices:false BaseRuntimeSpec:.....

I can still see a similar log with a newly created pod:

time="2022-10-01T00:31:08.794312000Z" level=info msg=Span duration=118.4902ms endTime="2022-10-01 00:31:08.9128022
+0000 GMT m=+3.896162501" name=HcsCreateProcess parentSpanID=0000000000000000
processParameters="{\"CommandLine\":\"powershell.exe -command \\\"while ($true) { Start-Sleep -Seconds 10; Get-Date; }\
\\"\",\"WorkingDirectory\":\"C:\\\\\",\"Environment\":{\"KUBERNETES_PORT\":\"tcp://10.28.16.1:443\",\"KUBERNETES_PORT_4
43_TCP\":\"tcp://10.28.16.1:443\",\"KUBERNETES_PORT_443_TCP_ADDR\":\"10.28.16.1\",\"KUBERNETES_PORT_443_TCP_PORT\":\"44
3\",\"KUBERNETES_PORT_443_TCP_PROTO\":\"tcp\",\"KUBERNETES_SERVICE_HOST\":\"10.28.16.1\",\"KUBERNETES_SERVICE_PORT\":\"
443\",\"KUBERNETES_SERVICE_PORT_HTTPS\":\"443\",\"THE_SECRET_VALUE\":\"THE_ACTUAL_SECRET\"},\"CreateStdOutPipe\":true,\
"CreateStdErrPipe\":true}" spanID=1451c6d99e065723 startTime="2022-10-01 00:31:08.794312 +0000 GMT m=+3.777672301"
traceID=b2455e3b5f85aca48b729f8a77e56b76

@dcantah any thoughts?

samuelkarp commented:

> Given that it's already in main, come containerd 1.7 this should be a configurable option once we cut a 0.10.0 tag, but truthfully we should backport it given the nature of the issue.

@dcantah Given containerd 1.6 is now LTS, I think it'd definitely be a good idea to backport once you've got the 0.10.0 tag.
