
external-provisioner yields whole stack trace when it loses connection to the CSI driver #732

Closed
ialidzhikov opened this issue May 2, 2022 · 9 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments

@ialidzhikov
Contributor

/sig storage
/kind bug

What happened:
external-provisioner yields whole stack trace when it loses connection to the CSI driver.

What you expected to happen:
external-provisioner should only log that the connection was lost and then exit, without printing thousands of lines of stack trace that are not useful to anyone.

How to reproduce it:

{"log":"2022-05-01T18:20:45.305918834Z stderr F E0501 18:20:45.305666       1 connection.go:131] Lost connection to unix:///var/lib/csi/sockets/pluginproxy/csi.sock."}
{"log":"2022-05-01T18:20:45.306436522Z stderr F F0501 18:20:45.306382       1 connection.go:87] Lost connection to CSI driver, exiting"}
{"log":"2022-05-01T18:20:46.106055473Z stderr F goroutine 82 [running]:"}
{"log":"2022-05-01T18:20:46.106122391Z stderr F k8s.io/klog/v2.stacks(0xc00000e001, 0xc000bec1e0, 0x57, 0x1cd)"}
{"log":"2022-05-01T18:20:46.106166734Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9"}
{"log":"2022-05-01T18:20:46.106178188Z stderr F k8s.io/klog/v2.(*loggingT).output(0x2606640, 0xc000000003, 0x0, 0x0, 0xc000472f50, 0x2554978, 0xd, 0x57, 0x0)"}
{"log":"2022-05-01T18:20:46.106196151Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:975 +0x19b"}
{"log":"2022-05-01T18:20:46.106206488Z stderr F k8s.io/klog/v2.(*loggingT).printf(0x2606640, 0x3, 0x0, 0x0, 0x0, 0x0, 0x19a688e, 0x26, 0x0, 0x0, ...)"}
{"log":"2022-05-01T18:20:46.106231016Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:750 +0x191"}
{"log":"2022-05-01T18:20:46.1062379Z stderr F k8s.io/klog/v2.Fatalf(...)"}
{"log":"2022-05-01T18:20:46.10624548Z stderr F \t/workspace/vendor/k8s.io/klog/v2/klog.go:1514"}
{"log":"2022-05-01T18:20:46.10626969Z stderr F github.com/kubernetes-csi/csi-lib-utils/connection.ExitOnConnectionLoss.func1(0x2606640)"}
{"log":"2022-05-01T18:20:46.106276233Z stderr F \t/workspace/vendor/github.com/kubernetes-csi/csi-lib-utils/connection/connection.go:87 +0x1d4"}
{"log":"2022-05-01T18:20:46.106282511Z stderr F github.com/kubernetes-csi/csi-lib-utils/connection.connect.func1(0xc00037cba0, 0x29, 0x4a8174a0a, 0x4a8174a0a, 0xc000027101, 0x10, 0xc000027108)"}
{"log":"2022-05-01T18:20:46.106289479Z stderr F \t/workspace/vendor/github.com/kubernetes-csi/csi-lib-utils/connection/connection.go:134 +0x2aa"}
{"log":"2022-05-01T18:20:46.106298558Z stderr F google.golang.org/grpc.WithDialer.func1(0x1ba5ce0, 0xc000fc2ea0, 0xc00037cba0, 0x29, 0x10, 0x17d6240, 0x990163506e6a24f4, 0x2637dc0)"}
{"log":"2022-05-01T18:20:46.106305463Z stderr F \t/workspace/vendor/google.golang.org/grpc/dialoptions.go:398 +0x8e"}
{"log":"2022-05-01T18:20:46.106314589Z stderr F google.golang.org/grpc/internal/transport.dial(0x1ba5ce0, 0xc000fc2ea0, 0xc0000320a0, 0xc00037cba0, 0x29, 0x1981038, 0x9, 0xc00068c0a8, 0x0, 0x0, ...)"}
{"log":"2022-05-01T18:20:46.106320822Z stderr F \t/workspace/vendor/google.golang.org/grpc/internal/transport/http2_client.go:143 +0x2dd"}
{"log":"2022-05-01T18:20:46.106326854Z stderr F google.golang.org/grpc/internal/transport.newHTTP2Client(0x1ba5ce0, 0xc000fc2ea0, 0x1ba5c60, 0xc00099df80, 0xc00037cba0, 0x29, 0x1981038, 0x9, 0xc00068c0a8, 0x0, ...)"}

<omitted>

Anything else we need to know?:
The issue is similar to kubernetes/kubernetes#107665.

Environment:

  • external-provisioner: v2.1.1
  • Kubernetes version (use kubectl version): 1.21.10
@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/bug Categorizes issue or PR as related to a bug. labels May 2, 2022
@pohly
Contributor

pohly commented May 2, 2022

Care to submit a PR?

Simply update to the latest klog and then use klog.ErrorS + klog.FlushAndExit instead of klog.Fatal.

The same change needs to go into all sidecars which use klog.Fatal.
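
For reference, a minimal sketch of that change, assuming the current klog v2 API; exitOnConnectionLoss below is a hypothetical stand-in for the csi-lib-utils/sidecar code, not the actual implementation:

```go
package main

import (
	"errors"

	"k8s.io/klog/v2"
)

// exitOnConnectionLoss is a hypothetical stand-in for the code that currently
// calls klog.Fatalf when the connection to the CSI socket is lost.
func exitOnConnectionLoss(address string) {
	err := errors.New("lost connection")

	// Before: klog.Fatalf logs at fatal severity and then dumps the stacks of
	// all goroutines, which is the output shown in this issue.
	// klog.Fatalf("Lost connection to %s, exiting", address)

	// After: log a structured error, flush the log buffers, and exit with a
	// non-zero code, without printing any stack traces.
	klog.ErrorS(err, "Lost connection to CSI driver, exiting", "address", address)
	klog.FlushAndExit(klog.ExitFlushTimeout, 1)
}

func main() {
	klog.InitFlags(nil)
	exitOnConnectionLoss("unix:///var/lib/csi/sockets/pluginproxy/csi.sock")
}
```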

@ialidzhikov
Contributor Author

Simply update to the latest klog and then use klog.ErrorS + klog.FlushAndExit instead of klog.Fatal.

Thanks! The klog dependency at HEAD is already at the latest tag (v2.60.1). So I guess the only thing I have to do is adapt the klog.Fatal usages.

@ialidzhikov
Contributor Author

ialidzhikov commented May 2, 2022

Looks like this is already fixed in https://github.com/kubernetes-csi/csi-lib-utils with kubernetes-csi/csi-lib-utils#81.

@pohly
Contributor

pohly commented May 2, 2022

Then the dependency update in #710 should fix it.

@ialidzhikov
Contributor Author

It should rather be fixed for external-provisioner >= v3.0.0. This is the commit that updated to github.com/kubernetes-csi/csi-lib-utils >= v0.10.0: 251509c.

Does it make sense to update the external-provisioner release-2.1 and release-2.2 branches by bumping github.com/kubernetes-csi/csi-lib-utils from v0.9.0 and v0.9.1, respectively, to v0.10.0 (or a potential v0.9.2)?
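
For context, such a backport would boil down to the usual Go module bump plus re-vendoring on each release branch; this is only a sketch, and the exact version depends on which tag would be cut:

```sh
# on a checkout of the release-2.1 (or release-2.2) branch
go get github.com/kubernetes-csi/csi-lib-utils@v0.10.0
go mod tidy
go mod vendor   # the sidecars vendor dependencies (see the /workspace/vendor paths above)
```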

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 31, 2022
@ialidzhikov
Contributor Author

/close
as the issue is fixed in external-provisioner >= v3.0.0

@k8s-ci-robot
Contributor

@ialidzhikov: Closing this issue.

In response to this:

/close
as the issue is fixed in external-provisioner >= v3.0.0

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
