csi driver crashes on eks nodes, sometimes. #1131

Closed
jacek-czernik opened this issue Jan 4, 2023 · 3 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

@jacek-czernik

What steps did you take and what happened:
We had (and may still have) an issue on our EKS cluster where pods got stuck in the PodInitializing status during deployment.
In the log of the csi-secret-store-driver on the same Kubernetes node, I found that it complained about a missing SecretProviderClass object, even though the object had been created successfully.

At first, I tried restarting the CSI driver on the node, but it then started failing with a memory violation error coming from one of the containers:
k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.2.0.

` panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x15a7746]

goroutine 46 [running]:
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*dynamicRESTMapper).RESTMapping.func1(0xc00009aeb8, 0x40e338)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/client/apiutil/dynamicrestmapper.go:255 +0x46
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*dynamicRESTMapper).checkAndReload.func1(0xc0000a19f0, 0xc00009af70, 0x0, 0x0)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/client/apiutil/dynamicrestmapper.go:155 +0x6b
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*dynamicRESTMapper).checkAndReload(0xc0000a19f0, 0x1d97840, 0xc000636000, 0xc00009af70, 0x0, 0x0)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/client/apiutil/dynamicrestmapper.go:156 +0x8f
sigs.k8s.io/controller-runtime/pkg/client/apiutil.(*dynamicRESTMapper).RESTMapping(0xc0000a19f0, 0x1ba1590, 0x1a, 0x17fe8a2, 0x1c, 0xc000070c50, 0x1, 0x1, 0xc00009b1c8, 0x756ea1d45b7031, ...)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/client/apiutil/dynamicrestmapper.go:253 +0x178
sigs.k8s.io/controller-runtime/pkg/cache/internal.createStructuredListWatch(0x1ba1590, 0x1a, 0x1b8aebd, 0x8, 0x17fe8a2, 0x1c, 0xc000230b60, 0x0, 0x2a00000000000001, 0x28)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/cache/internal/informers_map.go:250 +0x142
sigs.k8s.io/controller-runtime/pkg/cache/internal.(*specificInformersMap).addInformerToMap(0xc000230b60, 0x1ba1590, 0x1a, 0x1b8aebd, 0x8, 0x17fe8a2, 0x1c, 0x1da1d58, 0x0, 0x0, ...)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/cache/internal/informers_map.go:213 +0x159
sigs.k8s.io/controller-runtime/pkg/cache/internal.(*specificInformersMap).Get(0xc000230b60, 0x1dcaeb0, 0xc00011cf40, 0x1ba1590, 0x1a, 0x1b8aebd, 0x8, 0x17fe8a2, 0x1c, 0x1da1d58, ...)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/cache/internal/informers_map.go:185 +0x407
sigs.k8s.io/controller-runtime/pkg/cache/internal.(*InformersMap).Get(0xc00046dd40, 0x1dcaeb0, 0xc00011cf40, 0x1ba1590, 0x1a, 0x1b8aebd, 0x8, 0x17fe8a2, 0x1c, 0x1da1d58, ...)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/cache/internal/deleg_map.go:105 +0x18a
sigs.k8s.io/controller-runtime/pkg/cache.(*informerCache).List(0xc00019b628, 0x1dcaeb0, 0xc00011cf40, 0x1ddf840, 0xc000166b60, 0xc00044db70, 0x1, 0x1, 0x0, 0x0)
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/cache/informer_cache.go:79 +0xfd
sigs.k8s.io/secrets-store-csi-driver/controllers.(*SecretProviderClassPodStatusReconciler).Patcher(0xc00007cf60, 0x1dcaeb0, 0xc00011cf40, 0x0, 0x0)
	/go/src/sigs.k8s.io/secrets-store-csi-driver/controllers/secretproviderclasspodstatus_controller.go:118 +0x2f7
sigs.k8s.io/secrets-store-csi-driver/controllers.(*SecretProviderClassPodStatusReconciler).RunPatcher(0xc00007cf60, 0x1dcaeb0, 0xc00011cf40)
	/go/src/sigs.k8s.io/secrets-store-csi-driver/controllers/secretproviderclasspodstatus_controller.go:102 +0x129
main.main.func4(0xc00007cf60, 0x1dcaeb0, 0xc00011cf40)
	/go/src/sigs.k8s.io/secrets-store-csi-driver/cmd/secrets-store-csi-driver/main.go:179 +0x3f
created by main.main
	/go/src/sigs.k8s.io/secrets-store-csi-driver/cmd/secrets-store-csi-driver/main.go:178 +0xe31

`

The only workaround found so far is to replace the node.

What did you expect to happen:
At least the pod restart should be successful.

Anything else you would like to add:
It happened twice in the last 5 days, both times on nodes that had been running for more than 161 days.
It may be related to kubernetes-sigs/controller-runtime#1891.

Which provider are you using:
AWS Secrets Manager

Environment:

  • Secrets Store CSI Driver version: (use the image tag):
    k8s.gcr.io/csi-secrets-store/driver:v0.2.0
  • Kubernetes version: (use kubectl version):
    v1.21.14-eks-fb459a0
@jacek-czernik jacek-czernik added the kind/bug Categorizes issue or PR as related to a bug. label Jan 4, 2023
@aramase
Member

aramase commented Jan 4, 2023

Hello 👋🏻 k8s.gcr.io/csi-secrets-store/driver:v0.2.0 is an unsupported version of the driver. Please refer to https://secrets-store-csi-driver.sigs.k8s.io/#project-status for the supported versions.

kubernetes-sigs/controller-runtime#1891 was included in the controller-runtime v0.12.0 release, and this project currently uses v0.13.0:

sigs.k8s.io/controller-runtime v0.13.0

I would recommend upgrading to the latest supported version and reopening the issue if you still encounter the error.

/close
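
The version reasoning above can be sketched in Go. The stack trace paths show controller-runtime@v0.9.0, while the fix is described in this thread as shipping in v0.12.0. This is a minimal sketch under stated assumptions: `parseVersion` and `atLeast` are hypothetical helper names, not part of the driver or of controller-runtime, and only plain `vMAJOR.MINOR.PATCH` tags are handled (no pre-release or build suffixes).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion (hypothetical helper) splits a tag such as "v0.9.0"
// into its numeric major, minor, and patch components.
func parseVersion(tag string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected vMAJOR.MINOR.PATCH, got %q", tag)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, tag, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast (hypothetical helper) reports whether version have is
// greater than or equal to version want.
func atLeast(have, want string) (bool, error) {
	h, err := parseVersion(have)
	if err != nil {
		return false, err
	}
	w, err := parseVersion(want)
	if err != nil {
		return false, err
	}
	// Compare major, then minor, then patch; the first difference decides.
	for i := range h {
		if h[i] != w[i] {
			return h[i] > w[i], nil
		}
	}
	return true, nil
}

func main() {
	// Release that, per this thread, contains controller-runtime#1891.
	const fixedIn = "v0.12.0"

	// v0.9.0 appears in the stack trace; v0.13.0 is what the project uses now.
	for _, v := range []string{"v0.9.0", "v0.13.0"} {
		ok, err := atLeast(v, fixedIn)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("controller-runtime %s includes the fix: %t\n", v, ok)
	}
}
```

Comparing components numerically rather than as strings matters here, because a plain string comparison would rank "v0.9.0" above "v0.12.0".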

@k8s-ci-robot
Contributor

@aramase: Closing this issue.


@jacek-czernik
Author

Thank you @aramase
