RBD: OOMKills occur when secret metadata encryption type is used with multiple PVC create requests. #3472
Comments
Based on testing with secrets-metadata encryption and a 128Mi memory limit, with the #3422 fix applied (see ceph-csi/internal/rbd/rbd_journal.go, line 345 at commit 07e9ded): the only important thing that occurs after this log line is the OOMKill. Lifting the memory limit leads to all PVCs reaching the Bound state, and they can be successfully mounted to a pod.
My fix calls `setupEncryption()` again in the repair-volume step. This is causing a looping OOMKill.
The OOMKill is due to this function: ceph-csi/internal/kms/secretskms.go, line 274 at commit c65a4e1, which calls https://pkg.go.dev/golang.org/x/crypto@v0.1.0/scrypt#Key. This issue is isolated to the secrets-metadata KMS alone.
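For context, scrypt is memory-hard by design. Here is a rough sketch of the per-call cost, assuming the commonly used parameters N=32768, r=8, p=1 with a 32-byte key (illustrative values, not necessarily the exact ones in secretskms.go): scrypt allocates roughly 128·r·N bytes of scratch memory per call, about 32 MiB, so even a handful of parallel calls can blow through a 128Mi limit.

```go
package main

import (
	"fmt"

	"golang.org/x/crypto/scrypt"
)

func main() {
	// Illustrative scrypt cost parameters; the exact values used by the
	// secrets-metadata KMS may differ.
	N, r, p := 32768, 8, 1

	// scrypt needs roughly 128*r*N bytes of scratch memory per call:
	// here 128*8*32768 = 32 MiB.
	fmt.Printf("approx. scratch memory per call: %d MiB\n", 128*r*N/(1024*1024))

	// One key derivation, as done once per encrypted volume.
	key, err := scrypt.Key([]byte("passphrase"), []byte("unique-salt"), N, r, p, 32)
	if err != nil {
		panic(err)
	}
	fmt.Printf("derived %d-byte key\n", len(key)) // 10 parallel calls => ~320 MiB peak
}
```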
Interesting! There have been some reports in the Go crypto package issue tracker about scrypt memory usage. Is this perhaps not a leak, but the GC not kicking in soon enough?
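One way to probe that hypothesis (a debugging sketch only, not a proposed fix; the wrapper function below is hypothetical) is to force the runtime to hand memory back to the kernel right after each derivation and see whether the OOMKills disappear:

```go
package kmsdebug

import (
	"runtime/debug"

	"golang.org/x/crypto/scrypt"
)

// deriveKeyAndReclaim is a hypothetical probe: derive the key, then force a
// GC and return freed pages to the OS. If OOMKills stop with this in place,
// the problem is peak concurrent usage / lazy reclaim rather than a true leak.
func deriveKeyAndReclaim(passphrase, salt []byte) ([]byte, error) {
	key, err := scrypt.Key(passphrase, salt, 32768, 8, 1, 32)
	if err != nil {
		return nil, err
	}
	debug.FreeOSMemory() // runs a GC and returns as much memory to the OS as possible
	return key, nil
}
```

(Go 1.19's GOMEMLIMIT would be another knob to experiment with here.)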
I have been looking into this a bit. Sharing my findings here, hoping that they will be useful.
I'm not sure how to interpret these really, but here is the …
And this is after it restarted: …
One thing that caught my eye is the …
Some more experiments also hint that the issue is the parallel `scrypt.Key` calls. With a memory limit of 256MiB, I can create 100 PVCs without issue as long as I do it one at a time with 1 sec sleep between. On the other hand, if I create 5-10 PVCs in one go I immediately get OOM. Any ideas for how to solve this? I'm not familiar with the code so unsure where to start looking. 🙁
On Mon, Jan 09, 2023 at 12:59:35AM -0800, Lennart Jern wrote:
> Some more experiments also hints that the issue is the parallel `scrypt.Key` calls. With a memory limit of 256MiB, I can create 100 PVCs without issue as long as I do it one at a time with 1 sec sleep between. On the other hand, if I create 5-10 PVCs in one go I immediately get OOM.
>
> Any ideas for how to solve this? I'm not familiar with the code so unsure where to start looking. :slightly_frowning_face:
Sounds like we need to limit the number of concurrent scrypt.Key calls. A semaphore like https://pkg.go.dev/golang.org/x/sync/semaphore might help with that.

I guess it needs some configurable option, passed on the command line of the container.
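A minimal sketch of that suggestion, assuming a hypothetical `deriveKey` wrapper and a hard-coded cap of 2 (a real patch would wrap the existing call site and take the limit from a command-line flag):

```go
package kms

import (
	"context"

	"golang.org/x/crypto/scrypt"
	"golang.org/x/sync/semaphore"
)

// Allow at most two scrypt derivations at once: with N=32768, r=8 that caps
// the scratch memory at roughly 2 x 32 MiB. The limit is illustrative.
var scryptSem = semaphore.NewWeighted(2)

// deriveKey makes concurrent callers queue up instead of all allocating
// their scrypt scratch buffers at the same time.
func deriveKey(ctx context.Context, passphrase, salt []byte) ([]byte, error) {
	if err := scryptSem.Acquire(ctx, 1); err != nil {
		return nil, err // context cancelled while waiting for a slot
	}
	defer scryptSem.Release(1)

	return scrypt.Key(passphrase, salt, 32768, 8, 1, 32)
}
```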
Is anyone planning to work on this?
@lentzi90 please feel free to take it
Currently no one is working on this; you can take this up. As @nixpanic suggested, we can surround the call to `scrypt.Key` with a semaphore.
I picked this up today. Added a semaphore as suggested, proceeded to test it. Everything was looking good. With 256Mi limit I had no issues creating 100 PVCs. Then I went on to check that the issue still existed when using the upstream canary image and... had difficulty reproducing the issue 😮 It seems to me that it was actually solved already. I checked the commit log and found this one that could explain it: 36fe145
Can someone else confirm?
On Fri, Jan 20, 2023 at 05:44:06AM -0800, Lennart Jern wrote:
> I picked this up today. Added a semaphore as suggested, proceeded to test it. Everything was looking good. With 256Mi limit I had no issues creating 100 PVCs. Then I went on to check that the issue still existed when using the upstream canary image and... had difficulty reproducing the issue :open_mouth:
>
> It seems to me that it was actually solved already. I checked the commit log and found this one that could explain it: 36fe145
>
> From the release notes:
>
> > ringhash: impose cap on max_ring_size to reduce possibility of OOMs (grpc/grpc-go#5801)
>
> Can someone else confirm?
Thanks for looking into this :)

Good find! I have no idea if that is related... You could test your build with ***@***.*** and see if that makes a difference.

As creating the encryption keys is also very CPU intensive, having a semaphore that limits the number of concurrent requests would still be a nice thing to have.
I would say this should be a separate issue, and perhaps not best solved with a semaphore. What about using CPU limits? OOM was a big issue since the container would not recover from it and would just go OOM again. But I didn't see any issues with CPU usage in my tests (everything could easily be tested on a laptop, and it stayed responsive throughout), so unless there is some concrete example of how and why a semaphore is needed for that, I would avoid it. It would be nice to get confirmation from someone else on the OOM issue, though, before closing this.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
I had a hard time figuring out a proper CPU limit, and finally solved the OOM issue by making my pod's activities heavily CPU-bound, which helped with OOMs but slowed all the other operations.
Although this change may improve it, I think the idea to limit the number of concurrent requests is still valid. Creating 100 PVCs simultaneously will spawn 100 cryptsetup processes, and the pod memory utilization may reach the limit.
@paolo-depa what version are you using? The original issue was reported for v3.7.1. Now v3.8.0 is out, and based on my earlier tests I would expect it to be fixed there.
Just did some tests creating 100 pods using v3.8.0:
Adding a snippet from the message log of one of those events:

2023-02-23T16:48:06.349522+01:00 germ39 kernel: [ 5733.862631] cryptsetup invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=-997
This OOMKill happens in the csi-rbdplugin nodeplugin pod, right? This issue was originally about OOMKills in the provisioner pod during volume creation; that seems to be resolved now. I'd expect the CO (in this case kubelet, or whoever is responsible for issuing NodePublish/NodeStage calls) to have some kind of rate limiting, similar to the one for csi-provisioner: https://github.com/kubernetes-csi/external-provisioner/blob/cd81ed5d31835d1aabad74db869c1165df9e3666/cmd/csi-provisioner/csi-provisioner.go#L81
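For the nodeplugin side, one option (sketched here on the same semaphore idea; ceph-csi does not necessarily expose such a flag today, and `maxParallelCSICalls` is made up for illustration) would be to throttle at the gRPC server, analogous to csi-provisioner's worker-count flag:

```go
package main

import (
	"context"

	"golang.org/x/sync/semaphore"
	"google.golang.org/grpc"
)

// maxParallelCSICalls is illustrative; a real deployment would read it from
// a command-line option.
const maxParallelCSICalls = 4

// throttled queues incoming RPCs once the in-flight limit is reached,
// instead of letting every NodeStage/NodePublish run (and spawn cryptsetup)
// at once.
func throttled(limit int64) grpc.UnaryServerInterceptor {
	sem := semaphore.NewWeighted(limit)
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (interface{}, error) {
		if err := sem.Acquire(ctx, 1); err != nil {
			return nil, err
		}
		defer sem.Release(1)
		return handler(ctx, req)
	}
}

func main() {
	_ = grpc.NewServer(grpc.UnaryInterceptor(throttled(maxParallelCSICalls)))
}
```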
I pushed the patch I was working on and created a draft PR: #3693
…h multiple PVC create request. ceph#3472
Signed-off-by: Stefan Haas <shaas@suse.com>
Running some tests with PR #3693 applied on top of v3.8.0, the results show high resilience even to the OOMKills occurring in the rbdplugin: with the memory and CPU limitations on the pods lifted, I could flawlessly create 100 PVCs. +1 to have it merged!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation. |
generateCipher is memory heavy, so to avoid OOM situations, a semaphore is added to limit concurrency here.

Fixes: ceph#3472
Signed-off-by: Michael Fritch <mfritch@suse.com>
I tested secret-based encryption with 3.7.1; I don't see any crash with the below limits.
But when I tested with metadata-type encryption, I can see the crash. Does this confirm we have a memory leak?
Originally posted by @Madhu-1 in #3402 (comment)