Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kms: Add semaphore to limit concurrency #3693

Closed
wants to merge 1 commit into from

Conversation

lentzi90
Copy link
Contributor

@lentzi90 lentzi90 commented Feb 24, 2023

generateCipher is memory heavy, so to avoid OOM situations, a semaphore is added to limit concurrency here.

Fixes: #3472

NOTE: I think the original issue is already resolved (see discussion in the issue). I pushed this anyway because I had already done the work and hoping that it can be useful in other situations or just for discussion. Feel free to close!


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)
  • /retest all: run this in case the CentOS CI failed to start/report any test
    progress or results

generateCipher is memory heavy, so to avoid OOM situations, a semaphore
is added to limit concurrency here.

Signed-off-by: Lennart Jern <lennart.jern@est.tech>
Copy link
Member

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to have, it adds to the stability of ceph-csi when multiple volumes are created at the same time. Volume creation is not a critical part of a workflow when running applications, a slight delay while generating the cipher should be acceptable for increased stability.

@@ -47,6 +47,8 @@ require (
sigs.k8s.io/controller-runtime v0.14.4
)

require golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, this could be part of the above require block

@nixpanic
Copy link
Member

commitlint fails because kms is not a valid prefix for commits. Use util instead?

@trociny
Copy link

trociny commented Feb 28, 2023

@lentzi90 Is there any estimate how "heavy" the generateCipher may be?

Because from the comment [1] of the related issue, the oom-killer was invoked by cryptsetup process, which is called by EncryptVolume, i.e. after generateCipher. And it looks like the memory limit was reached due to spawning many cryptsetup processes simultaneously and this patch will not prevent this (although it might throttle it a little and make the problem more difficult to reproduce).

It seems to me that a solution could be to limit the number of NodeStageVolume concurrent calls. Looking at the code, I suppose it is in this loop [2], where we spawn all NodeStageVolume calls concurrently [3] and then just wait for them to complete. I think it could do it in configurable chunks, e.g. not more than 10 calls simultaneously.

[1] #3472 (comment)
[2]

for i := range val.Items {

[3]
// run multiple NodeStageVolume calls concurrently

@lentzi90
Copy link
Contributor Author

lentzi90 commented Mar 1, 2023

@trociny like I mentioned on the issue already, I think this is a different issue. I just pushed the code that I already had for the original issue (which seems to be fixed already).
If you (or anyone else) would like to take this over and push a new PR I would be very happy. Unfortunately I won't have time to work on this in the near time.

@mergify
Copy link
Contributor

mergify bot commented Mar 7, 2023

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

@github-actions
Copy link

github-actions bot commented Apr 6, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Apr 6, 2023
@github-actions
Copy link

This pull request has been automatically closed due to inactivity. Please re-open if these changes are still required.

@github-actions github-actions bot closed this Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RBD: OOMKills occurs when secret metadata encryption type is used with multiple PVC create request.
3 participants