Reduce goroutine leakage in test/integration/controlplane/transformation #111739
Hi @sanwishe. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Instead of a manual stop method that is a no-op in many cases I think it would be better to have a context be used to cancel any implementations that explicitly need it. |
/ok-to-test
@@ -198,7 +198,7 @@ func EtcdMain(tests func() int) {
	// like k8s.io/klog/v2.(*loggingT).flushDaemon()
	// TODO(#108483): Reduce this number once we address the
ToDo can be deleted.
hopefully can be deleted soon, but there are still some other leaks #108483 (comment)
ecdd747
to
1f7977e
Compare
@enj this comes from #108483. It is not about context cancellation but about storage shutdown: it needs a function that stops and destroys all resources.
var once sync.Once
destroyFunc := func() {
	// we know that storage destroy funcs are called multiple times (due to reuse in subresources).
	// Hence, we only destroy once.
	// TODO: fix duplicated storage destroy calls higher level
	once.Do(func() {
		transformer.Stop()
Does the order matter?
It matters, and I have moved it after client.close()
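For readers following the thread, the once-guarded destroy function and the ordering agreed on here (close the client first, then stop the transformer) can be sketched as below. This is a minimal illustration, not the PR's code: `newDestroyFunc`, `recorder`, and `runDestroyDemo` are hypothetical names, and the two callbacks stand in for the real client close and transformer stop.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder collects the order in which resources are released.
// It is a hypothetical helper used only for this illustration.
type recorder struct {
	mu     sync.Mutex
	events []string
}

func (r *recorder) add(e string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.events = append(r.events, e)
}

// newDestroyFunc returns a function that is safe to call repeatedly.
// Only the first call releases anything: it closes the client first
// and stops the transformer second, matching the order from the review.
func newDestroyFunc(closeClient, stopTransformer func()) func() {
	var once sync.Once
	return func() {
		once.Do(func() {
			closeClient()
			stopTransformer()
		})
	}
}

// runDestroyDemo calls the destroy function three times, as storage
// reuse across subresources might, and returns the recorded events.
func runDestroyDemo() []string {
	r := &recorder{}
	destroy := newDestroyFunc(
		func() { r.add("client closed") },
		func() { r.add("transformer stopped") },
	)
	for i := 0; i < 3; i++ {
		destroy()
	}
	return r.events
}

func main() {
	fmt.Println(runDestroyDemo())
}
```

Repeated calls are harmless because `sync.Once` runs the cleanup body a single time, so each resource is released exactly once regardless of how many callers share the destroy function.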
/assign @wojtek-t he is more familiar with this code |
b857165
to
110c4bc
Compare
/retest |
I understand what this code is trying to do. Contexts can be used to control lifetime, in the same way a manual stop method can. I do not want to expand the storage transformer interface with a method that one can forget to call. Instead, any implementation that needs any form of cleanup should take a context in its constructor and automatically perform said cleanup when the context is canceled. |
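The context-based alternative described in this comment could look roughly like the sketch below. It assumes a hypothetical `envelopeTransformer` that owns a connection-like resource (`fakeConn`); neither type comes from the Kubernetes code base. The constructor takes a context and releases the resource when that context is canceled, so the interface needs no extra stop method.

```go
package main

import (
	"context"
	"fmt"
)

// fakeConn stands in for a resource such as a gRPC connection.
// It is hypothetical and exists only for this illustration.
type fakeConn struct {
	closed chan struct{}
}

// Close marks the connection as closed. It must be called only once.
func (c *fakeConn) Close() { close(c.closed) }

// envelopeTransformer is a hypothetical transformer that owns a connection.
type envelopeTransformer struct {
	conn *fakeConn
}

// newEnvelopeTransformer ties the connection's lifetime to ctx:
// when ctx is canceled, the connection is closed automatically.
func newEnvelopeTransformer(ctx context.Context) *envelopeTransformer {
	t := &envelopeTransformer{conn: &fakeConn{closed: make(chan struct{})}}
	go func() {
		<-ctx.Done()
		t.conn.Close()
	}()
	return t
}

// runContextDemo reports whether the connection stays open until the
// context is canceled and is closed afterwards.
func runContextDemo() bool {
	ctx, cancel := context.WithCancel(context.Background())
	t := newEnvelopeTransformer(ctx)
	select {
	case <-t.conn.closed:
		return false // closed before cancellation, which is unexpected
	default:
	}
	cancel()
	<-t.conn.closed // blocks until the cleanup goroutine has run
	return true
}

func main() {
	fmt.Println(runContextDemo())
}
```

One trade-off of this sketch: the cleanup goroutine lives until the context is canceled, so a caller that never cancels still leaks it. The approach moves the obligation from calling a stop method to canceling a context.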
/assign |
A KMS transformer uses a gRPC service but doesn't provide a mechanism to close the connection to it, so we should ensure the connection is closed when the apiserver shuts down. Since the lifecycle of the transformer matches that of the apiserver, would listening for a shutdown signal (or a stopCh) to close the gRPC connection be more suitable than a context? @aojea @enj @liggitt
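On the stopCh-versus-context question, the two signals can be bridged, so the choice is mostly about API shape. The sketch below is an assumption-laden illustration using only the standard library: `contextForChannel` and `runStopChDemo` are hypothetical names, not functions taken from this PR.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// contextForChannel returns a context that is canceled when stopCh is
// closed or when the returned cancel function is called, whichever
// happens first. The helper goroutine exits in both cases.
func contextForChannel(stopCh <-chan struct{}) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		select {
		case <-stopCh:
			cancel()
		case <-ctx.Done():
		}
	}()
	return ctx, cancel
}

// runStopChDemo closes a stop channel and reports whether the derived
// context observes the shutdown as a cancellation.
func runStopChDemo() bool {
	stopCh := make(chan struct{})
	ctx, cancel := contextForChannel(stopCh)
	defer cancel()
	if ctx.Err() != nil {
		return false // must not be canceled before shutdown is signaled
	}
	close(stopCh)
	<-ctx.Done()
	return errors.Is(ctx.Err(), context.Canceled)
}

func main() {
	fmt.Println(runStopChDemo())
}
```

With a bridge like this, a server that only exposes a stop channel could still hand a context to constructors that expect one.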
Storage is being destroyed at the very end:
@enj - I don't have a very strong opinion on that, and if we can make that work with contexts, I'm fine with it. kubernetes/staging/src/k8s.io/apiserver/pkg/server/options/encryptionconfig/config.go Line 245 in 18ce801
[it's called at the level of initializing options] So this approach would require significant changes to be implemented. OTOH, the current Destroy/Stop-like approach is relatively easy - we don't expect people to be doing it on their own. It's the responsibility of the server to do that on shutdown, and as such I don't think that the concern of someone forgetting to call it is a valid one. |
We already have existing bugs for CRDs not supporting encryption at rest because the wiring is incorrect, so I don't buy the argument that "we won't forget because we only need to do it in one place in the API server": nothing enforces that invariant. If we want to get the goroutine count down before test freeze with this change, that is fine. I have to refactor a lot of this code for KMS v2, so I can clean it up then.
My point was that the cost is not per transformer author. We have a fixed cost to properly wire the Initialize/Destroy interface (and yes, I agree with you that it's more than one place, because e.g. we do that differently for CRDs vs built-ins). I don't know what your plans for KMSv2 are exactly, and I admit that the additional interface method isn't perfect. But it has one advantage: it forces people to think about the lifecycle, which definitely isn't the case for many authors now (not specific to transformers - it's the same for many controllers, plugins, etc.). I don't think it's super critical for 1.25, so if you are planning to change that soon (in 1.26), I'm fine with waiting to see if we can make that work in a simpler way then. If this will take multiple releases to get there, I think we should try merging something like this (I didn't do a detailed review of it - just skimmed it).
KMS v2 is part of my day job so it will definitely get done 😉 @liggitt had expressed wanting this goroutine bit fixed in 1.25, so this may be fine for now (I haven't looked thoroughly).
/triage accepted |
nice @wojtek-t can you take a look and judge if we should add it to 1.25, we are still on the testing freeze period https://www.kubernetes.dev/resources/release/#summary |
I took a deeper look into this PR and this LGTM. From a purely test-freeze perspective, that looks reasonable to merge.
I have to do a lot of refactors to the KMS code to support the v2 feature set so this will not be a big issue for me to update. @wojtek-t you will have to approve the API server code changes. /lgtm |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: enj, sanwishe The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Test freeze for 1.25 was |
/lgtm cancel (since we missed the deadline) |
I have opened #111986 to fix this bug without expanding the transformer interface. |
@sanwishe: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@enj: Closed this PR. In response to this:
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fix leaking goroutines in test/integration/controlplane/transformation
Which issue(s) this PR fixes:
Fixes #111674
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: