kms: fix go routine leak in gRPC connection #111986

Merged
merged 1 commit into kubernetes:master on Sep 8, 2022

Conversation

Member

@enj enj commented Aug 24, 2022

Signed-off-by: Monis Khan <mok@microsoft.com>

/kind bug
/sig auth
/milestone v1.26
/priority important-soon
/triage accepted
/assign @aramase @wojtek-t
Fixes #111674

Release note: NONE

@k8s-ci-robot k8s-ci-robot added the release-note-none Denotes a PR that doesn't merit a release note. label Aug 24, 2022
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. sig/auth Categorizes an issue or PR as relevant to SIG Auth. labels Aug 24, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Aug 24, 2022
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 24, 2022
@k8s-ci-robot k8s-ci-robot added area/apiserver area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Aug 24, 2022
@@ -198,7 +198,7 @@ func EtcdMain(tests func() int) {
 	// like k8s.io/klog/v2.(*loggingT).flushDaemon()
 	// TODO(#108483): Reduce this number once we address the
 	// couple remaining issues.
-	if dg := runtime.NumGoroutine() - before; dg <= 15 {
+	if dg := runtime.NumGoroutine() - before; dg <= 10 {
Member

@aramase aramase Aug 24, 2022

dg <= 10 was set because integration/controlplane/transformation previously leaked 9 goroutines. With this PR that number should come down further, because it closes the connection for both v1 and v2. Could we measure how many goroutines still leak with this change included and lower the allowed number to a stricter value?

xref: #108483 (comment)

Member Author

The worst offender is:

=== FAIL: test/integration/clustercidr  (0.00s)
I0824 17:41:24.044151  116775 etcd.go:76] etcd already running at http://127.0.0.1:2379
PASS
F0824 17:41:53.931022  116775 etcd.go:213] unexpected number of goroutines: before: 2 after 11
FAIL	k8s.io/kubernetes/test/integration/clustercidr	29.958s

Member Author

I have set the limit to 9 to allow test/integration/clustercidr to pass while being as strict as possible.
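
The failing run above reports "before: 2 after 11", a delta of 9 goroutines, which is why 9 is the tightest limit that still lets that package pass. A minimal sketch of this kind of budget check follows; withinLeakBudget and the demo in main are illustrative names, not the code in etcd.go:

```go
package main

import (
	"fmt"
	"runtime"
)

// withinLeakBudget reports whether the growth in goroutine count
// (after - before) stays within the allowed budget.
// Hypothetical helper, for illustration only.
func withinLeakBudget(before, after, budget int) bool {
	return after-before <= budget
}

func main() {
	// Numbers from the clustercidr log: before=2, after=11, so the delta is 9.
	fmt.Println(withinLeakBudget(2, 11, 9)) // true
	fmt.Println(withinLeakBudget(2, 11, 8)) // false

	// Live measurement: start three goroutines that block forever,
	// then compare the goroutine count against the budget.
	before := runtime.NumGoroutine()
	block := make(chan struct{})
	for i := 0; i < 3; i++ {
		go func() { <-block }()
	}
	after := runtime.NumGoroutine()
	fmt.Println(withinLeakBudget(before, after, 9))
}
```

A budget like this is a ceiling across all integration packages, so it is set by the worst offender rather than by the package being fixed.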

Member

@aramase aramase left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 24, 2022
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 25, 2022
Member

@aramase aramase left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 25, 2022
@wojtek-t
Member

Will take a look later this week.

Member

@wojtek-t wojtek-t left a comment

Just a couple of small comments; other than that, this LGTM.

@@ -536,3 +541,12 @@ func (u unionTransformers) TransformFromStorage(ctx context.Context, data []byte
 func (u unionTransformers) TransformToStorage(ctx context.Context, data []byte, dataCtx value.Context) (out []byte, err error) {
 	return u[0].TransformToStorage(ctx, data, dataCtx)
 }
+
+func stopChToContext(stopCh <-chan struct{}) context.Context {
Member

Instead of creating your own method, let's use the existing one that already does what you want:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go#L299

@@ -84,6 +84,12 @@ func NewGRPCService(endpoint string, callTimeout time.Duration) (Service, error)
 	}

 	s.kmsClient = kmsapi.NewKeyManagementServiceClient(s.connection)

+	go func() {
+		<-ctx.Done()
Member

Please add defer HandleCrash(); we generally try to do that in all goroutines in production code, for safety.

cmd/kube-apiserver/app/server.go (resolved)
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 31, 2022
@enj
Member Author

enj commented Aug 31, 2022

@wojtek-t all comments addressed.

@enj
Member Author

enj commented Aug 31, 2022

/retest

Signed-off-by: Monis Khan <mok@microsoft.com>
@liggitt
Member

liggitt commented Sep 8, 2022

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 8, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enj, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 8, 2022
@k8s-ci-robot k8s-ci-robot merged commit cc4b7dc into kubernetes:master Sep 8, 2022
@wojtek-t
Member

wojtek-t commented Sep 9, 2022

@enj - I'm sorry for the delay (lately I'm out more often than I'm around).

Your explanation makes sense (I still think that DrainedInFlight is not exactly what we want, but we don't have a better one, so we can live with it). I'm planning to get to draining long-running requests too, which would address this issue.

This LGTM too!

And thanks a lot for following up on that!

Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

reduce goroutine leakage in test/integration/controlplane/transformation
6 participants