-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kms: fix go routine leak in gRPC connection #111986
Conversation
e35204b
to
70594ba
Compare
test/integration/framework/etcd.go
Outdated
@@ -198,7 +198,7 @@ func EtcdMain(tests func() int) { | |||
// like k8s.io/klog/v2.(*loggingT).flushDaemon() | |||
// TODO(#108483): Reduce this number once we address the | |||
// couple remaining issues. | |||
if dg := runtime.NumGoroutine() - before; dg <= 15 { | |||
if dg := runtime.NumGoroutine() - before; dg <= 10 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dg <= 10
was set because integration/controlplane/transformation
had 9 leaked goroutines before. With this PR that number should have come down further because this closes the connection for v1 and v2. Could we evaluate what the current number of goroutines are leaked including this change and reduce the allowed number to a lower strict value?
xref: #108483 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The worst offender is:
=== FAIL: test/integration/clustercidr (0.00s)
I0824 17:41:24.044151 116775 etcd.go:76] etcd already running at http://127.0.0.1:2379
PASS
F0824 17:41:53.931022 116775 etcd.go:213] unexpected number of goroutines: before: 2 after 11
FAIL k8s.io/kubernetes/test/integration/clustercidr 29.958s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have set the limit to 9
to allow test/integration/clustercidr
to pass while being as strict as possible.
9daae54
to
6f57a4d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
6f57a4d
to
1120f92
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Will take a look later this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple small comments - other than that is LGTM
@@ -536,3 +541,12 @@ func (u unionTransformers) TransformFromStorage(ctx context.Context, data []byte | |||
func (u unionTransformers) TransformToStorage(ctx context.Context, data []byte, dataCtx value.Context) (out []byte, err error) { | |||
return u[0].TransformToStorage(ctx, data, dataCtx) | |||
} | |||
|
|||
func stopChToContext(stopCh <-chan struct{}) context.Context { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of creating your own method, let's use already existing one that does what you want:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/util/wait/wait.go#L299
@@ -84,6 +84,12 @@ func NewGRPCService(endpoint string, callTimeout time.Duration) (Service, error) | |||
} | |||
|
|||
s.kmsClient = kmsapi.NewKeyManagementServiceClient(s.connection) | |||
|
|||
go func() { | |||
<-ctx.Done() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add defer HandleCrash() - we generally try to do that in all goroutines (in production code) for safety
staging/src/k8s.io/apiserver/pkg/storage/value/encrypt/envelope/kmsv2/grpc_service.go
Show resolved
Hide resolved
1120f92
to
d1dd820
Compare
@wojtek-t all comments addressed. |
/retest |
Signed-off-by: Monis Khan <mok@microsoft.com>
d1dd820
to
4e68e9b
Compare
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enj, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@enj - I'm sorry for delay (I'm recently more often out than around). Your explanation makes sense (I still think that the DrainedInFlight is not exactly what we want, but we don't have a better one, so we can leave with it). I'm planning to get to draining long-running requests too, which would address this issue. This LGTM too! And thanks a lot for following on that! |
Signed-off-by: Monis Khan mok@microsoft.com
/kind bug
/sig auth
/milestone v1.26
/priority important-soon
/triage accepted
/assign @aramase @wojtek-t
Fixes #111674