-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix metrics AlreadyRegisteredError on TestRecordOperation and TestGetHistogramVecFromGatherer unit test #106289
Fix metrics AlreadyRegisteredError on TestRecordOperation and TestGetHistogramVecFromGatherer unit test #106289
Conversation
…HistogramVecFromGatherer unit test
Hi @CatherineF-dev. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@@ -61,6 +69,8 @@ func TestRecordOperation(t *testing.T) { | |||
assert.HTTPBodyContains(t, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { | |||
mux.ServeHTTP(w, r) | |||
}), "GET", prometheusURL, nil, runtimeOperationsDurationExpected) | |||
|
|||
registry.Reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary since the test will terminate after evaluating this expression so we will not be using the registry anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It failed with make test KUBE_RACE=-race KUBE_TIMEOUT=--timeout=600s GOFLAGS=-count=10 WHAT=./pkg/kubelet/kuberuntime/
. The test is a little bit special, it runs with -count=10
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if it is run 10 times, all of the tests should have independent registries which shouldn't collide with one another.
A potential reason why you were still seeing failures might be because you are using the prometheus.DefaultRegisterer
in the handler which is shared between the tests. Although the library might be protecting against that, I haven't checked. But it might be worth checking again with my suggestion from above: https://github.com/kubernetes/kubernetes/pull/106289/files#r746633125
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Damien, I think registry.Reset()
is needed.
Even though registry is local, metrics RuntimeOperations and RuntimeOperationsDuration are global.
I tested that adding metrics.RuntimeOperations.Reset()
would work ifregistry.Reset()
was removed.
CatherineF-dev@a8692c6
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, thank you for looking into that @CatherineF-dev 🙂
/triage accepted |
/retest |
/kind failing-test |
// Use local registry | ||
var registry = compbasemetrics.NewKubeRegistry() | ||
var gather compbasemetrics.Gatherer = registry | ||
defer registry.Reset() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to Reset at the end rather than the beginning? It seems to me that what this test func needs is for the count to be zero at the start, it does not care about the count at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both(defer it to the end
and reset at the begging
) seem to be OK.
I prefer defer it to the end
because we want to fix an issue when the test case runs multiple times.
The reset at the end of the test case will clear the metrics/env after running the case.
This is also a very clear way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing the Reset at the end works if every test does a Reset at the end. Doing a Reset at the start works regardless of what other tests do. A local condition is better than a global one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both ways are okay.
I prefer defer
. Because
- Other test files willn't affect this file. Because metrics RuntimeOperations is supposed to be tested in this file.
- It requires tests in this file doing Reset at the end. We have done it since metrics registration appears only once. Or, it could keep code style more consistent if metrics registration appears many times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both ways will work if, in a given process, no other function adds data to metrics.RuntimeOperations
, metrics.RuntimeOperationsDuration
, or metrics.RuntimeOperationsErrors
before TestRecordOperation runs. That is a global condition. Note that metrics.RuntimeOperations
is registered in legacyregistry in some other code invoked by another test. Other test files can affect this one, if the Reset is done at the end in this one. Local conditions are better than global ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both ways will work if, in a given process, no other function adds data to
metrics.RuntimeOperations
,metrics.RuntimeOperationsDuration
, ormetrics.RuntimeOperationsErrors
before TestRecordOperation runs. That is a global condition. Note thatmetrics.RuntimeOperations
is registered in legacyregistry in some other code invoked by another test. Other test files can affect this one, if the Reset is done at the end in this one. Local conditions are better than global ones.
I find this argument persuasive. Globals present numerous problems, but clearing the state before starting a test keeps the scope of control inside this Test instead of requiring every other test to "be perfect".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clearing the state before starting a test keeps the scope of control inside this Test instead of requiring every other test to "be perfect".
yeah, we can't rely on other tests "to do the right thing", but we also need to clean up the state once the test ends, same as we do with the listeners, per example
defer l.Close()
I feel that we need both
diff --git a/pkg/kubelet/kuberuntime/instrumented_services_test.go b/pkg/kubelet/kuberuntime/instrumented_services_test.go
index e95d6bdb74a..1801f995d5f 100644
--- a/pkg/kubelet/kuberuntime/instrumented_services_test.go
+++ b/pkg/kubelet/kuberuntime/instrumented_services_test.go
@@ -37,6 +37,7 @@ func TestRecordOperation(t *testing.T) {
registry.MustRegister(metrics.RuntimeOperations)
registry.MustRegister(metrics.RuntimeOperationsDuration)
registry.MustRegister(metrics.RuntimeOperationsErrors)
+ registry.Reset()
l, err := net.Listen("tcp", "127.0.0.1:0")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not follow the analogy to calling Close. The Close method is about releasing expensive resources that an active connection holds. Reset is not analogous, it does not release expensive resources. I mean, there may be some internal side-effects of resetting some metrics, but it is nothing like open network connections that can accumulate and cause problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, my bad, I misinterpreted the reset on metrics.
Agree with you and David, the test has to clear the state before starting and not depend that other tests do the same after finishing
staging/src/k8s.io/component-base/metrics/testutil/metrics_test.go
Outdated
Show resolved
Hide resolved
Thanks everyone! Have changed to |
@CatherineF-dev : thank you for caring and seeing this through to completion! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
/assign @derekwaynecarr |
Thanks Mike! |
Approving to allow others to rebase on it. /approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: CatherineF-dev, deads2k, logicalhan, MikeSpreitzer, thockin The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Test:
Fixes #104940
It takes over #105809