Fix metrics AlreadyRegisteredError on TestRecordOperation and TestGetHistogramVecFromGatherer unit test #106289

CatherineF-dev · 2021-11-10T03:30:29Z

Test:

make test KUBE_RACE=-race KUBE_TIMEOUT=--timeout=600s GOFLAGS=-count=10 WHAT=./staging/src/k8s.io/component-base/metrics/testutil

make test KUBE_RACE=-race KUBE_TIMEOUT=--timeout=600s GOFLAGS=-count=10 WHAT=./pkg/kubelet/kuberuntime/

Fixes #104940

It takes over #105809


k8s-ci-robot · 2021-11-10T03:30:37Z

Hi @CatherineF-dev. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

CatherineF-dev · 2021-11-10T03:36:43Z

cc @MikeSpreitzer

pkg/kubelet/kuberuntime/instrumented_services_test.go

dgrisonnet · 2021-11-10T13:46:27Z


@@ -61,6 +69,8 @@ func TestRecordOperation(t *testing.T) {
 	assert.HTTPBodyContains(t, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		mux.ServeHTTP(w, r)
 	}), "GET", prometheusURL, nil, runtimeOperationsDurationExpected)
+
+	registry.Reset()


I don't think this is necessary since the test will terminate after evaluating this expression so we will not be using the registry anymore.

It failed with make test KUBE_RACE=-race KUBE_TIMEOUT=--timeout=600s GOFLAGS=-count=10 WHAT=./pkg/kubelet/kuberuntime/. The test is a little special: it runs with -count=10.

#104940 (comment)

Even if it is run 10 times, all of the tests should have independent registries which shouldn't collide with one another.

A potential reason why you were still seeing failures might be that you are using the prometheus.DefaultRegisterer in the handler, which is shared between the tests. The library might protect against that, but I haven't checked. Still, it might be worth checking again with my suggestion from above: https://github.com/kubernetes/kubernetes/pull/106289/files#r746633125
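To make the registration clash concrete, here is a toy model in plain Go (no Prometheus or component-base imports). Everything in it is hypothetical: toyRegistry stands in for a real registry, errAlreadyRegistered for its AlreadyRegisteredError, and defaultRegistry for a process-wide registerer such as prometheus.DefaultRegisterer. It only illustrates the point being made: repeating a test with -count=N re-registers the same name, which fails against a shared registry but not against a fresh local one.

```go
package main

import (
	"errors"
	"fmt"
)

// errAlreadyRegistered is a hypothetical stand-in for the
// AlreadyRegisteredError a Prometheus-style registry returns.
var errAlreadyRegistered = errors.New("metric already registered")

// toyRegistry is a hypothetical registry that, like a real one,
// rejects a second collector with the same name.
type toyRegistry struct {
	names map[string]bool
}

func newToyRegistry() *toyRegistry {
	return &toyRegistry{names: map[string]bool{}}
}

// Register records the name, or fails if it is already taken.
func (r *toyRegistry) Register(name string) error {
	if r.names[name] {
		return errAlreadyRegistered
	}
	r.names[name] = true
	return nil
}

// defaultRegistry plays the role of a process-wide registerer:
// every test iteration in the process sees the same instance.
var defaultRegistry = newToyRegistry()

// testIteration simulates one pass of a test under -count=N: it registers
// the same (hypothetical) metric name into whichever registry it is given.
func testIteration(reg *toyRegistry) error {
	return reg.Register("example_operations_total")
}

func main() {
	for i := 1; i <= 3; i++ {
		shared := testIteration(defaultRegistry)
		local := testIteration(newToyRegistry())
		fmt.Printf("run %d: shared=%v local=%v\n", i, shared, local)
	}
}
```

In this model only the first iteration succeeds against the shared registry, while every iteration succeeds against a fresh local registry.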

Hi Damien, I think registry.Reset() is needed.

Even though registry is local, metrics RuntimeOperations and RuntimeOperationsDuration are global.

I tested that adding metrics.RuntimeOperations.Reset() would work if registry.Reset() was removed.
CatherineF-dev@a8692c6
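The point that a local registry does not make the metrics themselves local can be sketched with another stdlib-only toy model. counterVec, localRegistry, testIteration and the label "example_operation" are hypothetical; the Reset on localRegistry resets every registered collector, which is assumed here to mirror what registry.Reset() does for resettable metrics. Because runtimeOperations is package-level, its samples survive from one -count iteration to the next even though each iteration builds a new registry.

```go
package main

import "fmt"

// counterVec is a hypothetical stand-in for a labeled counter such as
// metrics.RuntimeOperations. Its samples live in the value itself.
type counterVec struct {
	values map[string]int
}

func newCounterVec() *counterVec { return &counterVec{values: map[string]int{}} }

func (c *counterVec) Inc(label string) { c.values[label]++ }

// Reset drops every recorded sample.
func (c *counterVec) Reset() { c.values = map[string]int{} }

// runtimeOperations is package-level, so it outlives each test iteration.
var runtimeOperations = newCounterVec()

// localRegistry is hypothetical. It only holds references to collectors,
// so creating a new registry does not clear the collectors' data.
type localRegistry struct {
	collectors []*counterVec
}

func (r *localRegistry) MustRegister(c *counterVec) {
	r.collectors = append(r.collectors, c)
}

// Reset resets every registered collector, so in this model registry.Reset()
// and runtimeOperations.Reset() have the same effect.
func (r *localRegistry) Reset() {
	for _, c := range r.collectors {
		c.Reset()
	}
}

// testIteration simulates one -count iteration: a new local registry,
// an optional reset, one recorded operation, then the observed count.
func testIteration(reset bool) int {
	registry := &localRegistry{}
	registry.MustRegister(runtimeOperations)
	if reset {
		registry.Reset()
	}
	runtimeOperations.Inc("example_operation")
	return runtimeOperations.values["example_operation"]
}

func main() {
	for i := 1; i <= 3; i++ {
		fmt.Printf("run %d without reset: %d\n", i, testIteration(false))
	}
	for i := 1; i <= 3; i++ {
		fmt.Printf("run %d with reset: %d\n", i, testIteration(true))
	}
}
```

Running it prints counts 1, 2, 3 without the reset and 1, 1, 1 with it, which is the kind of drift an exact-value assertion trips over on the second iteration.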

Makes sense, thank you for looking into that @CatherineF-dev 🙂

wgahnagl · 2021-11-10T18:46:07Z

/triage accepted
/priority backlog

CatherineF-dev · 2021-11-12T23:15:39Z

/retest

pacoxu · 2021-11-15T03:42:03Z

/kind failing-test
/lgtm

MikeSpreitzer · 2021-11-15T04:58:31Z

pkg/kubelet/kuberuntime/instrumented_services_test.go

+	// Use local registry
+	var registry = compbasemetrics.NewKubeRegistry()
+	var gather compbasemetrics.Gatherer = registry
+	defer registry.Reset()


Why do we want to Reset at the end rather than the beginning? It seems to me that what this test func needs is for the count to be zero at the start, it does not care about the count at the end.

Both (deferring it to the end and resetting at the beginning) seem to be OK.
I prefer deferring it to the end because we want to fix an issue that appears when the test case runs multiple times.
The reset at the end of the test case clears the metrics/env after running the case.
This is also a very clear approach.

Doing the Reset at the end works if every test does a Reset at the end. Doing a Reset at the start works regardless of what other tests do. A local condition is better than a global one.
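The local-versus-global argument can be checked with a small hypothetical model: sharedCount stands in for a package-level metric, otherTest for a test elsewhere in the process that records data and never cleans up, and recordOnce for the operation under test. None of these names come from the Kubernetes code; the sketch only shows which placement of the reset depends on other tests behaving.

```go
package main

import "fmt"

// sharedCount is a hypothetical package-level metric shared by every test.
var sharedCount int

// otherTest is a hypothetical test elsewhere that records data and never resets.
func otherTest() { sharedCount += 5 }

// recordOnce is the hypothetical operation under test.
func recordOnce() { sharedCount++ }

// testResetAtEnd asserts first and cleans up afterwards, so it silently
// relies on every earlier test having left sharedCount at zero.
func testResetAtEnd() bool {
	defer func() { sharedCount = 0 }()
	recordOnce()
	return sharedCount == 1
}

// testResetAtStart establishes its own precondition, so it does not
// depend on what earlier tests did.
func testResetAtStart() bool {
	sharedCount = 0
	recordOnce()
	return sharedCount == 1
}

func main() {
	sharedCount = 0
	otherTest()
	fmt.Println("reset at end passes:", testResetAtEnd())
	otherTest()
	fmt.Println("reset at start passes:", testResetAtStart())
}
```

With a polluting test running first, this prints false for the reset-at-end variant and true for the reset-at-start variant.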

Both ways are okay.

I prefer defer, because:

1. Other test files won't affect this file, since metrics.RuntimeOperations is only meant to be tested in this file.

2. It requires the tests in this file to Reset at the end. We already do that, since metrics registration appears only once. If metrics registration appeared many times, it would also keep the code style more consistent.

Both ways will work if, in a given process, no other function adds data to metrics.RuntimeOperations, metrics.RuntimeOperationsDuration, or metrics.RuntimeOperationsErrors before TestRecordOperation runs. That is a global condition. Note that metrics.RuntimeOperations is registered in legacyregistry in some other code invoked by another test. Other test files can affect this one, if the Reset is done at the end in this one. Local conditions are better than global ones.


I find this argument persuasive. Globals present numerous problems, but clearing the state before starting a test keeps the scope of control inside this Test instead of requiring every other test to "be perfect".


Yeah, we can't rely on other tests "to do the right thing", but we also need to clean up the state once the test ends, same as we do with the listeners, for example:

defer l.Close()

I feel that we need both

diff --git a/pkg/kubelet/kuberuntime/instrumented_services_test.go b/pkg/kubelet/kuberuntime/instrumented_services_test.go
index e95d6bdb74a..1801f995d5f 100644
--- a/pkg/kubelet/kuberuntime/instrumented_services_test.go
+++ b/pkg/kubelet/kuberuntime/instrumented_services_test.go
@@ -37,6 +37,7 @@ func TestRecordOperation(t *testing.T) {
 	registry.MustRegister(metrics.RuntimeOperations)
 	registry.MustRegister(metrics.RuntimeOperationsDuration)
 	registry.MustRegister(metrics.RuntimeOperationsErrors)
+	registry.Reset()
 	l, err := net.Listen("tcp", "127.0.0.1:0")
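For the "both" option proposed here (reset at the start and clean up at the end), one possible Go shape is a helper that resets immediately and registers the same reset as a test cleanup. The helper resetAround, the cleanupRegistrar interface and fakeT are hypothetical; Cleanup(func()) is a real method on *testing.T (Go 1.14+), and fakeT only exists so the sketch runs outside go test. As the rest of the thread concludes, only the reset at the start is needed for correctness.

```go
package main

import "fmt"

// cleanupRegistrar is the subset of testing.TB this sketch needs;
// *testing.T satisfies it through its Cleanup method.
type cleanupRegistrar interface {
	Cleanup(func())
}

// resetAround is a hypothetical helper: it clears shared metric state
// before the test body runs and schedules the same reset for test end.
func resetAround(tb cleanupRegistrar, reset func()) {
	reset()
	tb.Cleanup(reset)
}

// fakeT records cleanup functions so the sketch can run outside go test.
type fakeT struct {
	cleanups []func()
}

func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

// finish runs cleanups last-registered-first, as testing.T does.
func (f *fakeT) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}

func main() {
	count := 7 // pretend an earlier test left data behind
	t := &fakeT{}
	resetAround(t, func() { count = 0 })
	fmt.Println("at start:", count)
	count++
	fmt.Println("during test:", count)
	t.finish()
	fmt.Println("after cleanup:", count)
}
```

This prints 0 at the start, 1 during the test, and 0 again after cleanup.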

I do not follow the analogy to calling Close. The Close method is about releasing expensive resources that an active connection holds. Reset is not analogous, it does not release expensive resources. I mean, there may be some internal side-effects of resetting some metrics, but it is nothing like open network connections that can accumulate and cause problems.

You are right, my bad, I misinterpreted the reset on metrics.

I agree with you and David: the test has to clear the state before starting and not depend on other tests doing the same after finishing.

staging/src/k8s.io/component-base/metrics/testutil/metrics_test.go

CatherineF-dev · 2021-11-16T19:00:01Z

Thanks everyone! I have changed it to Reset at the beginning.

MikeSpreitzer · 2021-11-16T19:20:22Z

@CatherineF-dev : thank you for caring and seeing this through to completion!

MikeSpreitzer

/lgtm

deads2k · 2021-11-16T19:59:44Z

/approve

CatherineF-dev · 2021-11-16T20:02:28Z

/retest

logicalhan

/lgtm
/approve

MikeSpreitzer · 2021-11-16T20:19:02Z

/assign @derekwaynecarr

CatherineF-dev · 2021-11-16T20:40:39Z

/assign @derekwaynecarr

Thanks Mike!

thockin · 2021-11-16T22:24:55Z

Approving to allow others to rebase on it.

/approve

k8s-ci-robot · 2021-11-16T22:25:33Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CatherineF-dev, deads2k, logicalhan, MikeSpreitzer, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/kubelet/OWNERS~~ [thockin]
~~staging/src/k8s.io/component-base/metrics/OWNERS~~ [logicalhan,thockin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

CatherineF-dev added 2 commits November 10, 2021 03:23

Fix metrics AlreadyRegisteredError on TestRecordOperation and TestGetHistogramVecFromGatherer unit test

ef0b2df

format

8290400

k8s-ci-robot requested review from logicalhan and pacoxu November 10, 2021 03:31

dgrisonnet reviewed Nov 10, 2021


wgahnagl added this to Triage in SIG Node PR Triage Nov 10, 2021

wgahnagl moved this from Triage to Needs Reviewer in SIG Node PR Triage Nov 10, 2021

wgahnagl moved this from Needs Reviewer to Waiting on Author in SIG Node PR Triage Nov 10, 2021

CatherineF-dev added 3 commits November 12, 2021 02:17

remove prometheus.DefaultRegisterer

744785e

add local registry.Reset()

03f7a8d

clean

a8324a3

k8s-ci-robot added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels Nov 15, 2021

pacoxu moved this from Waiting on Author to Needs Approver in SIG Node PR Triage Nov 15, 2021

k8s-ci-robot assigned pacoxu Nov 15, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 15, 2021

MikeSpreitzer reviewed Nov 15, 2021


SergeyKanzhelev added this to Triage in SIG Node CI/Test Board Nov 15, 2021

Use Reset at first

5646120

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2021

MikeSpreitzer approved these changes Nov 16, 2021


k8s-ci-robot assigned MikeSpreitzer Nov 16, 2021

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2021

logicalhan approved these changes Nov 16, 2021


k8s-ci-robot assigned derekwaynecarr Nov 16, 2021

MikeSpreitzer mentioned this pull request Nov 16, 2021

use golangci-lint #106448

Merged

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 16, 2021

k8s-ci-robot merged commit 42d8b2f into kubernetes:master Nov 17, 2021

SIG Node CI/Test Board automation moved this from Triage to Done Nov 17, 2021

SIG Node PR Triage automation moved this from Needs Approver to Done Nov 17, 2021

k8s-ci-robot added this to the v1.23 milestone Nov 17, 2021


CatherineF-dev commented Nov 10, 2021

k8s-ci-robot commented Nov 10, 2021

CatherineF-dev commented Nov 10, 2021

dgrisonnet Nov 10, 2021

CatherineF-dev Nov 10, 2021 (edited)

dgrisonnet Nov 10, 2021

CatherineF-dev Nov 12, 2021 (edited)

dgrisonnet Nov 12, 2021

wgahnagl commented Nov 10, 2021

CatherineF-dev commented Nov 12, 2021

pacoxu commented Nov 15, 2021

MikeSpreitzer Nov 15, 2021

pacoxu Nov 15, 2021 (edited)

MikeSpreitzer Nov 15, 2021

CatherineF-dev Nov 15, 2021 (edited)

MikeSpreitzer Nov 15, 2021

deads2k Nov 16, 2021

aojea Nov 16, 2021 (edited)

MikeSpreitzer Nov 16, 2021

aojea Nov 16, 2021

CatherineF-dev commented Nov 16, 2021

MikeSpreitzer commented Nov 16, 2021

MikeSpreitzer left a comment

deads2k commented Nov 16, 2021

CatherineF-dev commented Nov 16, 2021

logicalhan left a comment

MikeSpreitzer commented Nov 16, 2021

CatherineF-dev commented Nov 16, 2021

thockin commented Nov 16, 2021

k8s-ci-robot commented Nov 16, 2021
