Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UT failure: panic: duplicate metrics collector registration attempted #104940

Closed
pacoxu opened this issue Sep 12, 2021 · 12 comments · Fixed by #106289
Closed

UT failure: panic: duplicate metrics collector registration attempted #104940

pacoxu opened this issue Sep 12, 2021 · 12 comments · Fixed by #106289
Assignees
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/testing Categorizes an issue or PR as relevant to SIG Testing. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@pacoxu
Copy link
Member

pacoxu commented Sep 12, 2021

Failure cluster 1b410c50dc9c60031af5

https://storage.googleapis.com/k8s-triage/index.html?text=prometheus

Error text:

TestRecordOperation

=== RUN   TestRecordOperation
--- FAIL: TestRecordOperation (0.00s)
panic: duplicate metrics collector registration attempted [recovered]
	panic: duplicate metrics collector registration attempted

goroutine 281 [running]:
testing.tRunner.func1.2({0x2b86420, 0xc000526a20})
	/usr/local/go/src/testing/testing.go:1209 +0x36c
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1212 +0x3b6
panic({0x2b86420, 0xc000526a20})
	/usr/local/go/src/runtime/panic.go:1047 +0x266
k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0xc000164000, {0xc00081ac80, 0x1, 0x0})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus/registry.go:403 +0xa7
k8s.io/kubernetes/vendor/k8s.io/component-base/metrics.(*kubeRegistry).MustRegister(0xc000164000, {0xc00081ac70, 0x1, 0x10})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/metrics/registry.go:176 +0xcc
k8s.io/kubernetes/pkg/kubelet/kuberuntime.TestRecordOperation(0x0)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kuberuntime/instrumented_services_test.go:33 +0xe4
testing.tRunner(0xc000602b60, 0x2e77168)
	/usr/local/go/src/testing/testing.go:1259 +0x230
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1306 +0x727

TestRecordOperation

=== RUN   TestRecordOperation
--- FAIL: TestRecordOperation (0.00s)
panic: duplicate metrics collector registration attempted [recovered]
	panic: duplicate metrics collector registration attempted

goroutine 281 [running]:
testing.tRunner.func1.2({0x2b86420, 0xc000526a20})
	/usr/local/go/src/testing/testing.go:1209 +0x36c
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1212 +0x3b6
panic({0x2b86420, 0xc000526a20})
	/usr/local/go/src/runtime/panic.go:1047 +0x266
k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0xc000164000, {0xc00081ac80, 0x1, 0x0})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus/registry.go:403 +0xa7
k8s.io/kubernetes/vendor/k8s.io/component-base/metrics.(*kubeRegistry).MustRegister(0xc000164000, {0xc00081ac70, 0x1, 0x10})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/metrics/registry.go:176 +0xcc
k8s.io/kubernetes/pkg/kubelet/kuberuntime.TestRecordOperation(0x0)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubelet/kuberuntime/instrumented_services_test.go:33 +0xe4
testing.tRunner(0xc000602b60, 0x2e77168)
	/usr/local/go/src/testing/testing.go:1259 +0x230
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1306 +0x727

TestGetHistogramVecFromGatherer

=== RUN   TestGetHistogramVecFromGatherer/filter_with_one_label
    --- FAIL: TestGetHistogramVecFromGatherer/filter_with_one_label (0.00s)
panic: duplicate metrics collector registration attempted [recovered]
	panic: duplicate metrics collector registration attempted

goroutine 131 [running]:
testing.tRunner.func1.2({0x9caea0, 0xc00008a7e0})
	/usr/local/go/src/testing/testing.go:1209 +0x36c
testing.tRunner.func1()
	/usr/local/go/src/testing/testing.go:1212 +0x3b6
panic({0x9caea0, 0xc00008a7e0})
	/usr/local/go/src/runtime/panic.go:1047 +0x266
k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).MustRegister(0xc000320000, {0xc000408b10, 0x1, 0x0})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/prometheus/client_golang/prometheus/registry.go:403 +0xa7
k8s.io/kubernetes/vendor/k8s.io/component-base/metrics.(*kubeRegistry).MustRegister(0xc000320000, {0xc000408b00, 0x1, 0x10})
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/metrics/registry.go:176 +0xcc
k8s.io/kubernetes/vendor/k8s.io/component-base/metrics/testutil.TestGetHistogramVecFromGatherer.func1.1()
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/metrics/testutil/metrics_test.go:580 +0xa2
sync.(*Once).doSlow(0xc00041ab84, 0xc000078e38)
	/usr/local/go/src/sync/once.go:68 +0x128
sync.(*Once).Do(0xc00041ab84, 0xc00008a760)
	/usr/local/go/src/sync/once.go:59 +0x47
k8s.io/kubernetes/vendor/k8s.io/component-base/metrics/testutil.TestGetHistogramVecFromGatherer.func1(0xc000419860)
	/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/metrics/testutil/metrics_test.go:579 +0x35f
testing.tRunner(0xc000419860, 0xc0001415a8)
	/usr/local/go/src/testing/testing.go:1259 +0x230
created by testing.(*T).Run
	/usr/local/go/src/testing/testing.go:1306 +0x727

Recent failures:

2021/9/4 上午2:00:19 ci-kubernetes-generate-make-test-count10
2021/9/3 下午8:00:05 ci-kubernetes-generate-make-test-count10
2021/9/3 下午2:00:15 ci-kubernetes-generate-make-test-count10
2021/9/3 上午7:59:43 ci-kubernetes-generate-make-test-count10
2021/9/3 上午1:59:22 ci-kubernetes-generate-make-test-count10

/kind failing-test

/sig testing

This may be related with prometheus/client_golang#716 \ prometheus/client_golang#733

@k8s-ci-robot k8s-ci-robot added kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. sig/testing Categorizes an issue or PR as relevant to SIG Testing. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 12, 2021
@pacoxu
Copy link
Member Author

pacoxu commented Sep 12, 2021

/assign
I will do some initial researches on this.

#76956 is similar

@pacoxu pacoxu removed their assignment Sep 12, 2021
@pacoxu

This comment has been minimized.

@k8s-ci-robot
Copy link
Contributor

@pacoxu: The label(s) sig/instrument cannot be applied, because the repository doesn't have them.

In response to this:

/sig instrument

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@pacoxu
Copy link
Member Author

pacoxu commented Sep 13, 2021

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Sep 13, 2021
@ehashman
Copy link
Member

/triage accepted
/assign @pacoxu
cc @CatherineF-dev

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 23, 2021
@pacoxu
Copy link
Member Author

pacoxu commented Sep 24, 2021

To fix it gracefully, we may refactor the code in vendor/k8s.io/component-base/metrics like
eh-am/pyroscope@70733d1
https://github.com/pyroscope-io/pyroscope/pull/335/files
The refactor would change a lot and need some confirmations or demo.(I prefer)

A temporary workaround before refactoring it:

@CatherineF-dev
Copy link
Contributor

Three testings failed due to panic: duplicate metrics collector registration attempted. Haven't found package promauto (prometheus/client_golang#716 (comment)) being used in these testings. It might be other issues.

These testings are:

  1. TestGetHistogramVecFromGatherer/filter_with_one_label
  2. TestRecordOperation
  3. TestSampler

@logicalhan
Copy link
Member

What are you guys doing here? What test is this? You cannot change the component-base metrics code base it is doing what it is supposed to be doing. Does this test suite involve multiple sequential runs of the same test?

@pacoxu
Copy link
Member Author

pacoxu commented Oct 14, 2021

What are you guys doing here? What test is this? You cannot change the component-base metrics code base it is doing what it is supposed to be doing. Does this test suite involve multiple sequential runs of the same test?

@logicalhan I think this is odd to change the component-base metrics code base. But the refactor to use promauto.With(reg).NewCounter is another approach.

You can try to reproduce it with

 make test KUBE_RACE=-race KUBE_TIMEOUT=--timeout=600s GOFLAGS=-count=10 WHAT=./vendor/k8s.io/apiserver/pkg/util/flowcontrol/metrics/

ci-kubernetes-generate-make-test-count10 is with GOFLAGS=-count=10

@CatherineF-dev
Copy link
Contributor

Oh, yes. This test suite involves multiple sequential runs of the same test to catch flakes.

It uses GOFLAGS=-count=10

make test KUBE_RACE=-race KUBE_TIMEOUT=--timeout=600s GOFLAGS=-count=10'

@aojea
Copy link
Member

aojea commented Oct 21, 2021

You can use sync.Once to register the registry only once.
It is important to reset the metrics, or next iteration in the test will fail
#105809

go test -timeout 100s -run ^TestRecordOperation$ k8s.io/kubernetes/pkg/kubelet/kuberuntime -v -count 10
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
=== RUN   TestRecordOperation
--- PASS: TestRecordOperation (0.00s)
PASS
ok      k8s.io/kubernetes/pkg/kubelet/kuberuntime       0.038s

@CatherineF-dev
Copy link
Contributor

Thanks @aojea, I added other fixes in #105886

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/testing Categorizes an issue or PR as relevant to SIG Testing. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
6 participants