Bug Report
What did you do?
Launched an operator built on top of the operator-sdk in my cluster, then curled the metrics endpoint from within the pod.
What did you expect to see?
Expected to see all relevant controller metrics, most importantly
controller_runtime_reconcile_total
andcontroller_runtime_reconcile_errors_total
set to 0.What did you see instead? Under which circumstances?
Only the workqueue_* metrics were present. After I ran a reconciliation, most of the controller_runtime* metrics appeared, but still not controller_runtime_reconcile_errors_total. This last metric only appeared after there was an error.
This is an issue because we set up alerts around these metrics, which usually work as expected. Whenever we deploy a new version of the operator, however, there is a period with no metrics reporting until the next reconciliation or error, so our alerts start firing on the assumption that we can't scrape any metrics from the pod, when in reality these metrics just aren't being exported with value 0 as they should be.
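This behavior matches how a Prometheus CounterVec works: no time series is exported for a given label combination until that combination is first touched. A minimal standalone Go sketch illustrating the effect (this is not operator-sdk code; the metric name is copied for illustration and "example-controller" is a made-up label value):

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Same shape as the controller-runtime counter: a CounterVec keyed
	// by controller name. It exports nothing until a label combination
	// is first used.
	reconcileErrors := prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "controller_runtime_reconcile_errors_total",
		Help: "Total number of reconciliation errors per controller",
	}, []string{"controller"})
	prometheus.MustRegister(reconcileErrors)

	// Touching the label combination once creates the child series at 0,
	// so it shows up in /metrics before any error has been counted.
	reconcileErrors.WithLabelValues("example-controller")

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}

With the WithLabelValues call removed, curling /metrics returns no controller_runtime_reconcile_errors_total series at all; with it, the series appears with value 0.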
Environment
Operator type:
/language go
Kubernetes cluster type:
vanilla
$ operator-sdk version
operator-sdk version: "v0.18.2", commit: "f059b5e17447b0bbcef50846859519340c17ffad", kubernetes version: "v1.18.2", go version: "go1.13.10 linux/amd64"
$ go version
go version go1.13.15 linux/amd64
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.10", GitCommit:"575467a0eaf3ca1f20eb86215b3bde40a5ae617a", GitTreeState:"clean", BuildDate:"2019-12-11T12:41:00Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.10", GitCommit:"575467a0eaf3ca1f20eb86215b3bde40a5ae617a", GitTreeState:"clean", BuildDate:"2019-12-11T12:32:32Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Additional context
It's possible this is an issue with the underlying controller-runtime library but I figured I would start here.
Relevant controller-runtime code for reference:
https://github.com/kubernetes-sigs/controller-runtime/blob/3645df01769a3f131a6d9adfe4dfd6f76206ff82/pkg/internal/controller/metrics/metrics.go#L29-L39
https://github.com/kubernetes-sigs/controller-runtime/blob/2b423ec0a6646ad7fec022f76aa7c94358dc1135/pkg/internal/controller/controller.go#L263-L282
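If the root cause is in controller-runtime, one possible fix would be for the controller to zero-initialize its per-controller counters at start-up, before the first reconcile. A rough sketch of what that could look like inside pkg/internal/controller/controller.go, where the ctrlmetrics import already exists (initMetrics is a hypothetical method, and the result label values are taken from the reconcile handler linked above):

// Hypothetical: run once when the controller starts so that every
// series exists at value 0 from the very first scrape.
func (c *Controller) initMetrics() {
	// Add(0) creates each labeled child without changing its value.
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "success").Add(0)
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "error").Add(0)
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "requeue").Add(0)
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "requeue_after").Add(0)
	ctrlmetrics.ReconcileErrors.WithLabelValues(c.Name).Add(0)
}

Calling something like this when the controller starts would make metrics-based alerts reliable across operator redeployments.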