
Controller runtime metrics not exposed before first reconciliation/error #4372

Closed
ari-e opened this issue Jan 7, 2021 · 2 comments
Labels
language/go Issue is related to a Go operator project


@ari-e

ari-e commented Jan 7, 2021

Bug Report

What did you do?

Deployed an operator built with the operator-sdk to my cluster, then curled the metrics endpoint from within the pod.

What did you expect to see?

Expected to see all relevant controller metrics, most importantly controller_runtime_reconcile_total and controller_runtime_reconcile_errors_total, exported with a value of 0.

What did you see instead? Under which circumstances?

Only the workqueue_* metrics were present. After a reconciliation ran, most of the controller_runtime_* metrics appeared, but controller_runtime_reconcile_errors_total still did not. That metric only appeared after the first reconciliation error.

This is a problem because we have alerts on these metrics. They normally work as expected, but whenever we deploy a new version of the operator there is a window in which these metrics are not reported at all, lasting until the next reconciliation (or, for the error counter, the next error). During that window our alerts fire as if the pod could not be scraped, when in reality the metrics simply aren't exported with a value of 0 as they should be.

Environment

Operator type:

/language go

Kubernetes cluster type:

vanilla

$ operator-sdk version

operator-sdk version: "v0.18.2", commit: "f059b5e17447b0bbcef50846859519340c17ffad", kubernetes version: "v1.18.2", go version: "go1.13.10 linux/amd64"

$ go version (if language is Go)

go version go1.13.15 linux/amd64

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.10", GitCommit:"575467a0eaf3ca1f20eb86215b3bde40a5ae617a", GitTreeState:"clean", BuildDate:"2019-12-11T12:41:00Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.10", GitCommit:"575467a0eaf3ca1f20eb86215b3bde40a5ae617a", GitTreeState:"clean", BuildDate:"2019-12-11T12:32:32Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}

Additional context

It's possible this is an issue in the underlying controller-runtime library, but I figured I would start here.

Relevant controller-runtime code for reference:

https://github.com/kubernetes-sigs/controller-runtime/blob/3645df01769a3f131a6d9adfe4dfd6f76206ff82/pkg/internal/controller/metrics/metrics.go#L29-L39

https://github.com/kubernetes-sigs/controller-runtime/blob/2b423ec0a6646ad7fec022f76aa7c94358dc1135/pkg/internal/controller/controller.go#L263-L282
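The behavior described above is consistent with how Prometheus labeled ("vector") metrics work: the series for a given label combination only exists after that combination is used for the first time, and the reconcile counters are only touched when a reconcile finishes (the error counter only when it fails). The program below is a simplified, stdlib-only model of that behavior, not the client_golang or controller-runtime code. The `counterVec` type, the `reconcileMetrics` and `record` helpers, the controller name `example`, and the `controller`/`result` label names are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"strings"
)

// counterVec is a simplified, hypothetical model of a Prometheus labeled
// counter: the series for a label set only exists once it has been touched.
type counterVec struct {
	name   string
	series map[string]float64 // rendered label set -> current value
}

func newCounterVec(name string) *counterVec {
	return &counterVec{name: name, series: map[string]float64{}}
}

// add creates the series on first use (starting at 0), then adds v.
func (c *counterVec) add(labels string, v float64) { c.series[labels] += v }

// expose renders only the series that currently exist, sorted.
func (c *counterVec) expose() []string {
	out := []string{}
	for labels, v := range c.series {
		out = append(out, fmt.Sprintf("%s{%s} %g", c.name, labels, v))
	}
	sort.Strings(out)
	return out
}

// reconcileMetrics holds the two hypothetical counters from this issue.
type reconcileMetrics struct{ total, errs *counterVec }

func newReconcileMetrics() *reconcileMetrics {
	return &reconcileMetrics{
		total: newCounterVec("controller_runtime_reconcile_total"),
		errs:  newCounterVec("controller_runtime_reconcile_errors_total"),
	}
}

// errExample is a hypothetical reconcile failure used in the demo.
var errExample = errors.New("hypothetical reconcile failure")

// record models the metric updates after one reconcile: the error counter
// is only touched on the error path. %q quoting is fine for simple names.
func (m *reconcileMetrics) record(controller string, err error) {
	result := "success"
	if err != nil {
		result = "error"
		m.errs.add(fmt.Sprintf("controller=%q", controller), 1)
	}
	m.total.add(fmt.Sprintf("controller=%q,result=%q", controller, result), 1)
}

// scrape returns every currently existing series, one per line.
func (m *reconcileMetrics) scrape() string {
	return strings.Join(append(m.total.expose(), m.errs.expose()...), "\n")
}

func main() {
	m := newReconcileMetrics()
	fmt.Printf("fresh pod: %q\n", m.scrape()) // no series exported yet
	m.record("example", nil)
	fmt.Printf("after a success:\n%s\n", m.scrape())
	m.record("example", errExample)
	fmt.Printf("after an error:\n%s\n", m.scrape())
}
```

In this model a freshly started process exports nothing for either counter, a successful reconcile creates only the `result="success"` series, and `controller_runtime_reconcile_errors_total` appears only once `record` is called with an error, which matches the sequence reported above.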

@joelanford
Member

Submitted a PR to fix this here: kubernetes-sigs/controller-runtime#1324

@joelanford joelanford self-assigned this Jan 8, 2021
@joelanford
Member

Upstream PR has been merged. This will be released in controller-runtime v0.8.0.
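A common remedy for this kind of gap is to touch every expected label combination once at startup with an increment of 0, so each series is exported with value 0 before the first reconcile. The sketch below illustrates that general idea with a simplified, stdlib-only model of a labeled counter; it is not the code from the upstream PR. The `counterVec` type, the `initMetrics` helper, the controller name `example`, and the list of `result` label values are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// counterVec is a simplified, hypothetical model of a Prometheus labeled
// counter: a series only exists after its label set has been touched.
type counterVec struct {
	name   string
	series map[string]float64 // rendered label set -> current value
}

func newCounterVec(name string) *counterVec {
	return &counterVec{name: name, series: map[string]float64{}}
}

// add creates the series on first use (starting at 0), then adds v.
func (c *counterVec) add(labels string, v float64) { c.series[labels] += v }

// expose renders the existing series, sorted.
func (c *counterVec) expose() []string {
	out := []string{}
	for labels, v := range c.series {
		out = append(out, fmt.Sprintf("%s{%s} %g", c.name, labels, v))
	}
	sort.Strings(out)
	return out
}

// resultLabels is an assumed list of reconcile outcomes, for illustration.
var resultLabels = []string{"error", "requeue", "requeue_after", "success"}

// initMetrics (hypothetical) touches every label combination for one
// controller with add(0), so each series is exported with value 0
// before any reconcile has run.
func initMetrics(controller string, total, errs *counterVec) {
	for _, r := range resultLabels {
		total.add(fmt.Sprintf("controller=%q,result=%q", controller, r), 0)
	}
	errs.add(fmt.Sprintf("controller=%q", controller), 0)
}

func main() {
	total := newCounterVec("controller_runtime_reconcile_total")
	errs := newCounterVec("controller_runtime_reconcile_errors_total")
	// Called once when the controller starts, before any reconcile runs.
	initMetrics("example", total, errs)
	for _, line := range append(total.expose(), errs.expose()...) {
		fmt.Println(line)
	}
}
```

Run as-is, it prints four `controller_runtime_reconcile_total` series and one `controller_runtime_reconcile_errors_total` series, all with value 0. With the real client_golang API, the analogous step is calling `WithLabelValues(...).Add(0)` on each counter vector once at startup, so alerts see a 0 rather than an absent series.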
