Bug Report
What did you do?
Launched an operator built on top of the operator-sdk in my cluster, then curled the metrics endpoint from within the pod.
What did you expect to see?
Expected to see all relevant controller metrics, most importantly
controller_runtime_reconcile_total
andcontroller_runtime_reconcile_errors_total
set to 0.What did you see instead? Under which circumstances?
Only the workqueue_* metrics were present. After I ran a reconciliation, most of the controller_runtime* metrics appeared, but still not controller_runtime_reconcile_errors_total. This last metric only appeared after there was an error.
This is an issue because we set up alerts around these metrics, which usually work as expected. Whenever we deploy a new version of the operator, however, there is a period with no metrics reporting until the next reconciliation or error, so our alerts start firing on the assumption that we can't scrape any metrics from the pod, when in reality these metrics just aren't being exported with value 0 as they should be.
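This behavior matches how a Prometheus CounterVec works: no time series is exported for a given label combination until that combination is first touched. A minimal standalone Go sketch illustrating the effect (this is not operator-sdk code; the metric name is copied for illustration and "example-controller" is a made-up label value):

package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Same shape as the controller-runtime counter: a CounterVec keyed
	// by controller name. It exports nothing until a label combination
	// is first used.
	reconcileErrors := prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "controller_runtime_reconcile_errors_total",
		Help: "Total number of reconciliation errors per controller",
	}, []string{"controller"})
	prometheus.MustRegister(reconcileErrors)

	// Touching the label combination once creates the child series at 0,
	// so it shows up in /metrics before any error has been counted.
	reconcileErrors.WithLabelValues("example-controller")

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}

With the WithLabelValues call removed, curling /metrics returns no controller_runtime_reconcile_errors_total series at all; with it, the series appears with value 0.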
Environment
Operator type:
/language go
Kubernetes cluster type:
vanilla
$ operator-sdk version
operator-sdk version: "v0.18.2", commit: "f059b5e17447b0bbcef50846859519340c17ffad", kubernetes version: "v1.18.2", go version: "go1.13.10 linux/amd64"
$ go version
go version go1.13.15 linux/amd64
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.10", GitCommit:"575467a0eaf3ca1f20eb86215b3bde40a5ae617a", GitTreeState:"clean", BuildDate:"2019-12-11T12:41:00Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.10", GitCommit:"575467a0eaf3ca1f20eb86215b3bde40a5ae617a", GitTreeState:"clean", BuildDate:"2019-12-11T12:32:32Z", GoVersion:"go1.12.12", Compiler:"gc", Platform:"linux/amd64"}
Additional context
It's possible this is an issue with the underlying controller-runtime library but I figured I would start here.
Relevant controller-runtime code for reference:
https://github.com/kubernetes-sigs/controller-runtime/blob/3645df01769a3f131a6d9adfe4dfd6f76206ff82/pkg/internal/controller/metrics/metrics.go#L29-L39
https://github.com/kubernetes-sigs/controller-runtime/blob/2b423ec0a6646ad7fec022f76aa7c94358dc1135/pkg/internal/controller/controller.go#L263-L282
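If the root cause is in controller-runtime, one possible fix would be for the controller to zero-initialize its per-controller counters at start-up, before the first reconcile. A rough sketch of what that could look like inside pkg/internal/controller/controller.go, where the ctrlmetrics import already exists (initMetrics is a hypothetical method, and the result label values are taken from the reconcile handler linked above):

// Hypothetical: run once when the controller starts so that every
// series exists at value 0 from the very first scrape.
func (c *Controller) initMetrics() {
	// Add(0) creates each labeled child without changing its value.
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "success").Add(0)
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "error").Add(0)
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "requeue").Add(0)
	ctrlmetrics.ReconcileTotal.WithLabelValues(c.Name, "requeue_after").Add(0)
	ctrlmetrics.ReconcileErrors.WithLabelValues(c.Name).Add(0)
}

Calling something like this when the controller starts would make metrics-based alerts reliable across operator redeployments.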