metrics support #119

Closed
zqzten opened this issue May 18, 2022 · 5 comments · Fixed by #121

@zqzten
Contributor

zqzten commented May 18, 2022

In production we want to collect some key metrics of kine for monitoring and alerting, such as the DBStats of the Go SQL backend used by kine, SQL operation times, and SQL errors. Would it be acceptable to add some Prometheus metrics for them?

For library mode, we can add an injectable registerer to the endpoint config; for stand-alone mode, we can set up our own metrics handler, as etcd does.

@brandond
Contributor

brandond commented May 18, 2022

I think this might already be covered by #104?

this PR adds a Maintenance service that only supports Status, populating the DbSize field if the driver has a query to report this data. Currently sqlite, mysql, and postgres all support this; cockroachdb does not. This field is periodically collected by the Kubernetes apiserver and copied into the etcd_db_total_size_in_bytes metric.

Just pull the etcd_db_total_size_in_bytes metric from your apiserver.

For other metrics, I'm curious what you might expose and how. Kine is most frequently used in an embedded capacity where it only listens on a Unix socket, which isn't easy to scrape directly.

@zqzten
Contributor Author

zqzten commented May 19, 2022

Our use case might be a little different from the most common one: we integrate kine (as a library) with our custom apiserver so that it can directly call kine's endpoint.Listen to use SQL servers as etcd. In this use case, we can pass the apiserver's Prometheus registerer to kine in the endpoint config and let kine register its metrics with it. For standalone usage, I'm not sure what the best practice would be (since the use cases vary); a naive approach would be to set up a simple HTTP Prometheus metrics handler whose port can be configured by a command-line flag.
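
To make the library-mode idea concrete, here is a minimal, dependency-free sketch. Everything in it is an assumption for illustration: the Registerer interface is a local stand-in for something like the Prometheus client's registerer, Config.MetricsRegisterer and listen are hypothetical names (not kine's actual API), and the metric names are made up.

```go
package main

import (
	"fmt"
	"sync"
)

// Registerer is a local stand-in for a metrics registerer, declared
// here only so the sketch compiles without external dependencies.
type Registerer interface {
	Register(name string) error
}

// Config is a hypothetical endpoint config. MetricsRegisterer is an
// assumed field name; leaving it nil means "no metrics".
type Config struct {
	Endpoint          string
	MetricsRegisterer Registerer
}

// mapRegistry is a toy registry that rejects duplicate metric names.
type mapRegistry struct {
	mu    sync.Mutex
	names map[string]bool
}

func (r *mapRegistry) Register(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.names == nil {
		r.names = make(map[string]bool)
	}
	if r.names[name] {
		return fmt.Errorf("metric %q is already registered", name)
	}
	r.names[name] = true
	return nil
}

// metricNames are illustrative only.
var metricNames = []string{
	"kine_sql_total",
	"kine_sql_errors_total",
	"kine_compact_errors_total",
}

// listen stands in for the library entry point: it registers metrics
// only when the caller injected a registerer, and reports how many
// metrics were registered.
func listen(cfg Config) (int, error) {
	if cfg.MetricsRegisterer == nil {
		return 0, nil
	}
	for i, name := range metricNames {
		if err := cfg.MetricsRegisterer.Register(name); err != nil {
			return i, err
		}
	}
	return len(metricNames), nil
}

func main() {
	reg := &mapRegistry{}
	n, err := listen(Config{Endpoint: "mysql://example", MetricsRegisterer: reg})
	fmt.Println("with registerer:", n, err)

	n, err = listen(Config{Endpoint: "mysql://example"})
	fmt.Println("without registerer:", n, err)
}
```

With this shape, the embedding apiserver owns the registry and the scrape endpoint, so the library itself does not need to open a port.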

For specific metrics, I currently want to expose the following:

  • SQL Total & Errors: Counter metrics which hold the total number of SQL operations and of failed SQL operations. This is helpful for quickly tracking and alerting on local SQL errors, which have a variety of causes (network, db, etc.) and are therefore not easy to monitor.
  • SQL Time: Histogram metric which keeps track of the duration of SQL operations. This duration is measured locally and can be used as a supplement to the db's own SQL execution time measurement.
  • Compact Errors: Counter metric which holds the total number of compact errors. Helpful for alerting when continuous compact errors occur, which can lead to uncontrollable growth of db size.
  • DBStats metrics of the sql.DB used by kine.

@zqzten
Contributor Author

zqzten commented May 24, 2022

@brandond Any thoughts on this? Do you think these metrics are nice to have?

@brandond
Contributor

brandond commented May 24, 2022

If you'd use them, then go for it! Love to see the PR. Let's just make sure that the metrics extension points don't add any additional overhead if metrics aren't being collected? Shouldn't be a problem I don't think.
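
One way to keep the extension points cheap when metrics are not being collected is a nil-safe recorder, sketched below. The recorder type and its methods are hypothetical names for illustration only; the point is that a nil recorder runs the operation directly and never reads the clock or takes a lock.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// recorder is a hypothetical metrics hook. A nil *recorder is valid
// and means "metrics disabled".
type recorder struct {
	mu    sync.Mutex
	ops   int
	total time.Duration
}

// observe runs fn. On a nil receiver it only calls fn, so the disabled
// path costs a single nil check.
func (r *recorder) observe(fn func() error) error {
	if r == nil {
		return fn()
	}
	start := time.Now()
	err := fn()
	elapsed := time.Since(start)

	r.mu.Lock()
	r.ops++
	r.total += elapsed
	r.mu.Unlock()
	return err
}

// summary reports what has been recorded; it is also nil-safe.
func (r *recorder) summary() string {
	if r == nil {
		return "metrics disabled"
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	return fmt.Sprintf("%d ops recorded", r.ops)
}

func main() {
	var disabled *recorder // no metrics configured
	_ = disabled.observe(func() error { return nil })
	fmt.Println(disabled.summary())

	enabled := &recorder{}
	_ = enabled.observe(func() error { return nil })
	fmt.Println(enabled.summary())
}
```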

@zqzten
Contributor Author

zqzten commented May 24, 2022

/assign
