Expose scheduler metrics from runtime #1477

Open
clux opened this issue Apr 25, 2024 · 0 comments
Labels
question Direction unclear; possibly a bug, possibly could be improved. runtime controller runtime related


Would you like to work on this feature?

Maybe

What problem are you trying to solve?

When debugging whether a controller is congested, we can measure how long our own reconcilers take to process an event using something like #[instrument] on the reconciler fn, but we do not have access to how long an event spends in scheduling before it hits our reconciler, nor do we have information about how deep the queue is.

Having these numbers would be useful because it lets us more accurately tune controller parallelism and, as a result, vertically scale appropriately.

Ref: good article on common metrics for queues

Describe the solution you'd like

If it's something that can be exposed, then maybe two synchronised numbers (queue depth and time spent in scheduling) are sufficient, but that might be quite difficult to actually thread through from Controller -> scheduler...
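As a rough illustration of what "two synchronised numbers" could look like, here is a minimal std-only sketch. Everything in it is hypothetical: `SchedulerMetrics`, `on_enqueue`, `on_dequeue` and `record_wait` are names invented for this example and are not part of kube-runtime, and it says nothing about how the handle would actually be threaded from Controller to the scheduler.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical shared handle (NOT a kube-runtime type).
/// The scheduler would write to it; the user would read from it.
#[derive(Debug, Default)]
pub struct SchedulerMetrics {
    /// Number of items currently waiting in the queue.
    queue_depth: AtomicUsize,
    /// Time the most recently dequeued item spent waiting, in microseconds.
    last_wait_micros: AtomicU64,
}

impl SchedulerMetrics {
    /// Assumed to be called when an item is queued.
    /// Returns the enqueue timestamp to store alongside the item.
    pub fn on_enqueue(&self) -> Instant {
        self.queue_depth.fetch_add(1, Ordering::Relaxed);
        Instant::now()
    }

    /// Assumed to be called right before the item is handed to the reconciler.
    pub fn on_dequeue(&self, enqueued_at: Instant) {
        self.record_wait(enqueued_at.elapsed());
    }

    /// Records an explicit wait duration and decrements the depth.
    /// Must be paired with an earlier `on_enqueue`, otherwise the counter wraps.
    pub fn record_wait(&self, waited: Duration) {
        self.queue_depth.fetch_sub(1, Ordering::Relaxed);
        self.last_wait_micros
            .store(waited.as_micros() as u64, Ordering::Relaxed);
    }

    /// Current queue depth.
    pub fn queue_depth(&self) -> usize {
        self.queue_depth.load(Ordering::Relaxed)
    }

    /// Scheduling delay of the most recently dequeued item.
    pub fn last_wait(&self) -> Duration {
        Duration::from_micros(self.last_wait_micros.load(Ordering::Relaxed))
    }
}
```

A handle like this could be wrapped in an `Arc` and cloned into both sides, so a user could export the two values through whatever metrics library they already use. A single "last wait" value is lossy; a histogram would likely be more useful in practice, but that pulls in the metrics-crate question discussed under alternatives.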

Describe alternatives you've considered

A feature-flagged metrics module using prometheus_client. This is the official Rust prometheus client and it seems relatively light, so it should be fine behind an optional feature flag if we want to go down this route. Should try to do kube-rs/controller-rs#55 first if this is desirable.

There is another client that is well used and very optimized (prometheus), but it lacks too many features and hasn't been released in a year, so it is hard to recommend these days.

Documentation, Adoption, Migration Strategy

  • docs on scaling + metrics on kube.rs
  • release notes

I imagine this will be purely additive.

Target crate for feature

kube-runtime
