Expose paused and retired workers separately in prometheus #8613
Having paused and retiring and Thoughts about removing
I think we don't have any strong preferences about keeping/removing the
I said "kinda". It messes up anything that assumes states other than "connected" are exclusive, or that (e.g.) a chart of all states other than "connected" would make sense. Instead, one would need logic that includes
Right, that makes sense. It's a shame we didn't introduce the split metrics earlier. For now, we're mostly interested in the paused metric because it may highlight problematic cluster behavior (for example, a cluster that is too small), while the retiring signal is mostly noise and usually harmless. So I don't have a solution for the "nice chart" problem, but when used as a tag, the retired metric already makes sense on its own.
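To make the overlap concern above concrete, here is a minimal sketch in plain Python. It does not use the project's actual collector code: the function name `worker_state_counts`, the status strings, and the label names other than `paused_or_retiring` are assumptions chosen for illustration. It shows why summing every non-"connected" label overcounts while the combined label is kept alongside the split ones.

```python
from collections import Counter


def worker_state_counts(statuses):
    """Count workers per state label.

    `statuses` is an iterable of hypothetical status strings
    ("running", "paused", "retiring"); these are illustrative only.
    """
    by_status = Counter(statuses)
    paused = by_status["paused"]
    retiring = by_status["retiring"]
    return {
        # Every worker, regardless of status.
        "connected": sum(by_status.values()),
        # Split labels: mutually exclusive with each other.
        "paused": paused,
        "retiring": retiring,
        # Combined label kept for backward compatibility;
        # it overlaps both split labels above.
        "paused_or_retiring": paused + retiring,
    }


counts = worker_state_counts(
    ["running", "running", "paused", "retiring", "retiring"]
)

# Naively stacking all labels other than "connected" double-counts
# paused and retiring workers because of the combined label.
naive_total = sum(v for k, v in counts.items() if k != "connected")

# Using only the split labels gives the exclusive count.
exclusive_total = counts["paused"] + counts["retiring"]
```

In this sketch a dashboard that stacks all non-"connected" labels would have to exclude the combined label explicitly, which is the extra logic the comment above alludes to.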
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
29 files 29 suites 11h 4m 38s ⏱️
For more details on these failures, see this check.
Results for commit 97675e9. ♻️ This comment has been updated with latest results.
I'd be fine with removing the paused_or_retiring metric; my main concern was backward compatibility. We mostly care about paused, as @fjetter said.
Closes #xxxx
pre-commit run --all-files
cc @ntabris for the grafana dashboards