Instability when running in AWS EKS on some instance families #1310
Comments
I think EKS is still on cgroup v1. In this case, can you add If this works on your end, a kepler-doc PR is welcome!
The dashboard is not well tested. There is another similar case here: #1321. Can you check if the metrics e.g.
What happened?
We were testing different deployments in an AWS EKS cluster to monitor how much energy each one uses. Although we could see individual pods and containers in the Kepler Dashboard, all values were zero except for those of the system processes.
We also checked the metrics endpoint of the Kepler exporters directly, so it isn't just an issue with the Kepler Dashboard.
We ran some load tests on our deployments, so it isn't just a problem of values being rounded down to zero.
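For anyone who wants to repeat the direct check of the exporter's metrics endpoint, here is a minimal sketch of how the zero-valued samples could be counted. It assumes the endpoint returns the standard Prometheus text format and that the metric of interest is `kepler_container_joules_total`; the label names in the sample are hypothetical and only for illustration, not taken from our cluster.

```python
def summarize_metric(exposition_text, metric_name="kepler_container_joules_total"):
    """Count zero and non-zero samples of one metric in Prometheus text format.

    Returns a (zero_count, nonzero_count) tuple.
    The default metric name is an assumption; pass whichever metric you inspect.
    """
    zero = 0
    nonzero = 0
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric_name:
            continue
        # The value follows the closing brace of the label set,
        # or follows the name directly when there are no labels.
        rest = line.rsplit("}", 1)[-1] if "}" in line else line[len(name):]
        value = float(rest.split()[0])
        if value == 0:
            zero += 1
        else:
            nonzero += 1
    return zero, nonzero


if __name__ == "__main__":
    # Hypothetical sample; real label names and values will differ.
    sample = "\n".join([
        "# TYPE kepler_container_joules_total counter",
        'kepler_container_joules_total{container_name="app"} 0',
        'kepler_container_joules_total{container_name="system_processes"} 1234.5',
    ])
    print(summarize_metric(sample))
```

Feeding it the response body of each exporter's `/metrics` endpoint before and after a load test shows whether any per-container samples ever leave zero.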
We repeated our tests with various AWS EC2 instance families, instance sizes, and Kubernetes versions (using eksctl to deploy the cluster). Whenever one combination of parameters worked, it no longer worked after we re-deployed the cluster with the same parameters (once it even stopped working while the cluster continued to run).
What did you expect to happen?
I expected to see measurements all the time.
How can we reproduce it (as minimally and precisely as possible)?
Unfortunately, at this moment, we are not able to reproduce the bug consistently.
The setup that stopped working while the cluster was running used the Bottlerocket AMI family (which seems to be the only Linux distribution that is supported by eksctl and supports cgroup v2) on t3.large instances in eu-north-1 with Kubernetes version 1.28.
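Since the cgroup version differs between AMI families, it may help to confirm what each node actually runs. The following is a rough sketch, assuming the usual Linux convention that a unified cgroup v2 hierarchy exposes a `cgroup.controllers` file at the mount root; the function name and return values are illustrative only.

```python
from pathlib import Path


def cgroup_version(root="/sys/fs/cgroup"):
    """Guess the cgroup version mounted at `root`.

    Returns "v2" if the unified hierarchy marker file exists,
    "v1" if the directory exists without it (legacy or hybrid layout),
    and None if the directory is missing (e.g. not a Linux host).
    """
    root = Path(root)
    if (root / "cgroup.controllers").is_file():
        return "v2"
    if root.is_dir():
        return "v1"
    return None


if __name__ == "__main__":
    print(cgroup_version())
```

This has to run on the node itself (or in a pod that mounts the host's `/sys/fs/cgroup`), otherwise it reports the view of the container it runs in.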
Anything else we need to know?
No response
Kepler image tag
Kubernetes version
We witnessed the behaviour on 1.24, 1.28 and 1.29.
Cloud provider or bare metal
AWS
OS version
Install tools
eksctl and helm
Kepler deployment config
Container runtime (CRI) and version (if applicable)
No response
Related plugins (CNI, CSI, ...) and versions (if applicable)
No response