kubelet /stats/summary returns Zero from CPU usageNanoCores stats when more than one caller #122092
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
After debugging in containerd (containerd/containerd#9531) we figured out that this is caused because previously in kubelet there was only one caller that calculated and stored the usageNanoCores value.
Even though we still have this flag on the kubelet side (kubernetes/pkg/kubelet/stats/cri_stats_provider.go, lines 607 to 611 in b27670d), it is no longer honored in the way it was intended: containerd will always calculate the value and cache the latest value on every call. This means that every time GetContainerStats is called, whether by the eviction manager or by tools going directly to the summary endpoint, the values are recalculated and stored. At first glance this looks like a containerd issue, but containerd doesn't have the information that kubelet has (the flag from the eviction manager that says to update the value consistently every 10 seconds). A few possibilities to solve this:
This is a problem for Windows today but not for Linux, since the value is still backfilled by cAdvisor there. When moving to CRI-only stats this would likely become an issue for Linux as well.
The downside of option 1 is a 10s delay in usageNanoCores updates, but on the other hand it decouples the caller from managing an internal containerd cache, and the resulting behavior is similar to the current situation. I like this approach of the container stats gRPC endpoint being a cache-only read.
A fourth option is that kubelet could do the calculation of usageNanoCores itself, instead of the CRI. We could deprecate and (eventually) drop the field so that both aren't calculating it. Yet another option is that we could hardcode the update period in the CRI, similar to how it's done in the kubelet.
I don't know that I have a strong opinion here. Who else should we loop in to make this decision? Is this something that should be brought up at a node meeting?
Yeah, let's chat about it there.
I've added to the sig-node agenda for today: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg |
@jsturtevant thanks for bringing this to today's SIG Node meeting. @haircommander can you summarize what you shared at today's meeting, including the pros and cons for each option? We will follow up on it. Thanks! cc/ @logicalhan
we currently have something like 4 proposals:
A point that came up in SIG Node: since the CRI spec allows a CRI implementation to specify a timestamp for when the CPU metrics were collected, it would be quite tricky for the kubelet to have any control over usageNanoCores. Point 4 couldn't work for the same reason it doesn't now: kubelet can't guarantee the period is long enough to produce a non-zero usageNanoCores.

I think the simplest solution is to make the collection period a hardcoded or configurable option in the CRI. That would mean the period is regular and likely long enough to have meaningful data about usageNanoCores.

Note: I think doing point 3 could work as well, in tandem, to allow the kubelet to normalize its requests to the CRI. This would also mean it wouldn't request the data more frequently than it's being generated: if the CRI is collecting and saving the stats every 10s and kubelet requests every 5s, the kubelet will needlessly collect half of the metrics it gathers. It would be up to the CRI or admin to choose a period that matches the kubelet's collection period.
The calculation of a rate within the kubelet/cAdvisor was always somewhat imprecise. Since it is already part of the CRI API, I would just always use what comes from the CRI and let the container runtime decide the optimal caching strategy.
Thanks for the discussion. It sounds like we should update containerd to collect this periodically and return the cached value. For now, we can hardcode it to 10s.
I've opened containerd/containerd#10010 with option 1. There were a few questions in containerd/containerd#10010 (comment). |
This started as a flaky test but was identified as related to changes to the way usageNanoCores is calculated and used by kubelet. See #122092 (comment).

Which jobs are flaking?
ci-kubernetes-e2e-capz-master-windows-serial-slow
Which tests are flaking?
[sig-windows] [Feature:Windows] Cpu Resources [Serial] Container limits should not be exceeded after waiting 2 minutes flaking
Since when has it been flaking?
~ November 1st
Testgrid link
https://testgrid.k8s.io/sig-windows-signal#capz-windows-master-serial-slow
Reason for failure (if possible)
STEP: Ensuring pods are still running - test/e2e/windows/cpu_limits.go:56 @ 11/27/23 02:07:27.458
STEP: Ensuring cpu doesn't exceed limit by >5% - test/e2e/windows/cpu_limits.go:76 @ 11/27/23 02:07:27.935
STEP: Gathering node summary stats - test/e2e/windows/cpu_limits.go:78 @ 11/27/23 02:07:27.935
Nov 27 02:07:28.181: INFO: Pod cpulimittest-eb76c25b-1fa9-41e7-a2c0-446bf6db30fb usage: 0
[FAILED] Pod cpu-resources-test-windows-9879/cpulimittest-eb76c25b-1fa9-41e7-a2c0-446bf6db30fb reported usage is 0, but it should be greater than 0
In [It] at: test/e2e/windows/cpu_limits.go:96 @ 11/27/23 02:07:28.182
Anything else we need to know?
example failure: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-capz-master-windows-serial-slow/1728940368692514816
Relevant SIG(s)
/sig windows