kubelet /stats/summary returns Zero from CPU usageNanoCores stats when more than one caller #122092

Open
jsturtevant opened this issue Nov 28, 2023 · 12 comments
Labels
kind/flake Categorizes issue or PR as related to a flaky test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/windows Categorizes an issue or PR as relevant to SIG Windows.

Comments

@jsturtevant
Contributor

jsturtevant commented Nov 28, 2023

This started as a flaky test but was identified as related to changes in the way usageNanoCores is calculated and used by the kubelet. See #122092 (comment)

Which jobs are flaking?

ci-kubernetes-e2e-capz-master-windows-serial-slow

Which tests are flaking?

[sig-windows] [Feature:Windows] Cpu Resources [Serial] Container limits should not be exceeded after waiting 2 minutes flaking

Since when has it been flaking?

~ November 1st

Testgrid link

https://testgrid.k8s.io/sig-windows-signal#capz-windows-master-serial-slow

Reason for failure (if possible)

STEP: Ensuring pods are still running - test/e2e/windows/cpu_limits.go:56 @ 11/27/23 02:07:27.458
STEP: Ensuring cpu doesn't exceed limit by >5% - test/e2e/windows/cpu_limits.go:76 @ 11/27/23 02:07:27.935
STEP: Gathering node summary stats - test/e2e/windows/cpu_limits.go:78 @ 11/27/23 02:07:27.935
Nov 27 02:07:28.181: INFO: Pod cpulimittest-eb76c25b-1fa9-41e7-a2c0-446bf6db30fb usage: 0
[FAILED] Pod cpu-resources-test-windows-9879/cpulimittest-eb76c25b-1fa9-41e7-a2c0-446bf6db30fb reported usage is 0, but it should be greater than 0
In [It] at: test/e2e/windows/cpu_limits.go:96 @ 11/27/23 02:07:28.182

Anything else we need to know?

example failure: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-capz-master-windows-serial-slow/1728940368692514816

Relevant SIG(s)

/sig windows

@jsturtevant jsturtevant added the kind/flake Categorizes issue or PR as related to a flaky test. label Nov 28, 2023
@k8s-ci-robot k8s-ci-robot added sig/windows Categorizes an issue or PR as relevant to SIG Windows. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 28, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jsturtevant
Contributor Author

After some debugging in containerd (containerd/containerd#9531) we figured out that this is caused by the fact that there was previously only one caller in the kubelet that calculated and stored UsageNanoCores, per the doc comment on ListPodStatsAndUpdateCPUNanoCoreUsage:

// ListPodStatsAndUpdateCPUNanoCoreUsage updates the cpu nano core usage for
// [...] The implementation assumes a
// single caller to periodically invoke this function to update the metrics. If
// there exist multiple callers, the period used to compute the cpu usage may
// vary and the usage could be incoherent (e.g., spiky). If no caller calls
// this function, the cpu usage will stay nil. Right now, eviction manager is
// the only caller, and it calls this function every 10s.

Even though we still have this flag on the kubelet side:

if updateCPUNanoCoreUsage {
    usageNanoCores = p.getAndUpdateContainerUsageNanoCores(stats)
} else {
    usageNanoCores = p.getContainerUsageNanoCores(stats)
}

it is no longer honored the way it was intended: containerd now calculates the value and caches the latest result on every call. This means that every time container stats are requested, whether by the eviction manager or by a tool hitting the summary endpoint directly, the value is recalculated and stored.
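To illustrate why a second caller can observe zero, here is a minimal, hypothetical Go sketch of the kind of rate calculation involved (the names and numbers are illustrative, not containerd's actual code): the rate is the delta in cumulative CPU nanoseconds divided by the sampling window, so if every call refreshes the cached sample, a caller that arrives a few milliseconds after another one computes its rate over a near-empty window and gets 0.

package main

import (
    "fmt"
    "time"
)

// sample is one reading of the cumulative CPU time reported by the runtime.
type sample struct {
    cumulativeNanos uint64    // total CPU time consumed so far, in nanoseconds
    timestamp       time.Time // when the reading was taken
}

// usageNanoCores derives an instantaneous rate from two cumulative samples:
// nanoseconds of CPU consumed per second of wall clock, i.e. "nano cores".
func usageNanoCores(prev, cur sample) uint64 {
    window := cur.timestamp.Sub(prev.timestamp).Nanoseconds()
    if window <= 0 {
        return 0
    }
    deltaUsage := cur.cumulativeNanos - prev.cumulativeNanos
    return uint64(float64(deltaUsage) / float64(window) * 1e9)
}

func main() {
    base := sample{cumulativeNanos: 5_000_000_000, timestamp: time.Now()}

    // Caller A samples 10s after base: the window is wide enough to show real usage.
    a := sample{
        cumulativeNanos: base.cumulativeNanos + 2_000_000_000, // 2s of CPU consumed
        timestamp:       base.timestamp.Add(10 * time.Second),
    }
    fmt.Println(usageNanoCores(base, a)) // ~200000000, i.e. 0.2 cores

    // Caller B samples 5ms after A. If A's call also overwrote the cached
    // "previous" sample, B computes its rate over a tiny window in which the
    // cumulative counter has not advanced, so the result collapses to 0.
    b := sample{
        cumulativeNanos: a.cumulativeNanos,
        timestamp:       a.timestamp.Add(5 * time.Millisecond),
    }
    fmt.Println(usageNanoCores(a, b)) // 0
}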

At first glance this looks like a containerd issue, but containerd doesn't have the information that the kubelet has (the flag from the eviction manager that says to update the value consistently every 10 seconds).

A few possibilities to solve this:

  1. containerd implements a background process that periodically updates the value every 10 seconds, and the endpoint always returns the cached value
  2. add a new CRI field, e.g. an updateCache bool, that can be passed
  3. kubelet only calls the CRI endpoint when the "update" flag is set, caches the result at the kubelet layer, and serves other calls to the summary endpoint from that cache

This is a problem for Windows today but not for Linux, since on Linux the value is still backfilled by cAdvisor. Once we move to CRI-only stats, it would likely become an issue for Linux as well.

/cc @knabben @bobbypage @haircommander

@jsturtevant jsturtevant changed the title [sig-windows] [Feature:Windows] Cpu Resources [Serial] Container limits should not be exceeded after waiting 2 minutes flaking kubelet /stats/summary returns Zero from CPU usageNanoCores stats when more than one caller Feb 13, 2024
@knabben
Member

knabben commented Feb 14, 2024

The downside of option 1 is a 10s delay in usageNanoCores updates, but on the other hand it removes the caller's responsibility for managing an internal containerd cache, and the resulting behavior is similar to the current situation.

I like this approach of the container stats gRPC endpoint being a read-only view of the cache.

@haircommander
Contributor

haircommander commented Feb 14, 2024

A fourth option is that the kubelet could do the calculation of usage nano cores instead of the CRI. We could deprecate and (eventually) drop the field so they're not both calculating it.

Yet another option: we could hardcode the update period in the CRI, similar to how it's done in the kubelet.

@jsturtevant
Contributor Author

I don't know that I have a strong opinion here. Who else should we loop in to make this decision? Is this something that should be brought up at a SIG Node meeting?

@haircommander
Contributor

yeah let's chat about it there

@jsturtevant
Contributor Author

I've added to the sig-node agenda for today: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg

@dchen1107
Member

@jsturtevant thanks for bringing this to today's SIG Node meeting. @haircommander, can you summarize what you shared at today's meeting here, including the pros and cons of each option? We will follow up on it. Thanks!

cc/ @logicalhan

@haircommander
Contributor

we currently have something like 4 proposals:

  • the CRI implementation runs a background process that periodically updates the value every 10 seconds, and the endpoint always returns the cached value
  • add a new CRI field, e.g. an updateCache bool, that can be passed
  • kubelet only calls the CRI endpoint when the "update" flag is set, caches at the kubelet layer, and serves other calls to the summary endpoint from that cache
  • kubelet does the calculation of usage nano cores instead of the CRI

A point that came up in SIG Node: since the CRI spec allows a CRI implementation to specify a timestamp for when the CPU metrics were collected, it would be quite tricky for the kubelet to have any control over usageNanoCores. Point 4 couldn't work for the same reason it doesn't work today: the kubelet can't guarantee the period is long enough to produce a non-zero usageNanoCores.

I think the simplest solution is to make the collection period a hardcoded or configurable option in the CRI. That would mean the period is regular and likely long enough to yield meaningful data about usageNanoCores.

Note: I think doing point 3 could work as well, in tandem, to let the kubelet normalize its requests to the CRI so that it doesn't request the data more frequently than it is being generated. If the CRI collects and saves the stats every 10s and the kubelet requests every 5s, half of the kubelet's collections are redundant. It would be up to the CRI or the admin to choose a period that matches the kubelet's collection period.
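(For illustration only: a minimal, hypothetical Go sketch of what point 3's kubelet-layer cache might look like, assuming only the update-flagged path talks to the CRI. The names are invented and this is not proposed kubelet code.)

package main

import (
    "fmt"
    "sync"
)

// usageCache sketches a kubelet-side cache: usageNanoCores is refreshed from
// the CRI only on the path that sets the "update" flag (today, the eviction
// manager's periodic call); every other summary request reads the cache.
type usageCache struct {
    mu     sync.RWMutex
    cached map[string]uint64 // container ID -> last computed usageNanoCores
}

func newUsageCache() *usageCache {
    return &usageCache{cached: make(map[string]uint64)}
}

// get returns the usage for a container, invoking fetchFromCRI (a stand-in
// for the real CRI stats request) only when update is set.
func (u *usageCache) get(containerID string, update bool, fetchFromCRI func(string) uint64) uint64 {
    if update {
        v := fetchFromCRI(containerID)
        u.mu.Lock()
        u.cached[containerID] = v
        u.mu.Unlock()
        return v
    }
    u.mu.RLock()
    defer u.mu.RUnlock()
    return u.cached[containerID]
}

func main() {
    cache := newUsageCache()
    fetch := func(id string) uint64 { return 150_000_000 } // pretend the CRI reported 0.15 cores

    fmt.Println(cache.get("c1", true, fetch)) // eviction-manager path: refresh from the CRI
    fmt.Println(cache.get("c1", false, nil))  // summary-endpoint path: served from the cache
}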

@dashpole
Contributor

The calculation of a rate within the kubelet/cAdvisor was always somewhat imprecise. Since it is already part of the CRI API, I would just always use what comes from the CRI and let the container runtime decide the optimal caching strategy.

@jsturtevant
Contributor Author

Thanks for the discussion. It sounds like we should update containerd to collect this periodically and return the cached value. For now, we can hard-code it to 10s.
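(A rough, hypothetical Go sketch of what that could look like inside the runtime, assuming a fixed 10s sampling period. The names are invented and this is not containerd's actual implementation: a single background sampler owns the only write path to the cached value, and every stats call, whoever makes it, only reads.)

package main

import (
    "sync"
    "time"
)

// cachedUsage is what the stats endpoint would hand back on every call:
// reads never trigger a recalculation.
type cachedUsage struct {
    mu             sync.RWMutex
    usageNanoCores uint64
}

func (c *cachedUsage) get() uint64 {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.usageNanoCores
}

func (c *cachedUsage) set(v uint64) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.usageNanoCores = v
}

// startSampler recomputes the rate on a fixed period using a caller-supplied
// reader of the cumulative CPU counter, and is the only writer to the cache.
func startSampler(c *cachedUsage, readCumulativeNanos func() uint64, period time.Duration, stop <-chan struct{}) {
    go func() {
        prev := readCumulativeNanos()
        prevTime := time.Now()
        ticker := time.NewTicker(period)
        defer ticker.Stop()
        for {
            select {
            case <-stop:
                return
            case now := <-ticker.C:
                cur := readCumulativeNanos()
                if window := now.Sub(prevTime).Nanoseconds(); window > 0 {
                    c.set(uint64(float64(cur-prev) / float64(window) * 1e9))
                }
                prev, prevTime = cur, now
            }
        }
    }()
}

func main() {
    var cache cachedUsage
    stop := make(chan struct{})
    defer close(stop)

    // fakeCounter stands in for the platform's cumulative CPU reading
    // (e.g. what the Windows stats provider would report); purely illustrative.
    fakeCounter := func() uint64 { return uint64(time.Now().UnixNano() / 10) }

    startSampler(&cache, fakeCounter, 10*time.Second, stop)

    // Every stats request, regardless of caller, returns the cached value.
    _ = cache.get()
}

With this shape, the number of concurrent readers no longer affects the sampling window, so the zero/spiky values go away regardless of who scrapes the summary endpoint.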

@jsturtevant
Contributor Author

I've opened containerd/containerd#10010 with option 1.

There were a few questions in containerd/containerd#10010 (comment).
