e2e flake: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write #109182
Comments
@liggitt: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
kubelet log:
containerd log:
I'm inclined to think this is a container-related issue:
/triage accepted
/cc @kolyshkin @rphillips
Was the runc (or containerd) binary updated on these jobs?
Support for the RDMA controller was indeed added in runc 1.1 (in opencontainers/runc#2883). The error message says that the write failed. That write happens right after the mkdir, which (apparently) succeeded.
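For illustration, here is a minimal Go sketch of those two steps: create the cgroup directory, then write the PID into cgroup.procs (the write that fails in this flake). This is not runc's actual code; the helper name and the example rdma path are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// addPidToCgroup mirrors the two steps discussed above:
// 1. mkdir the cgroup directory (the part that succeeded in the flake),
// 2. write the PID into cgroup.procs (the write that failed).
func addPidToCgroup(cgroupPath string, pid int) error {
	if err := os.MkdirAll(cgroupPath, 0o755); err != nil {
		return fmt.Errorf("mkdir %s: %w", cgroupPath, err)
	}
	procs := filepath.Join(cgroupPath, "cgroup.procs")
	if err := os.WriteFile(procs, []byte(strconv.Itoa(pid)), 0o644); err != nil {
		return fmt.Errorf("write %s: %w", procs, err)
	}
	return nil
}

func main() {
	// Hypothetical pod cgroup path under the v1 rdma controller; requires root.
	if err := addPidToCgroup("/sys/fs/cgroup/rdma/kubepods/pod-example", os.Getpid()); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```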
The kubelet log also shows (this is the earliest mention of rdma):
Note that support for the hybrid unified hierarchy also first appeared in runc 1.1.
The removal failure happens because there are allegedly processes left in the rdma and unified cgroups, which prevents the cgroup removal. I can't figure out why this can ever happen (kubelet does not know anything about rdma or unified, but that should not break things). My preliminary theory is that the inability to write a pid to rdma is caused by too many rdma cgroups. In any case, we should figure out why rdma and unified are not empty upon removal. Following the source code, kubelet kills all the processes in these cgroups before trying to remove them, so I am puzzled.
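To make the "why are rdma and unified not empty" question concrete, the check boils down to walking the pod cgroup and reading cgroup.procs. Below is a hedged diagnostic sketch in Go — not kubelet code, and the pod paths are hypothetical:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// listLeftoverPids walks one controller's pod cgroup (including child
// directories) and returns every PID still listed in cgroup.procs.
func listLeftoverPids(root string) ([]string, error) {
	var pids []string
	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			data, rerr := os.ReadFile(filepath.Join(path, "cgroup.procs"))
			if rerr != nil {
				return nil // skip cgroups we cannot read
			}
			pids = append(pids, strings.Fields(string(data))...)
		}
		return nil
	})
	return pids, err
}

func main() {
	// Hypothetical pod cgroup paths for the controllers mentioned above.
	for _, root := range []string{
		"/sys/fs/cgroup/rdma/kubelet/kubepods/burstable/pod-example",
		"/sys/fs/cgroup/unified/kubelet/kubepods/burstable/pod-example",
	} {
		pids, err := listLeftoverPids(root)
		fmt.Printf("%s: pids=%v err=%v\n", root, pids, err)
	}
}
```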
Any possibility of processes stuck in 'D' state?
So, I added some debug in #109298 to see what is going on. Here is an excerpt from the kubelet log:

Apr 05 01:54:38 kind-worker kubelet[261]: time="2022-04-05T01:54:38Z" level=error msg="Failed to remove cgroup" error="rmdir /sys/fs/cgroup/unified/kubelet/kubepods/burstable/pod1883213d8fec799ee2b7bf9f2185a5c7/5b078c521eefb476090c430cac51c128fabcd9094ebcaf0fa225d2b366c13c39: device or resource busy"

All this means that
Adding more debug to #109298... My next two suspects are KIND and the kernel. As for KIND, I looked at the sources of the script that prepares cgroups and found nothing wrong.
@mrunalp Looks like it's not that: cgroup.procs shows no entries (nor do any of the subdirectories) -- see the previous comment.
@liggitt 👋 I'm the release 1.24 bug triage shadow. With the test freeze cutoff tomorrow, do you think this issue will still be included in the current release?
Until the issue is understood, it should remain in the milestone.
/assign @mrunalp
@mrunalp let's catch up on why rdma is even an available controller on this host. rdma isn't in this allowed list: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cgroup_manager_linux.go#L260
@derekwaynecarr This "allowed list" is merely a way to specify controllers that must be present (I guess its naming is slightly misleading). In other words, the code you refer to ensures that the memory, cpu, etc. paths are present. It has nothing to do with rdma or unified. runc creates cgroups for all supported controllers/subsystems (and adds containers to all of them). What's unclear is why these cgroups can't be removed during destroy. I am still looking at it in #109298 (feeling under the weather today, so it is taking longer).
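To illustrate the distinction, a presence check of that kind only verifies that a fixed set of required controller hierarchies exists; it does not restrict which controllers the runtime joins. Here is a hedged Go sketch — the controller list below only approximates the kubelet's, and the paths assume cgroup v1 mounted at /sys/fs/cgroup:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// requiredControllers approximates an "allowed list" of controllers whose
// hierarchies must be present; rdma and unified are intentionally absent.
var requiredControllers = []string{"cpu", "cpuacct", "cpuset", "memory"}

// verifyRequired only checks that each required hierarchy exists; it says
// nothing about which controllers a container is actually joined to.
func verifyRequired(cgroupRoot string) error {
	for _, c := range requiredControllers {
		p := filepath.Join(cgroupRoot, c)
		if _, err := os.Stat(p); err != nil {
			return fmt.Errorf("required cgroup controller %q not mounted at %s: %w", c, p, err)
		}
	}
	return nil
}

func main() {
	if err := verifyRequired("/sys/fs/cgroup"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("all required controllers present")
}
```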
One thing worth trying may be to see whether we still hit the issue if we don't join the rdma controller.
@kolyshkin understood. rdma being an enabled cgroup controller on a target host for kubelet execution is what was new to me, so I was wondering if there was a change to the test operating system configuration beyond runc just adding awareness.
The RDMA cgroup controller requires a kernel config parameter to be set. It is obviously set in Ubuntu kernels. In Fedora 35 kernels:
On the CentOS Stream 9 kernel, it is set:
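As an aside, one quick way to check at runtime whether a kernel exposes the rdma cgroup controller at all is to look for it in /proc/cgroups. This is a hedged sketch, not something taken from the thread:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// rdmaControllerEnabled scans /proc/cgroups for an "rdma" row; the row is
// only present when the kernel was built with the RDMA cgroup controller.
func rdmaControllerEnabled() (bool, error) {
	f, err := os.Open("/proc/cgroups")
	if err != nil {
		return false, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) > 0 && fields[0] == "rdma" {
			return true, nil
		}
	}
	return false, s.Err()
}

func main() {
	ok, err := rdmaControllerEnabled()
	fmt.Printf("rdma cgroup controller enabled: %v (err: %v)\n", ok, err)
}
```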
Not joining RDMA in the vendored libcontainer did not help. It might help in the runc binary (which I haven't tried). Looking into the underlying cause.
I'm not sure this is eliminated: =>
but CI should be using kind @ HEAD and kubernetes-sigs/kind#2709 merged two days ago |
reopening per #109182 (comment) to make sure this is resolved |
Looking at some of the issues around k8s/runc, I came across this issue where runc 1.1.0 didn't properly scope some cgroup objects. kubernetes/kubernetes#109182 Signed-off-by: Shane Jarych <sjarych@mirantis.com>
Should that have actually closed this issue? I'm not seeing how the linked commit in Ben's fork modified kind bringup.
Oh no, that's that GitHub "feature": I merely synced my fork to upstream, but the commit message contains "fixes". Not sure why CI didn't block this with the invalid-commit label.
Ah, yes, those are always a pain.
Potentially related: ... We should probably update the docker-in-docker in Kubernetes CI; it's going to have an outdated docker install and it's generally not well done. I've been meaning to clean that up ... https://github.com/kubernetes/test-infra/blob/master/images/krte/Dockerfile is still based on Debian Buster ...
The tests that execute commands on pods seem to be affected by this.
KIND's CI image is now on docker 20.10.15 / runc v1.1.1-0-g52de29d. Tentatively, after this change we don't see any more logs about rdma cgroups; I've spot-checked a few. (The CI dind is still naively done and I'm not sure what the underlying hosts are running currently; need to get back to that ...)
https://storage.googleapis.com/k8s-triage/index.html?pr=1&text=unable%20to%20apply%20cgroup%20configuration&xjob=1-2 is indeed empty. https://storage.googleapis.com/k8s-triage/index.html?pr=1&text=unable%20to%20apply%20cgroup%20configuration on all jobs has a few, but those are:
The csi-driver-hostpath jobs are due to the kubekins-e2e image not having the updated docker (and possibly not an updated kind). CAPI is probably the same thing. These remaining flakes are rare and are not affecting CI for this repo.
@BenTheElder: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Looks like we just got a spike of a new run failure message in master: https://storage.googleapis.com/k8s-triage/index.html?pr=1&text=unable%20to%20apply%20cgroup%20configuration&xjob=1-2
Seen in https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/109178/pull-kubernetes-conformance-kind-ga-only-parallel/1509397620936675328
/milestone v1.24
/sig node