
api, cgroupv2: skip setting the devices cgroup #2474

Closed
giuseppe opened this issue Jun 16, 2020 · 14 comments · Fixed by #2490
@giuseppe (Member) commented Jun 16, 2020

The Kubelet uses libcontainer/cgroups to set up cgroups. It would be nice to have a way to skip setting the devices cgroup entirely.

@mrunalp (Contributor) commented Jun 16, 2020

@kolyshkin @AkihiroSuda fyi

@cyphar (Member) commented Jun 17, 2020

😬 Is there a reason for wanting to skip the devices cgroup? The hard requirement for the devices cgroup exists to ensure a fail-secure setup (if you don't set the devices whitelist, you allow users to do all sorts of scary things). We could make this requirement apply only to runc, but I'm not sure how ugly that would become...

@giuseppe (Member, Author)

The Kubelet uses the libcontainer code to create the parent cgroups (e.g. /sys/fs/cgroup/kubepods); each container then gets its own cgroup under /sys/fs/cgroup/kubepods, where the devices cgroup is configured.
I don't think this is a problem on cgroup v1, but on cgroup v2, every time the Kubelet restarts and reconfigures the cgroup, it leaks an eBPF program.
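The leak happens because device filters on cgroup v2 are eBPF programs attached with BPF_F_ALLOW_MULTI, so re-applying the configuration adds a new program rather than replacing the old one, and the kernel caps the number of programs attached to one cgroup. A minimal simulation of that mechanism (toy types, not the real libcontainer or kernel API; the 64-program cap mirrors the kernel's BPF_CGROUP_MAX_PROGS limit):

```go
package main

import (
	"errors"
	"fmt"
)

// maxPrograms mirrors the kernel's per-cgroup cap on attached BPF
// programs; attaching beyond it fails with E2BIG, which surfaces in
// userspace as "argument list too long".
const maxPrograms = 64

// cgroup is a toy stand-in for a cgroup v2 directory that can have
// BPF_CGROUP_DEVICE programs attached to it.
type cgroup struct {
	attached []string // IDs of attached device-filter programs
}

// attachAllowMulti models BPF_PROG_ATTACH with BPF_F_ALLOW_MULTI:
// each call adds a new program instead of replacing the previous one.
func (c *cgroup) attachAllowMulti(prog string) error {
	if len(c.attached) >= maxPrograms {
		return errors.New("argument list too long") // E2BIG
	}
	c.attached = append(c.attached, prog)
	return nil
}

func main() {
	kubepods := &cgroup{}
	// Every Kubelet restart re-applies the devices config, attaching a
	// fresh program while the previous one stays behind (the leak).
	for restart := 1; restart <= 70; restart++ {
		if err := kubepods.attachAllowMulti(fmt.Sprintf("devices-%d", restart)); err != nil {
			fmt.Printf("restart %d: %v\n", restart, err)
			break
		}
	}
	fmt.Printf("programs still attached: %d\n", len(kubepods.attached))
}
```

Once the cap is reached, every further reconfiguration fails, which is the failure mode reported later in this thread.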

@cyphar (Member) commented Jun 17, 2020

Ah, okay. Yeah, we might need to add a way to configure that. The only important thing is that this must be strictly opt-out, with a fairly large warning sign next to the configuration option.

@kolyshkin (Contributor)
Addressed this one in #2490

@cyphar (Member) commented Jul 3, 2020

As discussed in #2490, I believe that the cgroupv2 eBPF issue is a runc bug that we should fix anyway (tracked by #2366). Would the Kubernetes folks still need this if we fixed the cgroupv2 eBPF issue?

@giuseppe (Member, Author) commented Jul 3, 2020

the devices cgroup is not used at all by Kubernetes: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/container_manager_linux.go#L378-L385

Since the cost of running such an eBPF program is close to zero, I am fine with the underlying issue being addressed instead of offering a different API.

@odinuge (Contributor) commented Jan 25, 2021

> the devices cgroup is not used at all by Kubernetes:

Well, that is kind of true, but it only applies to the kubelet and its direct usage of libcontainer.

However, Kubernetes uses it implicitly via the CRI API, since most CRI implementations use runc by default. As I understand it, cgroup.SkipDevices is only for users of the Go API, not the binary.

When running Kubernetes with e.g. containerd as the runtime (via CRI), it uses the runc binary. Kubernetes also constantly updates the cpuset for containers via a component called the "CPU Manager". When running that setup with cgroup v2, containerd starts answering container resource updates (the CRI update consists of only the cpuset; I'm not sure what command containerd sends to runc, but I can take a look at that) with runc did not terminate successfully: failed to call BPF_PROG_ATTACH (BPF_CGROUP_DEVICE, BPF_F_ALLOW_MULTI): argument list too long. This essentially blocks cpuset updates for containers when using runc.

I can cross-post this to #2366, though, since it might be useful in that context.
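For Go-API consumers, the opt-out from #2490 is a flag on the cgroup configuration. A minimal sketch of the intended shape (the Resources and Cgroup types here are simplified stand-ins for libcontainer's configs package, not the real definitions, so field names and layout are assumptions):

```go
package main

import "fmt"

// Simplified stand-ins for libcontainer's configs types; the real
// package lives at github.com/opencontainers/runc/libcontainer/configs.
type Resources struct {
	CpusetCpus  string
	SkipDevices bool // opt out of programming the devices cgroup
}

type Cgroup struct {
	Path      string
	Resources *Resources
}

func main() {
	// A kubelet-style parent cgroup: manage cpuset and other
	// controllers, but leave the devices controller alone so no
	// eBPF device-filter program is attached on cgroup v2.
	parent := &Cgroup{
		Path: "/kubepods",
		Resources: &Resources{
			CpusetCpus:  "0-3",
			SkipDevices: true,
		},
	}
	fmt.Printf("%s: SkipDevices=%v\n", parent.Path, parent.Resources.SkipDevices)
}
```

As odinuge notes, this only helps direct library users such as the kubelet; runtimes invoking the runc binary through CRI cannot reach this flag.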

@xiaoxubeii commented Apr 1, 2021

> > the devices cgroup is not used at all by Kubernetes:
>
> Well, that is kind of true, but it only applies to the kubelet and its direct usage of libcontainer.
>
> However, Kubernetes uses it implicitly via the CRI API, since most CRI implementations use runc by default. As I understand it, cgroup.SkipDevices is only for users of the Go API, not the binary.
>
> When running Kubernetes with e.g. containerd as the runtime (via CRI), it uses the runc binary. Kubernetes also constantly updates the cpuset for containers via a component called the "CPU Manager". When running that setup with cgroup v2, containerd starts answering container resource updates with runc did not terminate successfully: failed to call BPF_PROG_ATTACH (BPF_CGROUP_DEVICE, BPF_F_ALLOW_MULTI): argument list too long. This essentially blocks cpuset updates for containers when using runc.

Yes, I think there is no way to set cgroup.SkipDevices directly via the CRI API. It is a bit tricky for Kubernetes to use cgroups v2 :(

@bharathguvvala commented Jun 8, 2021

We are seeing the CPU resource update flow break in Kubernetes, as mentioned in this comment, with runc 1.0.0-rc93. Should this issue be reopened?

@bharathguvvala

> We are seeing the CPU resource update flow break in Kubernetes, as mentioned in this comment, with runc 1.0.0-rc93. Should this issue be reopened?

@kolyshkin @AkihiroSuda can we discuss approaches to fix this, so that it will not be an issue for Kubernetes?

@cyphar (Member) commented Jun 8, 2021

If CRI is using runc update, #2994 should have already fixed this issue -- runc update now skips devices cgroup updates.

But if CRI implementations are calling something other than runc update, that seems a bit questionable -- I'd really prefer not to add a new runc flag that disables a security feature (we already have --no-pivot-root and --no-new-keyring, both of which have caused more headaches than we'd like, because people use them rather than running runc in a usable environment, reducing the security of their containers).

@bharathguvvala

Thanks for the comment, @cyphar. I have verified it with the dev build from master, and the issue seems to be absent with the fix in #2951. May I know when this is expected to roll out as part of a release?

@cyphar (Member) commented Jun 9, 2021

We're working on a 1.0.0 release at the moment; it was going to be released last week, but there was a regression under Docker's CI (we've fixed the issue though -- see #3009). I would expect a new release in a week or so.
