
[Alpine] docker top, runc ps fail with cgroup2 with: unable to get all container pids #4097

Open
kholmanskikh opened this issue Oct 26, 2023 · 7 comments


@kholmanskikh

Description

docker top and runc ps fail with:

alpine:~$ docker top 09e847645eec
Error response from daemon: runc did not terminate successfully: exit status 1: unable to get all container pids: read /sys/fs/cgroup/docker/09e847645eec8091d041c27b5ff969825b10155b60ca00230043c87764884135/cgroup.procs: operation not supported
: unknown

~ # runc --root /run/docker/runtime-runc/moby ps 09e847645eec8091d041c27b5ff969825b10155b60ca00230043c87764884135
ERRO[0000] unable to get all container pids: read /sys/fs/cgroup/docker/09e847645eec8091d041c27b5ff969825b10155b60ca00230043c87764884135/cgroup.procs: operation not supported 
~ # 

when the system has cgroup2 mounted as:

alpine:~$ mount|grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
alpine:~$ 

and this does not happen when cgroup v1 is mounted (in addition to, or instead of, cgroup v2).
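
The error string suggests runc gets the PID list by reading cgroup.procs under the container's cgroup, so the underlying read failure can be checked without runc at all (a minimal sketch, using the container cgroup path from the output above):

# read the same file runc's ps reads; on the affected setup this is
# expected to fail with "operation not supported" (EOPNOTSUPP)
cat /sys/fs/cgroup/docker/09e847645eec8091d041c27b5ff969825b10155b60ca00230043c87764884135/cgroup.procs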

The issue was found on Alpine Edge with packages:

alpine:~$ apk list -I|grep -E 'runc|docker|containerd'|sort
containerd-1.7.7-r2 x86_64 {containerd} (Apache-2.0) [installed]
containerd-openrc-1.7.7-r2 x86_64 {containerd} (Apache-2.0) [installed]
docker-24.0.6-r4 x86_64 {docker} (Apache-2.0) [installed]
docker-cli-24.0.6-r4 x86_64 {docker} (Apache-2.0) [installed]
docker-cli-buildx-0.11.2-r3 x86_64 {docker-cli-buildx} (Apache-2.0) [installed]
docker-engine-24.0.6-r4 x86_64 {docker} (Apache-2.0) [installed]
docker-openrc-24.0.6-r4 x86_64 {docker} (Apache-2.0) [installed]
runc-1.1.9-r2 x86_64 {runc} (Apache-2.0) [installed]
alpine:~$ 

Alpine uses openrc, which allows specifying the cgroup mount strategy in /etc/rc.conf:

# This sets the mode used to mount cgroups.
# "hybrid" mounts cgroups version 2 on /sys/fs/cgroup/unified and
# cgroups version 1 on /sys/fs/cgroup.
# "legacy" mounts cgroups version 1 on /sys/fs/cgroup
# "unified" mounts cgroups version 2 on /sys/fs/cgroup
#rc_cgroup_mode="unified"

and the issue mentioned above is observed when rc_cgroup_mode is unified:

alpine:~$ mount|grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
alpine:~$ 

and is not observed when it's legacy:

alpine:~$ mount|grep cgroup
cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755,inode64)
openrc on /sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cpu on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
blkio on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
net_cls on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
perf_event on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
net_prio on /sys/fs/cgroup/net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio)
hugetlb on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
alpine:~$ 

or hybrid:

alpine:~$ mount|grep cgroup
cgroup_root on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,relatime,size=10240k,mode=755,inode64)
openrc on /sys/fs/cgroup/openrc type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/lib/rc/sh/cgroup-release-agent.sh,name=openrc)
none on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cpuset on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cpu on /sys/fs/cgroup/cpu type cgroup (rw,nosuid,nodev,noexec,relatime,cpu)
cpuacct on /sys/fs/cgroup/cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct)
blkio on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
memory on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
devices on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
net_cls on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
perf_event on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
net_prio on /sys/fs/cgroup/net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio)
hugetlb on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
pids on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
alpine:~$ 
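
For reference, switching between the layouts above is just a matter of setting rc_cgroup_mode in /etc/rc.conf and rebooting (a sketch, assuming the stock Alpine openrc configuration quoted earlier):

# /etc/rc.conf -- pick exactly one mode; "unified" is the one that triggers the failure
#rc_cgroup_mode="legacy"
#rc_cgroup_mode="hybrid"
rc_cgroup_mode="unified"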

Steps to reproduce the issue

  1. Start any container with docker run -it --rm <any image>
  2. Execute docker top <container id> or runc --root /run/docker/runtime-runc/moby ps <container id> (a one-shot variant is sketched right after this list)
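
A one-shot variant of the steps above (the image name and shell syntax are just an example):

cid=$(docker run -d --rm alpine sleep 300)
docker top "$cid"
runc --root /run/docker/runtime-runc/moby ps "$cid"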

Describe the results you received and expected

The commands should display a list of processes running in the container; instead, they fail with the errors shown above.

What version of runc are you using?

runc version 1.1.9
commit: 82f18fe
spec: 1.0.2-dev
go: go1.21.3
libseccomp: 2.5.4

Host OS information

NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.19_alpha20230901
PRETTY_NAME="Alpine Linux edge"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

Host kernel information

Linux alpine 6.1.59-0-lts #1-Alpine SMP PREEMPT_DYNAMIC Fri, 20 Oct 2023 06:24:46 +0000 x86_64 Linux

@kholmanskikh
Author

The issue is reproducible with runc taken from the main git branch.

@AkihiroSuda AkihiroSuda changed the title docker top, runc ps fail with cgroup2 with: unable to get all container pids [Alpine] docker top, runc ps fail with cgroup2 with: unable to get all container pids Oct 26, 2023
bell-sw pushed a commit to bell-sw/alpaquita-aports that referenced this issue Oct 26, 2023
openrc-0.51-r0 switched the default rc_cgroup_mode from hybrid
to unified. This revealed an issue with `docker top` which
is critical for our infrastructure:

opencontainers/runc#4097

While the issue is being investigated, we are reverting
the mode back to hybrid to make `docker top` work with
the default openrc configuration.
@kolyshkin
Contributor

@kholmanskikh can you please check and confirm/deny that this is because of the nsdelegate option on the cgroupv2 mount?

@kholmanskikh
Author

The issue is also reproducible when cgroup2 is mounted without the nsdelegate option:

alpine:~$ mount|grep cgroup
none on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
alpine:~$ docker run --rm -it -d alpine
2babd8f8f743beea96d6f2fba02de19036e0f734d8c1d249ac694b8ad501f0e6
alpine:~$ docker top 2babd8f8f743beea96d6f2fba02de19036e0f734d8c1d249ac694b8ad501f0e6
Error response from daemon: runc did not terminate successfully: exit status 1: unable to get all container pids: read /sys/fs/cgroup/docker/2babd8f8f743beea96d6f2fba02de19036e0f734d8c1d249ac694b8ad501f0e6/cgroup.procs: operation not supported
: unknown
alpine:~$ 

@ncopa

ncopa commented Dec 13, 2023

related downstream issues:

It also fails to start containers with --memory option:

$ docker run --rm -it --memory 2G alpine
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: cannot enter cgroupv2 "/sys/fs/cgroup/docker" with domain controllers -- it is in domain threaded mode: unknown.

In this case I have a daemon.json:

{
        "storage-driver": "overlay2",
        "cgroup-parent": "/docker"
}

EDIT: but if I use:

{
  "cgroup-parent": "/dockerContainers"
}

It actually works.

@ncopa

ncopa commented Dec 15, 2023

Could it be that runc sets docker/cgroup.type to domain threaded?

If I restart the docker daemon, it is initially domain, but after running the first container it changes to domain threaded:

ncopa-desktop:~$ doas /etc/init.d/docker start
 * Starting Docker Daemon ...                                                                 [ ok ]
ncopa-desktop:~$ cat /sys/fs/cgroup/docker/cgroup.type 
domain
ncopa-desktop:~$ docker run --rm alpine echo hello
hello
ncopa-desktop:~$ cat /sys/fs/cgroup/docker/cgroup.type 
domain threaded

Why does it end up with the cgroup type set to domain threaded?

@ncopa

ncopa commented Dec 15, 2023

I found out that docker itself does not create /sys/fs/cgroup/docker. It is openrc that creates this.

It seems that docker's default cgroup-parent is also docker. I think what happens here is that docker and openrc are stepping on each other's toes.
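
For reference, the cgroup v2 semantics at play here (per the kernel's cgroup-v2 documentation) seem to be: writing "threaded" to a child's cgroup.type implicitly turns the parent domain cgroup into "domain threaded", and reading cgroup.procs of a threaded cgroup fails with EOPNOTSUPP. A minimal sketch, independent of docker/openrc (cgroup name made up, run as root on a cgroup2-only host):

# create a fresh domain cgroup with a child
mkdir -p /sys/fs/cgroup/demo/child
cat /sys/fs/cgroup/demo/cgroup.type          # prints: domain
# making the child threaded implicitly turns the parent into the thread root
echo threaded > /sys/fs/cgroup/demo/child/cgroup.type
cat /sys/fs/cgroup/demo/cgroup.type          # prints: domain threaded
# cgroup.procs of the threaded child is no longer readable
cat /sys/fs/cgroup/demo/child/cgroup.procs   # fails: Operation not supported
# cleanup
rmdir /sys/fs/cgroup/demo/child /sys/fs/cgroup/demo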

@tbayart

tbayart commented Jan 14, 2024

Hi, I have the same issue under Portainer.
I installed Alpine Linux x64, and when I want to look at container stats in Portainer, I get the following error:

"runc did not terminate successfully: exit status 1: unable to get all container pids: read /sys/fs/cgroup/docker/c7fe07c5253dba763ce8fde71945c3a5ac32998ae50dc1345dba7cffd6fab5fa/cgroup.procs: operation not supported: unknown"

I have many containers that have been running fine for a while now, but I'm unable to get stats.
