
runc pod respawn will destroy Qemu processes (with pid 1) #4116

Open
senolcolak opened this issue Nov 7, 2023 · 3 comments


senolcolak commented Nov 7, 2023

Description

We are using runc in our k8s deployments, with an OpenStack hypervisor running on top. On our compute nodes, libvirt pods are responsible for creating qemu instances via the libvirtd service. Our installation stack was mainly on CentOS 7; recently we began rolling out new clusters on Ubuntu 22.04. We found that on the recently installed clusters (runc 1.1.7), killing the libvirt pod also kills the qemu instances.

We need a way to revert this behavior change.

Steps to reproduce the issue

  1. A k8s OpenStack deployment with Ubuntu 22.04 on the nodes, or a StarlingX deployment with a recent runc version
  2. Create VMs on the compute nodes
  3. On a compute node, delete the libvirt pod (a DaemonSet), which will be respawned
  4. The VMs will be gone
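A quick way to see why deleting the pod takes the VMs with it is to check, before deleting the pod, which cgroup the qemu processes belong to. This is a diagnostic sketch, not part of the original report; the `qemu-system` pgrep pattern is an assumption and may need adapting to your node:

```shell
# Print the cgroup path of a /proc entry (works for cgroup v1 and v2 lines).
cgroup_of() {
  cut -d: -f3- "/proc/$1/cgroup"
}

# List each qemu process and the cgroup it lives in. A path under the
# kubepods hierarchy (e.g. .../kubepods.slice/.../cri-containerd-<id>.scope)
# means the VM is inside the pod's cgroup and will die with it.
for pid in $(pgrep -f qemu-system || true); do
  echo "qemu $pid -> $(cgroup_of "$pid")"
done
```

If the qemu PIDs show up under the pod's cgroup rather than a host-owned slice (libvirt typically uses `machine.slice` on a bare host), removing the pod's cgroup subtree will take the VMs down with it.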

Describe the results you received and expected

We expect the instances to still be running after the pod is respawned:

openstack-node001-libvirt-pod:/# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-0000269e              running
 2     instance-0000269b              running
 3     instance-00002698              running
 4     instance-00002695              running
 5     instance-00002692              running

but we got the following instead:

openstack-node001-libvirt-pod:/# virsh list
 Id    Name                           State
----------------------------------------------------

What version of runc are you using?

Our current version is 1.1.7; prior to version 1.1.6 we did not have this issue.

Host OS information

PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Host kernel information

Linux openstack-node001 5.15.0-60-generic #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

senolcolak referenced this issue Nov 7, 2023
Sometimes, the init process is not in the root cgroup.
This can be noted by GetInitPath, which already scrubs the path of `init.scope`.

This was encountered when trying to patch the Kubelet to handle systemd being in a separate cpuset
from root (to allow load balance disabling for containers). At present, there's no way to have libcontainer or runc
manage cgroups in a hierarchy outside of the one init is in (unless the path contains `init.scope`, which is limiting)

Signed-off-by: Peter Hunt <pehunt@redhat.com>
(cherry picked from commit 54e2021)
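The scrubbing that the commit message describes (GetInitPath removing a trailing `init.scope` component from the init process's cgroup path) can be illustrated with a small sketch. This is not runc's actual Go code, just a hypothetical shell equivalent of the described behavior:

```shell
# Strip a trailing "/init.scope" from a cgroup path, as the commit
# message says GetInitPath does; an emptied path falls back to "/".
scrub_init_scope() {
  p="${1%/init.scope}"
  printf '%s\n' "${p:-/}"
}

scrub_init_scope "/init.scope"        # -> /
scrub_init_scope "/system.slice/foo"  # -> /system.slice/foo (unchanged)
```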
lifubang (Member) commented Nov 8, 2023

> we already found the problem

Could you please give a detailed explanation how this commit impacts your case?

kolyshkin (Contributor) commented

@senolcolak it looks like you found that it's commit 10cfd81 that breaks your use case, am I right? Can you explain in more detail why?

Cc @haircommander

senolcolak (Author) commented

@kolyshkin sorry for my late reply. I could not isolate the problem in a separate environment, but the link I shared before was wrong. The real problem is in this commit:
e4ce94e

The problem is that when I create a process that has to be attached to the host system (a qemu instance), the lifetime of the process depends on the pod's lifetime.

Basically, we need a process that runs in the host environment: even if the pod is deleted and its cgroup is wiped, the process should continue to run.
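One conceivable workaround for the requirement above is to move the long-lived process out of the pod's cgroup into a host-owned cgroup before the pod goes away, so that removing the pod's cgroup subtree no longer kills it. This is a hedged sketch, not a vetted fix; it assumes root, a writable cgroup v2 hierarchy under `/sys/fs/cgroup`, and that nothing issues `cgroup.kill` on an ancestor of the target; the `machine.slice/vm-escape` target name is an invented example:

```shell
# Move a PID into a (possibly new) cgroup directory by writing it to
# that directory's cgroup.procs file.
escape_to_cgroup() {
  pid="$1"
  target="$2"
  mkdir -p "$target"
  echo "$pid" > "$target/cgroup.procs"
}

# Hypothetical usage on a compute node (pattern and path are assumptions):
# escape_to_cgroup "$(pgrep -of qemu-system)" /sys/fs/cgroup/machine.slice/vm-escape
```

Whether this is safe depends on the runtime: if the container manager still tracks the PID, or if a freezer/kill is applied higher in the hierarchy, the process can still be terminated.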
