
runc pod respawn will destroy Qemu processes (with pid 1) #4116

Open
senolcolak opened this issue Nov 7, 2023 · 3 comments


senolcolak commented Nov 7, 2023

Description

We are using runc in our k8s deployments, with an OpenStack hypervisor running on top. On our compute nodes, libvirt pods are responsible for creating qemu instances via the libvirtd service. Our installation stack was mainly on CentOS 7; recently we began rolling out new clusters on Ubuntu 22.04. We found that on the recently installed clusters (runc 1.1.7), killing the libvirt pod also kills the qemu instances.

We need a way to revert this behavior change.

Steps to reproduce the issue

  1. A k8s OpenStack deployment with Ubuntu 22.04 on the nodes, or a StarlingX deployment with a recent runc version
  2. Create VMs on the compute nodes
  3. On a compute node, delete the libvirt pod (a DaemonSet), which will be respawned
  4. The VMs will be gone
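A quick way to see why deleting the pod takes the VMs with it is to check, before deleting the pod, which cgroup the qemu processes belong to. This is a diagnostic sketch, not part of the original report; the `qemu-system` pgrep pattern is an assumption and may need adapting to your node:

```shell
# Print the cgroup path of a /proc entry (works for cgroup v1 and v2 lines).
cgroup_of() {
  cut -d: -f3- "/proc/$1/cgroup"
}

# List each qemu process and the cgroup it lives in. A path under the
# kubepods hierarchy (e.g. .../kubepods.slice/.../cri-containerd-<id>.scope)
# means the VM is inside the pod's cgroup and will die with it.
for pid in $(pgrep -f qemu-system || true); do
  echo "qemu $pid -> $(cgroup_of "$pid")"
done
```

If the qemu PIDs show up under the pod's cgroup rather than a host-owned slice (libvirt typically uses `machine.slice` on a bare host), removing the pod's cgroup subtree will take the VMs down with it.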

Describe the results you received and expected

We expect the instances to still be running after the pod is respawned:

openstack-node001-libvirt-pod:/# virsh list
 Id    Name                           State
----------------------------------------------------
 1     instance-0000269e              running
 2     instance-0000269b              running
 3     instance-00002698              running
 4     instance-00002695              running
 5     instance-00002692              running

but we got the following instead:

openstack-node001-libvirt-pod:/# virsh list
 Id    Name                           State
----------------------------------------------------

What version of runc are you using?

Our current version is 1.1.7; prior to version 1.1.6 we did not have this issue.

Host OS information

PRETTY_NAME="Ubuntu 22.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.2 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Host kernel information

Linux openstack-node001 5.15.0-60-generic #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

senolcolak referenced this issue Nov 7, 2023
Sometimes, the init process is not in the root cgroup.
This can be noted by GetInitPath, which already scrubs the path of `init.scope`.

This was encountered when trying to patch the Kubelet to handle systemd being in a separate cpuset
from root (to allow load balance disabling for containers). At present, there's no way to have libcontainer or runc
manage cgroups in a hierarchy outside of the one init is in (unless the path contains `init.scope`, which is limiting)

Signed-off-by: Peter Hunt <pehunt@redhat.com>
(cherry picked from commit 54e2021)
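The scrubbing that the commit message describes (GetInitPath removing a trailing `init.scope` component from the init process's cgroup path) can be illustrated with a small sketch. This is not runc's actual Go code, just a hypothetical shell equivalent of the described behavior:

```shell
# Strip a trailing "/init.scope" from a cgroup path, as the commit
# message says GetInitPath does; an emptied path falls back to "/".
scrub_init_scope() {
  p="${1%/init.scope}"
  printf '%s\n' "${p:-/}"
}

scrub_init_scope "/init.scope"        # -> /
scrub_init_scope "/system.slice/foo"  # -> /system.slice/foo (unchanged)
```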
lifubang (Member) commented Nov 8, 2023

> we already found the problem

Could you please give a detailed explanation how this commit impacts your case?

kolyshkin (Contributor) commented

@senolcolak it looks like you found that it's commit 10cfd81 that breaks your use case, am I right? Can you explain in more detail why?

Cc @haircommander

senolcolak (Author) commented

@kolyshkin sorry for my late reply. I could not isolate the problem in a separate environment, but the link I shared before was wrong. The real problem is in this commit:
e4ce94e

The problem is that when I create a process that has to be attached to the host system (a qemu instance), the lifetime of the process depends on the pod's lifetime.

Basically, we need a process that runs in the host environment: even if the pod is deleted and its cgroup is wiped, the process should continue to run.
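One conceivable workaround for the requirement above is to move the long-lived process out of the pod's cgroup into a host-owned cgroup before the pod goes away, so that removing the pod's cgroup subtree no longer kills it. This is a hedged sketch, not a vetted fix; it assumes root, a writable cgroup v2 hierarchy under `/sys/fs/cgroup`, and that nothing issues `cgroup.kill` on an ancestor of the target; the `machine.slice/vm-escape` target name is an invented example:

```shell
# Move a PID into a (possibly new) cgroup directory by writing it to
# that directory's cgroup.procs file.
escape_to_cgroup() {
  pid="$1"
  target="$2"
  mkdir -p "$target"
  echo "$pid" > "$target/cgroup.procs"
}

# Hypothetical usage on a compute node (pattern and path are assumptions):
# escape_to_cgroup "$(pgrep -of qemu-system)" /sys/fs/cgroup/machine.slice/vm-escape
```

Whether this is safe depends on the runtime: if the container manager still tracks the PID, or if a freezer/kill is applied higher in the hierarchy, the process can still be terminated.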
