
KubeVirt v1.2.0 does not work on k8s v1.28.9; when creating a VM, the following error is reported: {"component":"virt-launcher-monitor","level":"error","msg":"failed to run virt-launcher","pos":"virt-launcher-monitor.go:181","reason":"fork/exec /usr/bin/virt-launcher: operation not permitted","timestamp":"2024-04-24T11:53:56.790431Z"} #11784

Open
wingying opened this issue Apr 24, 2024 · 36 comments

@wingying

What happened:
KubeVirt v1.2.0 does not work on k8s v1.28.9. When creating a VM, the following error is reported: {"component":"virt-launcher-monitor","level":"error","msg":"failed to run virt-launcher","pos":"virt-launcher-monitor.go:181","reason":"fork/exec /usr/bin/virt-launcher: operation not permitted","timestamp":"2024-04-24T11:53:56.790431Z"}

I also tested KubeVirt v1.2.0 with k8s v1.27.11 and got the same error message. Could this be related to the Docker or containerd version, or to something else?

As I posted in an earlier issue, the only combination that passed my tests is KubeVirt v1.1.1 + k8s v1.25.6.

Environment:

  • KubeVirt version: v1.2.0

  • Kubernetes version (use kubectl version):
    Client Version: v1.28.9
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.28.9

  • OS (e.g. from /etc/os-release):
    PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
    NAME="Debian GNU/Linux"
    VERSION_ID="12"
    VERSION="12 (bookworm)"
    VERSION_CODENAME=bookworm
    ID=debian
    HOME_URL="https://www.debian.org/"
    SUPPORT_URL="https://www.debian.org/support"
    BUG_REPORT_URL="https://bugs.debian.org/"

  • Kernel (e.g. uname -a):
    Linux cdp 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux

  • Install tools: N/A
    Client: Docker Engine - Community
    Version: 24.0.7
    API version: 1.43
    Go version: go1.20.10
    Git commit: afdd53b
    Built: Thu Oct 26 09:08:02 2023
    OS/Arch: linux/amd64
    Context: default

    Server: Docker Engine - Community
     Engine:
      Version: 24.0.7
      API version: 1.43 (minimum version 1.12)
      Go version: go1.20.10
      Git commit: 311b9ff
      Built: Thu Oct 26 09:08:02 2023
      OS/Arch: linux/amd64
      Experimental: false
     containerd:
      Version: 1.6.27
      GitCommit: a1496014c916f9e62104b33d1bb5bd03b0858e59
     runc:
      Version: 1.1.11
      GitCommit: v1.1.11-0-g4bccb38
     docker-init:
      Version: 0.19.0
      GitCommit: de40ad0

  • Others: N/A
    containerd containerd.io 1.6.27 a1496014c916f9e62104b33d1bb5bd03b0858e59
@aburdenthehand
Contributor

/cc @xpivarc

@xpivarc
Member

xpivarc commented Apr 24, 2024

Hi @wingying, would you be able to provide audit logs from a node where the VM failed to start?
Could you also post the virt-launcher pod YAML from the working version and from the version that doesn't work?

@wingying
Author

Hi,

Some updates...

We enabled the Root feature gate (- Root in featureGates), and everything works! Could you explain why we have to add the Root feature gate? It seems to grant high privileges in k8s, and we did not need it in KubeVirt v1.1.1 and earlier...

configuration:
      developerConfiguration:
        featureGates:
        - Root
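For reference, roughly how we apply this to the KubeVirt CR (a minimal sketch; the default CR name kubevirt in namespace kubevirt is assumed):

# Hedged sketch: enable the Root feature gate on the KubeVirt CR.
# Note: a merge patch replaces the whole featureGates list, so include any other gates in use.
kubectl patch kubevirt kubevirt -n kubevirt --type merge \
  -p '{"spec":{"configuration":{"developerConfiguration":{"featureGates":["Root"]}}}}'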

@xpivarc
Member

xpivarc commented Apr 24, 2024

Hi,
We are going to remove the Root feature gate; we made it obsolete in #8563 in order to improve security. The feature gate should not be needed.
Also, I don't see any significant changes between 1.1.1 and 1.2.0.

Please provide the logs and yamls in order to be able to investigate further.
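Something along these lines should gather them (a rough sketch; the namespace and pod name are placeholders, and the kubevirt.io=virt-launcher label is assumed):

# Collect the virt-launcher pod spec and logs for the affected VM.
kubectl get pods -n <vm-namespace> -l kubevirt.io=virt-launcher -o yaml > virt-launcher-pods.yaml
kubectl logs -n <vm-namespace> <virt-launcher-pod> -c compute > virt-launcher.log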

@wingying
Author

Update:
I increased the log verbosity level and attached the failing virt-launcher pod log (virt-launcher-notwork.txt), captured in the Failed phase.

@xpivarc
Member

xpivarc commented Apr 25, 2024

Hi @wingying,
I see that in the working case you are using our images, while in the other case you are using an internal version. Could it be that this image is not a 1:1 mirror but a custom build?

@wingying
Author

No customized build... we just pull the image from the original registry, retag it, and push it to our internal Harbor. No other changes to the image itself.

I am curious that no one else has reported this issue for the newly released v1.2.0; it fails on nearly all of the latest k8s versions...

@xpivarc
Member

xpivarc commented Apr 25, 2024

Would you be able to use the original image, just to be sure?

@xpivarc
Member

xpivarc commented Apr 25, 2024

Please add the annotation "kubevirt.io/keep-launcher-alive-after-failure": "true" to the VMI and run it. This should keep the process running and allow you to execute arbitrary commands.
Then please run getcap /usr/bin/virt-launcher.
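A minimal sketch of those two steps (the annotation is typically set in the VM/VMI metadata before the VM is started; the namespace and pod name below are placeholders):

# The annotation goes under metadata.annotations of the VMI (or the VM's spec.template):
#   kubevirt.io/keep-launcher-alive-after-failure: "true"
# Once the launcher pod is up, check the file capability from inside the compute container:
kubectl exec -n <vm-namespace> <virt-launcher-pod> -c compute -- getcap /usr/bin/virt-launcher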

@wingying
Author

No result?
[screenshot]

@xpivarc
Member

xpivarc commented Apr 26, 2024

@wingying Great, we are getting somewhere. This means the file capability is missing even though it should be there. I verified locally that the file capability is present, so this must be a runtime/fs issue. Can you share what fs type you are using for your containers? Also, can you find the backing source of the launcher image and check whether virt-launcher-monitor has the file capability there?
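For the fs type, something like the following should show the storage driver and the filesystem backing the container storage (a sketch; the default Docker data root /var/lib/docker is assumed):

# Storage driver and backing filesystem of the Docker data root.
docker info --format 'storage driver: {{.Driver}}'
stat -f -c %T /var/lib/docker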

@wingying
Author

wingying commented Apr 26, 2024

@xpivarc

First of all, thank you for your continued support!

For the fs type, see below:
[screenshot]

Also, can you find the backing source of the launcher image and check whether virt-launcher-monitor has the file capability there?

How do I check that?

@xpivarc
Member

xpivarc commented Apr 26, 2024

Also, can you find the backing source of the launcher image and check whether virt-launcher-monitor has the file capability there?

How do I check that?

You can run find / -name virt-launcher-monitor on the host; that will find both the underlay and the overlay copies of the binary. Then run getcap again to see whether the underlay has the capability. That said, I believe there will be no file capability there, as the FS should handle it.
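For illustration, a minimal sketch of that host-side check (bash on the node assumed):

# Locate every copy of the binary on the node and print its file capabilities.
find / -name virt-launcher-monitor 2>/dev/null | while read -r path; do
  echo "== $path"
  getcap "$path"
done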

Once you confirm there is no capability even on the underlay, you can do the following:
1. buildah from <your not working image> (this will give you an id for the image; please save it)
2. buildah unshare
3. buildah mount <the id from step 1> (this will output the location of the mount)
4. Find the binary under the path output by the previous step and verify that you can see the file capability with getcap <path>.
This will tell us whether your image actually has the capability, and it further narrows down where the problem is...
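For illustration, a rough sketch of those buildah steps in one go (rootless buildah assumed; the image reference is an example, substitute the image that is not working for you):

# Create a working container from the image, mount it, and check the file capability.
ctr=$(buildah from quay.io/kubevirt/virt-launcher:v1.2.0)
buildah unshare sh -c "
  mnt=\$(buildah mount $ctr)
  getcap \$mnt/usr/bin/virt-launcher-monitor
  buildah umount $ctr
"
buildah rm "$ctr"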

@wingying
Author

wingying commented Apr 26, 2024

@xpivarc It looks like only the overlay copy was found.
[screenshot]
And when running getcap, no file was found.
[screenshot]

@xpivarc
Member

xpivarc commented Apr 26, 2024

@wingying Please also try the steps described previously with buildah.

@wingying
Author

@xpivarc Which id should I use? Is the warning related?
[screenshot]

@wingying
Author

@xpivarc This step below?
[screenshot]

@wingying
Author

wingying commented Apr 26, 2024

@xpivarc Are we getting near to the cause? See below.

[screenshot]

@xpivarc
Member

xpivarc commented Apr 26, 2024

@xpivarc This step below? [screenshot]

Here it is enough to run getcap /var/...../merged/usr/bin/virt-launcher-monitor.

@wingying
Author

wingying commented Apr 26, 2024

@xpivarc See my reply above; I highlighted it in red.

In addition, I logged on to the previously working setup (v1.1.1 on k8s 1.25.16), and getcap returns no result there either... so why does that one work?

[screenshot]

@xpivarc
Member

xpivarc commented Apr 26, 2024

@xpivarc See my reply above; I highlighted it in red.

I missed it. OK, so the capability is there.

In addition, I logged on to the previously working setup (v1.1.1 on k8s 1.25.16), and getcap returns no result there either... so why does that one work?

Interesting. Are you running the Root feature gate on v1.1.1? Please run id inside the working virt-launcher and cat /proc/<pid of virt-launcher-monitor>/status.
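A minimal sketch of those checks from outside the pod (the namespace and pod name are placeholders, and pgrep is assumed to be available in the launcher image):

# Show the user/groups inside the compute container.
kubectl exec -n <vm-namespace> <virt-launcher-pod> -c compute -- id
# Dump the process status of virt-launcher-monitor; the Cap* lines show its capabilities.
kubectl exec -n <vm-namespace> <virt-launcher-pod> -c compute -- \
  sh -c 'cat /proc/$(pgrep -f virt-launcher-monitor | head -n1)/status'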

@wingying
Author

wingying commented Apr 26, 2024

My fault. I reset the KubeVirt v1.1.1 on k8s 1.25.16 environment, and below is the correct result. It seems /usr/bin/virt-launcher-monitor has the capability while virt-launcher does not.
[screenshot]

@wingying
Author

Meanwhile, in KubeVirt v1.2.0 on k8s 1.28.9, neither virt-launcher nor virt-launcher-monitor has the capability.
[screenshot]

@xpivarc
Member

xpivarc commented Apr 26, 2024

My fault. I reset the KubeVirt v1.1.1 on k8s 1.25.16 environment, and below is the correct result. It seems /usr/bin/virt-launcher-monitor has the capability while virt-launcher does not.

The capability is expected only on the -monitor binary, so that is as expected. I was starting to worry that you were showing me some kind of magic.
Now, would you be able to tell me what is different between the working and the non-working cluster? The runtime used, the runtime version, and so on...

@xpivarc
Member

xpivarc commented Apr 26, 2024

Also, I should point out at this point that KubeVirt is not at fault here; the environment is.

@wingying
Author

@xpivarc
I already posted a few issues in previous days, as you may have noticed. At the beginning I ran KubeVirt v1.2.0 on k8s 1.25.16 (and I have now tested KubeVirt v1.1.1 on k8s 1.28.9 with the same issue), but you said k8s 1.25.16 is out of support, so I upgraded k8s from 1.25.16 to 1.28.9 and hit the same issue. So now I guess it is not a k8s issue.

In summary: KubeVirt v1.1.1 works on k8s 1.25.16, while KubeVirt v1.2.0 does not work on k8s 1.25.16 and later (I tested several k8s versions).

Below is other environment info:
Debian version: 12.5
Docker version: 24.0.7 (when I upgraded to k8s 1.28.9, I also tried upgrading Docker to 25.0.5, with no effect)
cri-dockerd version: 0.3.12 (when I upgraded to k8s 1.28.9, I also tried upgrading cri-dockerd to 0.3.13, with no effect)

Again, on k8s 1.25.16, KubeVirt v1.1.1 works while KubeVirt v1.2.0 does not (unless the Root feature gate is configured); everything else is the same...

@xpivarc
Member

xpivarc commented Apr 26, 2024

@xpivarc I already posted a few issues in previous days, as you may have noticed. At the beginning I ran KubeVirt v1.2.0 on k8s 1.25.16 (and I have now tested KubeVirt v1.1.1 on k8s 1.28.9 with the same issue), but you said k8s 1.25.16 is out of support, so I upgraded k8s from 1.25.16 to 1.28.9 and hit the same issue. So now I guess it is not a k8s issue.

No, it is a CRI/runtime issue; in your case, Docker.

In summary: KubeVirt v1.1.1 works on k8s 1.25.16, while KubeVirt v1.2.0 does not work on k8s 1.25.16 and later (I tested several k8s versions).

Below is other environment info: Debian version: 12.5, Docker version: 24.0.7 (when I upgraded to k8s 1.28.9, I also tried upgrading Docker to 25.0.5, with no effect), cri-dockerd version: 0.3.12 (when I upgraded to k8s 1.28.9, I also tried upgrading cri-dockerd to 0.3.13, with no effect)

Which Docker and cri-dockerd versions are the working ones? Would you be able to downgrade to them on the new Kubernetes version?

Again, on k8s 1.25.16, KubeVirt v1.1.1 works while KubeVirt v1.2.0 does not (unless the Root feature gate is configured); everything else is the same...

I can suggest the following step to figure out whether this is a Docker or a CRI issue:
Run the virt-launcher image directly on the host with Docker and check whether the capability is there. Also, don't forget to override the entrypoint.
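For example, something along these lines (a sketch; the image tag is an example, and getcap is assumed to be available in the image, as in the earlier checks):

# Run the launcher image directly with Docker, overriding the entrypoint, and check the capability.
docker run --rm --entrypoint /bin/sh quay.io/kubevirt/virt-launcher:v1.2.0 \
  -c 'getcap /usr/bin/virt-launcher-monitor; getcap /usr/bin/virt-launcher'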

@wingying
Author

Which Docker and cri-dockerd versions are the working ones? Would you be able to downgrade to them on the new Kubernetes version?

Actually, I first upgraded to the new k8s version without upgrading Docker or cri-dockerd, and it still did not work.

@xpivarc
Member

xpivarc commented Apr 26, 2024

Both the 1.1.0 and 1.2.0 images contain the capability, so from the KubeVirt side we can't do more. I suggest trying what I described in the last comment.

@wingying
Author

wingying commented Apr 26, 2024

Both the 1.1.0 and 1.2.0 images contain the capability, so from the KubeVirt side we can't do more. I suggest trying what I described in the last comment.

@xpivarc

Yes, I followed your suggestion: I made a customized entrypoint and Dockerfile, rebuilt both the v1.2.0 and v1.1.1 images, and ran them on the same server with the same Docker version. The magic result is below: v1.1.1 still works as a plain Docker container, while v1.2.0 does not!
[screenshot]
It looks like we are getting close to the truth.
The entrypoint.sh is quite simple:

#!/bin/bash

# Start a long-running "main command" in the background and remember its pid.
sleep 5 &
pid=$!
echo "1"

# Wait for it and capture its exit code.
wait $pid
exit_code=$?

echo "Main command exited with code $exit_code"

# Keep the container alive so it can be inspected.
while true; do
  sleep 1
done

The Dockerfiles are simple as well (one per base image version):

FROM quay.io/kubevirt/virt-launcher:v1.2.0

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]

FROM quay.io/kubevirt/virt-launcher:v1.1.1

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]
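For reference, a rough sketch of how I build and compare the two test images (the tags, container names, and Dockerfile file names below are just examples):

# Build one test image per base version, run both, and compare the file capability.
docker build -t launcher-test:v1.2.0 -f Dockerfile.v120 .
docker build -t launcher-test:v1.1.1 -f Dockerfile.v111 .
docker run -d --name lt120 launcher-test:v1.2.0
docker run -d --name lt111 launcher-test:v1.1.1
docker exec lt120 getcap /usr/bin/virt-launcher-monitor
docker exec lt111 getcap /usr/bin/virt-launcher-monitor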

So, what's next?

@wingying
Author

wingying commented Apr 27, 2024

@xpivarc
OK, it seems to be a containerd version issue. I upgraded to containerd.io 1.6.31 (e377cd56a71523140ca6ae87e30244719194a521), cleaned up the old virt-launcher overlay paths and data, and rebuilt the customized virt-launcher. Now getcap works.
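Roughly, the check and cleanup looked like this (a sketch; the internal image reference is a placeholder):

# Confirm the upgraded containerd, then remove the cached launcher image so it is
# re-pulled and re-extracted with the new runtime.
containerd --version
docker image rm <internal-registry>/virt-launcher:v1.2.0
docker pull <internal-registry>/virt-launcher:v1.2.0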

After that, I tried to re-install KubeVirt again; this time the virt-handler pod does not start, and its STATUS is always CreateContainerError.

Below are the related error messages:
[screenshot]

[screenshot]

@wingying
Author

It finally works, after upgrading to k8s v1.26.5! I will record all the related component versions:
docker: [screenshot]
containerd: [screenshot]
cri-dockerd: [screenshot]
kubevirt: v1.2.0
Debian / kernel: Linux cdp 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
k8s: v1.26.5 (I believe it will also work on v1.26.5+)

Anything I missed?

@xpivarc feel free to give your comments.

@xpivarc
Member

xpivarc commented Apr 29, 2024

Great that it works; correct me if I misunderstood anything. I think it would be beneficial for others if you recorded how you found that containerd was the issue (maybe a link to the bug?). It is still weird that one version worked and the other did not, but I guess it was a bug in containerd.

@wingying
Author

wingying commented Apr 29, 2024 via email
