-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor containerdisks by accessing the artifacts via proc and pid namespace sharing #11845
base: main
Are you sure you want to change the base?
Conversation
Create the new version v3alpha for container disks. Container disks won't be mounted anymore by virt-handler, but instead the VM image or the kernel artifacts will be accessed through the sharing of the pid namespace. In this way, the code can be simplified and the container disk only needs to write itsown pid in a given directory. Additionally, the new binary is still statically compiled, but written in Go instead of C. Therefore, we can drop the C toolchain and setup. Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add functionality for accessing VM images and kernel artifacts via the proc filesystem. The containerdisk container will use the same pid namespace as the compute container. QEMU can access the disk via the path: /proc/<pid>/root/<path>. To maintain compatibility with older container disk versions and prevent issues with upgrades and migrations between KubeVirt versions, this package provides a symlink between the old expected location and the image path via the proc filesystem. Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add functions for render the virt-launcher pod with the containerdisk access via proc filesystem and pid namespace sharing. Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add support for the new behavior how to access containerdisk artifacts via proc filesystem and the pid namespace sharing. Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add new behavior and handling for containerdisk. In the previous version, containerdisk artifacts were mounted by virt-handler and virt-launcher needed to synchronize before being able to access them. With the new support, the coordination mechanism has been completly drop and virt-launcher needs simply to create the overlay images and the symlinks for the expected locations. Signed-off-by: Alice Frosi <afrosi@redhat.com>
Drop the outdated containerdisk package from virt-handler. The laucher can directly access the container drive, eliminating the requirement for the handler to mount the artifacts. Signed-off-by: Alice Frosi <afrosi@redhat.com>
Drop the container-disk binary copy since the handler and launcher no longer need to cooperate to mount container disk artifacts. Signed-off-by: Alice Frosi <afrosi@redhat.com>
With the new support, containerdisk artifacts are expected in a constant location. This simplifies the configuration and avoids any confusion of the path where the containerdisks artifacts are located. Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-kubevirt-e2e-k8s-1.30-sig-operator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Alice! This is exactly how I was hoping that it would play out!
This mechanism is highly complex, requires the coordination between handler and launcher and can possibly creates clean-up issues if the handler doesn't correctly unmount the artifacts at container deletion
Also note that it removes all ways to to escape the container if we have a bug in the context of containerdisks (we had the one CVE which allowed that)
go_library( | ||
name = "go_default_library", | ||
srcs = ["container-disk.go"], | ||
importpath = "kubevirt.io/kubevirt/cmd/container-disk-v3alpha", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the final change which allows us to drop the alpha.
@alicefr: The following tests failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@alicefr this is great work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great initiative!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file could really use a cleanup, in another PR though.
) | ||
|
||
func main() { | ||
noOp := flag.Bool("no-op", false, "program exits immediatly") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noOp := flag.Bool("no-op", false, "program exits immediatly") | |
noOp := flag.Bool("no-op", false, "program exits immediately") |
return "", fmt.Errorf("more than one file found in folder %s, only one disk is allowed", path) | ||
} | ||
|
||
return filepath.Join(fullPath, files[0].Name()), nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if there are multiple files in a containerdisk? Is that a supportable use case? IIRC in CDI you can also choose which disk to import?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the old code only supported a single image per containerdisk. I haven't changed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep these tests in another file? container-disk_test.go
?
@@ -0,0 +1,27 @@ | |||
package main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I recall the other program was written in C to reduce memory consumption. What will memory consumption be now?
See #2844
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. It may be better if we now import less packages, but probably still requires a c application
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I didn't know that. I'll try to evaluate it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a C binary unfortunately, the size of the binary is still large, ~ 24M
As discussed with Roman offline, I'll probably need also to reintroduce the unmount logic. If we have old virt-launcher we still want to clean-up their bindmount. I'll do in the next round |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What this PR does
Before this PR:
Currently, containerdisks artifacts are located in a separate container respect to the
compute
container, and in order to make them available to virt-launcher, virt-handler needs to bind mount them in the container filesystem.This mechanism is highly complex, requires the coordination between handler and launcher and can possibly creates clean-up issues if the handler doesn't correctly unmount the artifacts at container deletion
After this PR:
This PR introduces a new mechanism for accessing containerdisk. Since the containerdisk container is deployed in the same pod as virt-laucher, it is possible to allow the PID namespace sharing between the containers in the same pod. In this way, virt-launcher can access the containerdisk artifacts via the proc filesystem, e.g
/proc/<pid>/root/<path-to-artifacts>
The coordination mechanism between launcher and handler is no longer needed. Therefore, we can further simplify the
containerdisk
binary. Additionally, we completely remove the C program and have replaced it with a simpler Go program which simply created a pidfile.In order to avoid any conflicts with upgrades and migrations, the new setup create a symlink between the location of the artifacts via proc and the old location.
This mechanism has been initially proposed by @rmohr in #10291
Fixes #
Release note