Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor containerdisks by accessing the artifacts via proc and pid namespace sharing #11845

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

alicefr
Copy link
Member

@alicefr alicefr commented May 3, 2024

What this PR does

Before this PR:
Currently, containerdisks artifacts are located in a separate container respect to the compute container, and in order to make them available to virt-launcher, virt-handler needs to bind mount them in the container filesystem.

This mechanism is highly complex, requires the coordination between handler and launcher and can possibly creates clean-up issues if the handler doesn't correctly unmount the artifacts at container deletion

After this PR:
This PR introduces a new mechanism for accessing containerdisk. Since the containerdisk container is deployed in the same pod as virt-laucher, it is possible to allow the PID namespace sharing between the containers in the same pod. In this way, virt-launcher can access the containerdisk artifacts via the proc filesystem, e.g /proc/<pid>/root/<path-to-artifacts>

The coordination mechanism between launcher and handler is no longer needed. Therefore, we can further simplify the containerdisk binary. Additionally, we completely remove the C program and have replaced it with a simpler Go program which simply created a pidfile.

In order to avoid any conflicts with upgrades and migrations, the new setup create a symlink between the location of the artifacts via proc and the old location.

This mechanism has been initially proposed by @rmohr in #10291

Fixes #

Release note

Access container disks via proc filesystem and the pid namespace sharing

Create the new version v3alpha for container disks. Container disks
won't be mounted anymore by virt-handler, but instead the VM image or
the kernel artifacts will be accessed through the sharing of the pid
namespace.

In this way, the code can be simplified and the container disk only needs
to write itsown pid in a given directory.

Additionally, the new binary is still statically compiled, but written
in Go instead of C. Therefore, we can drop the C toolchain and setup.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add functionality for accessing VM images and kernel artifacts via the
proc filesystem.

The containerdisk container will use the same pid namespace as the compute
 container. QEMU can access the disk via the path: /proc/<pid>/root/<path>.

To maintain compatibility with older container disk versions and prevent
issues with upgrades and migrations between KubeVirt versions, this
package provides a symlink between the old expected location and the
image path via the proc filesystem.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add functions for render the virt-launcher pod with the containerdisk
access via proc filesystem and pid namespace sharing.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add support for the new behavior how to access containerdisk artifacts
via proc filesystem and the pid namespace sharing.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add new behavior and handling for containerdisk. In the previous
version, containerdisk artifacts were mounted by virt-handler and
virt-launcher needed to synchronize before being able to access them.

With the new support, the coordination mechanism has been completly drop
and virt-launcher needs simply to create the overlay images and the
symlinks for the expected locations.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Drop the outdated containerdisk package from virt-handler.
The laucher can directly access the container drive, eliminating the
requirement for the handler to mount the artifacts.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Drop the container-disk binary copy since the handler and launcher no
longer need to cooperate to mount container disk artifacts.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
With the new support, containerdisk artifacts are expected in a constant
location. This simplifies the configuration and avoids any confusion of
the path where the containerdisks artifacts are located.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels May 3, 2024
@alicefr alicefr marked this pull request as draft May 3, 2024 13:04
@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL labels May 3, 2024
@kubevirt-bot kubevirt-bot added kind/build-change Categorizes PRs as related to changing build files of virt-* components sig/buildsystem Denotes an issue or PR that relates to changes in the build system. sig/compute labels May 3, 2024
@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from alicefr. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alicefr
Copy link
Member Author

alicefr commented May 3, 2024

/test pull-kubevirt-e2e-k8s-1.30-sig-operator

@alicefr alicefr changed the title Refacto containerdisks by accessing the artifact via proc and pid namespace sharing Refactor containerdisks by accessing the artifact via proc and pid namespace sharing May 3, 2024
@alicefr alicefr changed the title Refactor containerdisks by accessing the artifact via proc and pid namespace sharing Refactor containerdisks by accessing the artifacts via proc and pid namespace sharing May 3, 2024
Copy link
Member

@rmohr rmohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Alice! This is exactly how I was hoping that it would play out!

This mechanism is highly complex, requires the coordination between handler and launcher and can possibly creates clean-up issues if the handler doesn't correctly unmount the artifacts at container deletion

Also note that it removes all ways to to escape the container if we have a bug in the context of containerdisks (we had the one CVE which allowed that)

go_library(
name = "go_default_library",
srcs = ["container-disk.go"],
importpath = "kubevirt.io/kubevirt/cmd/container-disk-v3alpha",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the final change which allows us to drop the alpha.

@kubevirt-bot
Copy link
Contributor

@alicefr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubevirt-unit-test 8e62784 link true /test pull-kubevirt-unit-test
pull-kubevirt-unit-test-arm64 8e62784 link false /test pull-kubevirt-unit-test-arm64
pull-kubevirt-build 8e62784 link true /test pull-kubevirt-build
pull-kubevirt-code-lint 8e62784 link true /test pull-kubevirt-code-lint
pull-kubevirt-goveralls 8e62784 link false /test pull-kubevirt-goveralls
pull-kubevirt-build-arm64 8e62784 link true /test pull-kubevirt-build-arm64
pull-kubevirt-generate 8e62784 link true /test pull-kubevirt-generate
pull-kubevirt-e2e-k8s-1.30-sig-compute-serial 8e62784 link true /test pull-kubevirt-e2e-k8s-1.30-sig-compute-serial
pull-kubevirt-e2e-k8s-1.30-sig-operator 8e62784 link true /test pull-kubevirt-e2e-k8s-1.30-sig-operator
pull-kubevirt-conformance-arm64 8e62784 link false /test pull-kubevirt-conformance-arm64
pull-kubevirt-e2e-k8s-1.30-sig-compute-migrations 8e62784 link true /test pull-kubevirt-e2e-k8s-1.30-sig-compute-migrations
pull-kubevirt-e2e-arm64 8e62784 link false /test pull-kubevirt-e2e-arm64
pull-kubevirt-e2e-k8s-1.30-sig-storage 8e62784 link true /test pull-kubevirt-e2e-k8s-1.30-sig-storage
pull-kubevirt-e2e-k8s-1.30-sig-compute 8e62784 link true /test pull-kubevirt-e2e-k8s-1.30-sig-compute
pull-kubevirt-e2e-k8s-1.30-sig-network 8e62784 link true /test pull-kubevirt-e2e-k8s-1.30-sig-network

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fabiand
Copy link
Member

fabiand commented May 5, 2024

@alicefr this is great work!

Copy link
Member

@0xFelix 0xFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great initiative!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file could really use a cleanup, in another PR though.

)

func main() {
noOp := flag.Bool("no-op", false, "program exits immediatly")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
noOp := flag.Bool("no-op", false, "program exits immediatly")
noOp := flag.Bool("no-op", false, "program exits immediately")

return "", fmt.Errorf("more than one file found in folder %s, only one disk is allowed", path)
}

return filepath.Join(fullPath, files[0].Name()), nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there are multiple files in a containerdisk? Is that a supportable use case? IIRC in CDI you can also choose which disk to import?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the old code only supported a single image per containerdisk. I haven't changed that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep these tests in another file? container-disk_test.go?

@@ -0,0 +1,27 @@
package main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall the other program was written in C to reduce memory consumption. What will memory consumption be now?

See #2844

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It may be better if we now import less packages, but probably still requires a c application

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. I didn't know that. I'll try to evaluate it

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a C binary unfortunately, the size of the binary is still large, ~ 24M

@alicefr
Copy link
Member Author

alicefr commented May 7, 2024

As discussed with Roman offline, I'll probably need also to reintroduce the unmount logic. If we have old virt-launcher we still want to clean-up their bindmount. I'll do in the next round

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 21, 2024
@kubevirt-bot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/build-change Categorizes PRs as related to changing build files of virt-* components needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/buildsystem Denotes an issue or PR that relates to changes in the build system. sig/compute size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants