Using nerdctl with rootless k3s #2831

Open
hinshun opened this issue Feb 16, 2024 · 11 comments
Labels
area/rootless Rootless mode status/needs-more-information Needs more information from OP

Comments

@hinshun

hinshun commented Feb 16, 2024

Description

Ideally I'd like to have a single rootless stack of k3s + containerd + image builder (e.g. buildkitd). We want to use nerdctl with the rootless k3s embedded containerd.

With the following upstream contributions:

We can now set the following to point nerdctl to the rootless k3s containerd:

export ROOTLESSKIT_STATE_DIR="$HOME/.rancher/k3s/rootless"
export CONTAINERD_ADDRESS="$XDG_RUNTIME_DIR/k3s/containerd/containerd.sock"
export CONTAINERD_NAMESPACE="k8s.io"
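
Before running nerdctl, it can help to confirm that these three variables point at a live rootless k3s instance. The following is a minimal sketch, not part of nerdctl or k3s: the function name `check_k3s_rootless_env` is hypothetical, and it only assumes that rootlesskit writes a `child_pid` file into its state directory (the cheat sheet later in this thread reads that file).

```shell
#!/bin/sh
# check_k3s_rootless_env (hypothetical helper): sanity-check the variables
# that point nerdctl at the rootless k3s containerd. Prints one line per
# problem and returns non-zero if any check fails.
check_k3s_rootless_env() {
    rc=0
    # rootlesskit is assumed to record its child PID in the state directory.
    if [ ! -r "${ROOTLESSKIT_STATE_DIR:-}/child_pid" ]; then
        echo "missing ${ROOTLESSKIT_STATE_DIR:-<unset>}/child_pid (is k3s running?)"
        rc=1
    fi
    # Only existence is checked here, not whether containerd answers on it.
    if [ ! -e "${CONTAINERD_ADDRESS:-}" ]; then
        echo "missing containerd socket: ${CONTAINERD_ADDRESS:-<unset>}"
        rc=1
    fi
    # Kubernetes-managed images and containers live in the k8s.io namespace.
    if [ "${CONTAINERD_NAMESPACE:-}" != "k8s.io" ]; then
        echo "CONTAINERD_NAMESPACE is '${CONTAINERD_NAMESPACE:-}', expected 'k8s.io'"
        rc=1
    fi
    return "$rc"
}
```

Calling `check_k3s_rootless_env && nerdctl image ls` would then skip the nerdctl call when one of the paths is wrong.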

We can use several commands like nerdctl image ls successfully, but when attempting to run a container, it fails to readlink /proc/self/exe:

time="2024-02-16T07:32:44Z" level=debug msg="stateDir: /home/rootless/.rancher/k3s/rootless"
time="2024-02-16T07:32:44Z" level=debug msg="rootless parent main: executing \"/run/current-system/sw/bin/nsenter\" with [-r/ -w/home/rootless --preserve-credentials -m -n -U -t 1035 -F /run/current-system/sw/bin/nerdctl --debug-full run ghcr.io/pdtpartners/hello]"
time="2024-02-16T07:32:44Z" level=warning msg="cannot call os.Executable(), assuming the executable to be \"/run/current-system/sw/bin/nerdctl\"" error="readlink /proc/self/exe: no such file or directory"
time="2024-02-16T07:32:44Z" level=debug msg="verifying process skipped"
time="2024-02-16T07:32:46Z" level=debug msg="Failed to unmount snapshot \"/tmp/initialC3943988990\""
time="2024-02-16T07:32:46Z" level=fatal msg="readlink /proc/self/exe: no such file or directory"

I tried getting around it by patching nerdctl:

diff --git a/pkg/cmd/container/create.go b/pkg/cmd/container/create.go
index ca40bbe4..d205be26 100644
--- a/pkg/cmd/container/create.go
+++ b/pkg/cmd/container/create.go
@@ -406,7 +406,8 @@ func withBindMountHostIPC(_ context.Context, _ oci.Client, _ *containers.Contain
 func GenerateLogURI(dataStore string) (*url.URL, error) {
 	selfExe, err := os.Executable()
 	if err != nil {
-		return nil, err
+		log.L.WithError(err).Warnf("cannot call os.Executable(), assuming the executable to be %q", os.Args[0])
+		selfExe = os.Args[0]
 	}
 	args := map[string]string{
 		logging.MagicArgv1: dataStore,

It gets a little further but still has trouble with /proc/self/fd:

time="2024-02-16T07:39:21Z" level=debug msg="stateDir: /home/rootless/.rancher/k3s/rootless"
time="2024-02-16T07:39:21Z" level=debug msg="rootless parent main: executing \"/run/current-system/sw/bin/nsenter\" with [-r/ -w/home/rootless --preserve-credentials -m -n -U -t 1019 -F /run/current-system/sw/bin/nerdctl --debug-full run ghcr.io/pdtpartners/hello]"
time="2024-02-16T07:39:21Z" level=warning msg="cannot call os.Executable(), assuming the executable to be \"/run/current-system/sw/bin/nerdctl\"" error="readlink /proc/self/exe: no such file or directory"
time="2024-02-16T07:39:21Z" level=debug msg="verifying process skipped"
time="2024-02-16T07:39:24Z" level=debug msg="Failed to unmount snapshot \"/tmp/initialC3550973183\""
time="2024-02-16T07:39:24Z" level=warning msg="cannot call os.Executable(), assuming the executable to be \"/run/current-system/sw/bin/nerdctl\"" error="readlink /proc/self/exe: no such file or directory"
time="2024-02-16T07:39:24Z" level=debug msg="generated log driver: binary:///run/current-system/sw/bin/nerdctl?_NERDCTL_INTERNAL_LOGGING=%2Fvar%2Flib%2Fnerdctl%2F4a156993"
time="2024-02-16T07:39:24Z" level=debug msg="remote introspection plugin filters" filters="[type==io.containerd.snapshotter.v1, id==nix]"
time="2024-02-16T07:39:24Z" level=fatal msg="failed to open stdout fifo: couldn't stat /proc/self/fd/7: stat /proc/self/fd/7: no such file or directory"

I suspect the root cause is that rootless k3s sets up a PIDNS (see: https://github.com/k3s-io/k3s/blob/v1.29.1%2Bk3s2/pkg/rootless/rootless.go#L144), although the PIDNS is required for cgroup v2 evacuation.
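
One way to test this suspicion is to compare the PID namespace of the current shell with that of the rootlesskit child. The sketch below uses hypothetical helper names (`pidns_of`, `same_pidns`) and assumes that the PID stored in `$ROOTLESSKIT_STATE_DIR/child_pid` belongs to a process inside the new PID namespace and that the caller may read its `/proc/<pid>/ns/pid` link.

```shell
#!/bin/sh
# pidns_of PID (hypothetical helper): print the PID-namespace identifier of
# a process as reported by the kernel. "self" is accepted in place of a PID.
pidns_of() {
    readlink "/proc/$1/ns/pid"
}

# same_pidns PID (hypothetical helper): return 0 if PID shares the caller's
# PID namespace, 1 if it does not, and 2 if either link cannot be read.
same_pidns() {
    mine=$(pidns_of self) || return 2
    theirs=$(pidns_of "$1") || return 2
    [ "$mine" = "$theirs" ]
}

# Possible usage against the rootless k3s child (not run here):
#   same_pidns "$(cat "$ROOTLESSKIT_STATE_DIR/child_pid")" \
#       || echo "k3s rootlesskit child is in a different PID namespace"
```

If the two identifiers differ, any process that joins the child's mount namespace without also joining its PID namespace will see a procfs that belongs to a namespace it has no PID in.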

Do you have any ideas? cc @AkihiroSuda

Describe the results you received and expected

Expected: it should be possible to run containers using nerdctl with the rootless k3s containerd. Received: the errors shown above.

What version of nerdctl are you using?

v1.7.0

@hinshun hinshun added the kind/unconfirmed-bug-claim Unconfirmed bug claim label Feb 16, 2024
@fahedouch
Member

fahedouch commented Feb 16, 2024

nerdctl should not (and does not) enter the PIDNS set up by rootless k3s. Do you change the PID namespace of nerdctl at any point?

@fahedouch fahedouch added status/needs-more-information Needs more information from OP area/rootless Rootless mode and removed kind/unconfirmed-bug-claim Unconfirmed bug claim labels Feb 16, 2024
@hinshun
Author

hinshun commented Feb 16, 2024

nerdctl should not (and does not) enter the PIDNS set up by rootless k3s. Do you change the PID namespace of nerdctl at any point?

I did not. This is failing with stock v1.7.0; my patch above was just for investigation. I see what you mean, though: you're saying that for the rootless child, /proc/self/exe should be available since it didn't enter the PIDNS?

To be honest, I'm unfamiliar with the conditions under which readlink /proc/self/exe can fail. I will provide a docker run environment for reproducing this.

@fahedouch
Member

you’re saying for the rootless child /proc/self/exe should be available since it didn’t enter the PIDNS?

Nope, I am saying that nerdctl does not enter the PIDNS, but the rootless child does. Here, it is a nerdctl issue, so this may not be related to PIDNS.

@hinshun
Author

hinshun commented Feb 18, 2024

Nope, I am saying that nerdctl does not enter the PIDNS, but the rootless child does. Here, it is a nerdctl issue, so this may not be related to PIDNS.

I meant the rootless child of nerdctl, so I think we're saying the same thing!

I built and pushed a Docker image to ghcr.io/pdtpartners/nix-snapshotter that reproduces the issue. The image's entrypoint launches a headless QEMU VM running NixOS, with rootless k3s in a systemd user service:

docker run --rm -it ghcr.io/pdtpartners/nix-snapshotter:rootless

nixos login: rootless # (Ctrl-a then x to quit)
Password: rootless

[rootless@nixos:~]$ nerdctl run --debug-full hello-world
DEBU[0000] stateDir: /home/rootless/.rancher/k3s/rootless
DEBU[0000] rootless parent main: executing "/run/current-system/sw/
WARN[0000] cannot call os.Executable(), assuming the executable to "
DEBU[0000] verifying process skipped
FATA[0000] readlink /proc/self/exe: no such file or directory

Would appreciate some help, so I've provided a cheat sheet:

$ echo $ROOTLESSKIT_STATE_DIR
/home/rootless/.rancher/k3s/rootless

$ echo $CONTAINERD_ADDRESS
/run/user/1000/k3s/containerd/containerd.sock

$ echo $CONTAINERD_NAMESPACE
k8s.io

# Show the rootless k3s systemd user service
$ systemctl --user status k3s

# Enter the namespaces set up by k3s's rootlesskit
# The options here match the ones nerdctl uses.
$ nsenter -r/ --preserve-credentials -m -n -U -F -t $(cat $ROOTLESSKIT_STATE_DIR/child_pid)

# Note that commands that don't require nerdctl to nsenter work fine
# Just need to wait until k3s is healthy
$ kubectl get nodes
NAME    STATUS   ROLES                  AGE     VERSION
nixos   Ready    control-plane,master   9m58s   v1.27.9+k3s1

$ nerdctl image ls
REPOSITORY                          TAG                     IMAGE ID        CREATED               PLATFORM       SIZE         BLOB SIZE
rancher/klipper-helm                v0.8.2-build20230815    b0b0c4f73f23    10 minutes ago        linux/amd64    244.7 MiB    86.7 MiB
# ...

# k3s's embedded containerd state dir:
$ ls ~/.rancher/k3s/agent/containerd/

# Look at containerd logs
$ cat ~/.rancher/k3s/agent/containerd/containerd.log

# User `rootless` is a sudoer inside this QEMU VM in case you need it
$ sudo su
Password: rootless

@AkihiroSuda
Member

"cgroup v2 evacuation" is quite complex, maybe k3s should just depend on k3d with rootless (Docker|Podman|nerdctl) to reimplement the rootless mode as in Usernetes Gen2

https://github.com/AkihiroSuda/AkihiroSuda/blob/master/slides/2024/20240201%20%5BHPC%20Containers%5D%20Rootless%20Containers.pdf

@hinshun
Author

hinshun commented Feb 18, 2024

Can you elaborate why “cgroup v2 evacuation” might be related to this readlink /proc/self/exe issue?

@AkihiroSuda
Member

Can you elaborate why “cgroup v2 evacuation” might be related to this readlink /proc/self/exe issue?

This doesn't seem directly related to cgroup per se, but as you mentioned in the OP this incurs unsharing PIDNS and mounting a new procfs, which seems related to /proc/self/exe errors
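
A plausible mechanism, stated here as an assumption rather than a confirmed diagnosis: `/proc/self` is resolved relative to the PID namespace that the procfs instance was mounted for. A process that joins the mount namespace (`-m`) but stays in its original PID namespace has no PID in that procfs, so `/proc/self` and everything under it would fail with "no such file or directory". The sketch below (the function name `proc_self_status` is hypothetical) reports whether `/proc/self` resolves for the calling process.

```shell
#!/bin/sh
# proc_self_status [PROC_ROOT] (hypothetical helper): print "resolves" if
# PROC_ROOT/self can be read as a symlink by this process, else "dangling".
# PROC_ROOT defaults to /proc.
proc_self_status() {
    proc_root="${1:-/proc}"
    if readlink "$proc_root/self" >/dev/null 2>&1; then
        echo "resolves"
    else
        echo "dangling"
    fi
}

# Possible check inside the k3s namespaces, reusing the nsenter options
# quoted in this issue (not run here):
#   nsenter -r/ --preserve-credentials -m -n -U -F \
#       -t "$(cat "$ROOTLESSKIT_STATE_DIR/child_pid")" \
#       sh -c 'readlink /proc/self || echo dangling'
```

If the check inside the namespaces prints "dangling" while the same check in a normal shell resolves, that would be consistent with the procfs explanation above.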

@hinshun
Author

hinshun commented Feb 18, 2024

Would it make sense if rootless k3s had an option to run without “cgroup v2 evacuation” & PIDNS? I’m not sure what it does, so it’s unclear to me whether that’s reasonable or not.

@AkihiroSuda
Member

Would it make sense if rootless k3s had an option to run without “cgroup v2 evacuation” & PIDNS? I’m not sure what it does, so it’s unclear to me whether that’s reasonable or not.

No, Kubernetes pods will not start then due to lack of access to cgroup

@fahedouch
Member

Can you elaborate why “cgroup v2 evacuation” might be related to this readlink /proc/self/exe issue?

This doesn't seem directly related to cgroup per se, but as you mentioned in the OP this incurs unsharing PIDNS and mounting a new procfs, which seems related to /proc/self/exe errors

@AkihiroSuda I really don't understand how unsharing a new PID namespace should impact nerdctl here. nerdctl does not change its PID namespace.

@hinshun
Author

hinshun commented Mar 24, 2024

@AkihiroSuda @fahedouch Is there anything else I can provide? I don't think this is specific to nix-snapshotter; it seems to affect nerdctl with rootless k3s in general. I would love to have the full Docker UX with rootless-mode containerd and Kubernetes.
