The restored container may be killed together with the previous one #9691

Open
loheagn opened this issue Jan 25, 2024 · 5 comments

loheagn commented Jan 25, 2024

Description

Let's say we have a running container whose id is nginx-1. We checkpoint this container nginx-1 and get a snapshot image whose name is checkpoint-nginx-1. Then we restore from this checkpoint and get a running container whose id is nginx-2.

Now, if we forcefully remove nginx-1, the processes of nginx-2 also exit.

Steps to reproduce the issue

  1. ctr run -d docker.io/library/nginx:1.19 nginx-1
  2. ctr c checkpoint --task nginx-1 checkpoint-nginx-1
  3. ctr c restore --live nginx-2 checkpoint-nginx-1

At this point, we can see that nginx-2 is running:

55483     1 55483  2472 /usr/local/bin/containerd-shim-runc-v2 -namespace default -id nginx-1 -address /run/containerd/containerd.sock
55504 55483 55504 55504  \_ nginx: master process nginx -g daemon off;
55567 55504 55504 55504      \_ nginx: worker process
55801     1 55801  2472 /usr/local/bin/containerd-shim-runc-v2 -namespace default -id nginx-2 -address /run/containerd/containerd.sock
55819 55801 55819 55819  \_ nginx: master process nginx -g daemon off;
55843 55819 55819 55819      \_ nginx: worker process
  4. But when I force-remove nginx-1 using ctr t rm -f nginx-1, the processes of nginx-2 exit too, and only the nginx-2 shim remains:
55801     1 55801  2472 /usr/local/bin/containerd-shim-runc-v2 -namespace default -id nginx-2 -address /run/containerd/containerd.sock
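
To check the symptom mechanically rather than by eye, the process listings above can be summarized per container. The following is a minimal sketch, assuming the first two columns of the listing are PID and PPID and that each shim command line carries an `-id <container>` argument, as in the output shown. The function name `count_container_processes` is hypothetical and not part of containerd.

```python
import re
from typing import Dict


def count_container_processes(ps_text: str) -> Dict[str, int]:
    """Count the processes that descend from each containerd shim.

    Assumes each line looks like "PID PPID PGID SID COMMAND", as in the
    listing from this issue. Lines that do not match are skipped.
    """
    shim_pid_to_id = {}  # shim PID -> container id taken from "-id <name>"
    parent_of = {}       # PID -> PPID for every parsed line

    for line in ps_text.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        pid, ppid, cmd = int(fields[0]), int(fields[1]), fields[4]
        parent_of[pid] = ppid
        match = re.search(r"containerd-shim-runc-v2\b.*\s-id\s+(\S+)", cmd)
        if match:
            shim_pid_to_id[pid] = match.group(1)

    counts = {cid: 0 for cid in shim_pid_to_id.values()}
    for pid in parent_of:
        if pid in shim_pid_to_id:
            continue
        # Walk up the parent chain until a shim is found or the chain ends.
        ancestor = parent_of.get(pid)
        seen = set()
        while ancestor is not None and ancestor not in seen:
            if ancestor in shim_pid_to_id:
                counts[shim_pid_to_id[ancestor]] += 1
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
    return counts
```

Fed the first listing, this reports two processes under each shim; fed the listing taken after the forced remove, it reports the nginx-2 shim with no processes left.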

Describe the results you received and expected

I am not sure whether this is a bug or intended behavior, because when I use ctr t kill nginx-1 (instead of removing it forcefully), everything works as expected.

What version of containerd are you using?

containerd github.com/containerd/containerd v1.7.11 64b8a81

Any other relevant information

runc version:

runc version 1.1.11
commit: v1.1.11-0-g4bccb38c
spec: 1.0.2-dev
go: go1.21.6
libseccomp: 2.5.4

criu version:

Version: 3.17.1

uname -a

Linux lima-ccr-dev 6.5.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Tue Nov 14 15:13:47 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

Show configuration if it is related to CRI plugin.

No response

loheagn commented Jan 25, 2024

I think the likely cause of this bug is that the restored container shares a cgroup with the previous container. I found that the problem does not occur when I move the processes of the restored container out of the previous cgroup.
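
One way to test the shared-cgroup hypothesis is to compare the `/proc/<pid>/cgroup` entries of a process from each container. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the standard `hierarchy-ID:controller-list:cgroup-path` line format of that file.

```python
from pathlib import Path
from typing import Set, Tuple


def parse_cgroup_entries(text: str) -> Set[Tuple[str, str]]:
    """Parse the contents of a /proc/<pid>/cgroup file.

    Returns a set of (controller-list, cgroup-path) pairs. On cgroup v2
    there is a single entry with an empty controller list.
    """
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first two colons only; the path may contain colons.
        parts = line.split(":", 2)
        if len(parts) == 3:
            entries.add((parts[1], parts[2]))
    return entries


def entries_overlap(text_a: str, text_b: str) -> bool:
    """True if the two cgroup files have at least one identical entry."""
    return bool(parse_cgroup_entries(text_a) & parse_cgroup_entries(text_b))


def share_cgroup(pid_a: int, pid_b: int) -> bool:
    """Compare the cgroup membership of two live processes (Linux only)."""
    text_a = Path(f"/proc/{pid_a}/cgroup").read_text()
    text_b = Path(f"/proc/{pid_b}/cgroup").read_text()
    return entries_overlap(text_a, text_b)
```

With the PIDs from the listing in the description, `share_cgroup(55504, 55819)` would compare the two nginx master processes. If the hypothesis holds, it returns True before the restored processes are moved and False afterwards.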

I think we should perhaps provide an option in the checkpoint and restore commands to let users decide whether CRIU should dump and restore cgroups information.

I can help to submit a PR to fix this if necessary.

@kushalShukla-web

Hey @loheagn, please assign this issue to me, I can do this!

loheagn commented Jan 29, 2024

> Hey @loheagn, please assign this issue to me, I can do this!

@kushalShukla-web, thanks for the reply! But I don't think we have found the right way to solve the problem (if it is a bug) or to enhance the checkpoint feature yet. We should wait for a reply from the maintainers.

loheagn commented Jan 29, 2024

The solution may depend on the --manage-cgroups-mode argument of runc. However, there may be a bug that prevents us from setting --manage-cgroups-mode to ignore when executing runc checkpoint and runc restore. Please refer to opencontainers/runc#4178.
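
For illustration, here is a rough sketch of the runc invocations this would involve. The helper names and the example paths are hypothetical. `--image-path` and `--manage-cgroups-mode` are existing runc flags, but whether the `ignore` value is accepted depends on the runc version, as discussed in the referenced runc issue.

```python
from typing import List


def runc_checkpoint_argv(container_id: str, image_path: str,
                         cgroups_mode: str = "ignore") -> List[str]:
    """Build the argv for `runc checkpoint` with an explicit cgroups mode."""
    return [
        "runc", "checkpoint",
        "--image-path", image_path,              # directory for the CRIU images
        "--manage-cgroups-mode", cgroups_mode,   # how CRIU should treat cgroups
        container_id,
    ]


def runc_restore_argv(container_id: str, image_path: str,
                      cgroups_mode: str = "ignore") -> List[str]:
    """Build the argv for `runc restore` with an explicit cgroups mode."""
    return [
        "runc", "restore",
        "--image-path", image_path,
        "--manage-cgroups-mode", cgroups_mode,
        container_id,
    ]


if __name__ == "__main__":
    # Hypothetical image directory; printed only, nothing is executed.
    print(" ".join(runc_checkpoint_argv("nginx-1", "/tmp/checkpoint-nginx-1")))
    print(" ".join(runc_restore_argv("nginx-2", "/tmp/checkpoint-nginx-1")))
```

The functions only build argument lists, so they can be inspected or logged before being passed to something like `subprocess.run`. containerd would need to expose an equivalent option for this to be usable through ctr.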

@kushalShukla-web

Sure, we can wait for the maintainers. :)
