podman checkpoint fails #11001
Comments
@adrianreber PTAL |
@vikas-goel Can you upload the dump.log file mentioned in the error message? |
Already attached in the result section. |
Ah, haven't seen it. Thanks. So CRIU says:
Never seen that one before. How did you create the container? I would like to be able to reproduce this. |
The container was created using docker. |
I meant how did you start the container. Can you share the |
And maybe also the Dockerfile? |
Ah, I think I was able to reproduce it:
Did you use Looking at the timezone mount I see in the container:
But I do not see that mount mentioned in the config.json of the container. For CRIU to be able to checkpoint external mounts, they need to be listed in config.json. I do, however, see the following line in config.json:
@rhatdan do you have any insight into how timezone mounts work? Why aren't they listed in config.json? |
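The check described above (is every mount seen in the container also listed in config.json?) can be sketched roughly as follows. This is only an illustration, not how Podman or CRIU do it: the function name `unlisted_mount_points` is hypothetical, and it relies solely on the `mounts[].destination` field of the OCI runtime config. Mount points that the runtime sets up implicitly may still show up as false positives.

```python
import json


def unlisted_mount_points(config_json_text, mount_points):
    """Return mount points seen in the container that config.json does not list.

    config_json_text: contents of the container's OCI config.json.
    mount_points: mount points observed inside the container, for example
    collected from its /proc/<pid>/mountinfo.
    """
    config = json.loads(config_json_text)
    # Normalize trailing slashes so "/etc/hosts/" and "/etc/hosts" compare equal.
    listed = {
        (m["destination"].rstrip("/") or "/") for m in config.get("mounts", [])
    }
    # The root mount is described by "root" in config.json, not by "mounts".
    return sorted(p for p in set(mount_points) if p != "/" and p not in listed)
```

Any path this returns is a candidate for the kind of mount CRIU would not know how to treat as external.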
I started the container using docker-compose. Not sure if docker-compose is playing a role there as I didn't specify time zone info in the compose file. However, I do see the file mounted inside the container, as you mentioned. I did specify bind mount
Attaching the |
We handle |
I do not see the timezone mounts in |
Apart from time zone issue, there seems to be another one. I disabled the /etc/localtime bind mount and tried checkpoint again. It failed in a different code path.
|
You should open an issue at CRIU about this as this is not really a podman problem and chances are much higher someone knows why it fails if you directly take it to CRIU. |
Localtime should definitely be a bind mount - the same code that manages resolv.conf and hosts was extended to manage it as well. Very surprised it's not in the OCI spec JSON |
I can do that. I see some threads on the same issue. One of them points to the root cause being that overlay FS does not implement the inotify interfaces. Do you happen to know if that is still the case? |
So if a timezone is specified there is an entry for
But I do not see
The question is why CRIU cannot checkpoint a mount if it does not know how to restore it. If |
I am not aware that it has been resolved. It seems your container cannot be checkpointed. It is also important to know that your application is probably not working correctly in the container anyway if it relies on inotify, which does not work on overlayfs. You could try the vfs graphdriver to see if checkpointing works then. |
Hi, all!
First, this error does not say anything about overlayfs. It can come from any other virtual FS that does not support resolving file handles. To be sure, check on your node which device is 0x48 (72): look for the device ID in the format " 0:72 " in /proc/self/mountinfo right after you get the error. Second, overlayfs supports simple inotify watches fine, so your app is not affected. But to dump inotify watches, overlayfs must provide a valid fhandle for them in proc, and this can only happen if overlayfs has:
You should enable those options on overlayfs (likely via podman) to be able to c/r inotify watches on it. |
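The device lookup suggested above can be sketched in a few lines. This is a minimal example assuming the standard /proc/self/mountinfo layout (the third field is `major:minor`, and the filesystem type and source follow the `-` separator); the helper name `find_mounts_by_device` is hypothetical.

```python
def find_mounts_by_device(mountinfo_text, major, minor):
    """Return (mount_point, fs_type, source) for every entry on major:minor."""
    wanted = f"{major}:{minor}"
    matches = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # A valid entry has at least 6 leading fields, "-", and 3 trailing fields.
        if len(fields) < 10 or fields[2] != wanted:
            continue
        # Optional fields (e.g. "shared:30") end at the "-" separator.
        sep = fields.index("-", 6)
        matches.append((fields[4], fields[sep + 1], fields[sep + 2]))
    return matches
```

To look for the device from the error, read /proc/self/mountinfo on the node and call `find_mounts_by_device(text, 0, 0x48)`. An empty result means no mount with that device ID is visible in that mount namespace at that moment.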
Thanks for your comments, @Snorch . I didn't find any device in format "0:72".
Is this concerning? |
I turned on overlay module's index and nfs_export parameters and tried again as suggested by @Snorch . This time I received a different error.
The complete dump.log is dump.log |
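Before retrying, it can help to confirm that the overlay module parameters are actually in effect. The sketch below assumes the usual sysfs location for module parameters, /sys/module/overlay/parameters, where boolean parameters read as Y or N; the helper name `read_overlay_params` is hypothetical, and the base path is a parameter so it can point elsewhere.

```python
from pathlib import Path


def read_overlay_params(names=("index", "nfs_export"),
                        base="/sys/module/overlay/parameters"):
    """Return {name: value} for overlay module parameters, None if unreadable."""
    result = {}
    for name in names:
        path = Path(base) / name
        # A missing file usually means the module is not loaded or the
        # kernel does not provide that parameter.
        result[name] = path.read_text().strip() if path.is_file() else None
    return result
```

Note that these are module-wide defaults; an individual overlay mount can still override them through its own mount options.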
In order to bypass the socket error as seen in the previous comment, I tried taking checkpoint with memory only. Received another error.
|
A friendly reminder that this issue had no activity for 30 days. |
@vikas-goel Is this still an issue? @adrianreber Thoughts? |
Yes, there are three issues I reported here. None of them seem to be resolved.
|
No one really answered why the timezone mount exists in the container without the corresponding entry in config.json. If there is a mount in a container and it is not listed in config.json checkpoint/restore will not work.
This is probably a shortcoming of CRIU, and it seems like you have an application which cannot be checkpointed. Can you try it with a simpler application to verify that it works at all? |
The errors I am encountering look generic: they stem from features that any application might require. I understand that a simpler (bare minimum) application may not hit these issues, but that defeats the value of the container checkpoint feature. |
@adrianreber I ran locally and confirmed that there is a bind mount for |
If I do:
I get following
All files are mentioned in |
From a container I just created using
|
I can verify that the results are effectively identical when I use It looks like this is a symlink in the container to @giuseppe This sounds like it could be a bug in |
I don't think it is possible to create a bind mount on a symlink. If you use a symlink as the target for a mount, the kernel will resolve it. The runtime resolves the symlink itself so that it cannot point to paths outside the rootfs. crun uses |
If I understand the above correctly, this is what happens: the container runtime creates an external bind mount for a container whose target path is a symlink, so the bind mount is created at the symlink-resolved path, which differs from the target path. This is a real problem, because the current CRIU mount engine cannot tell that a mount with a different mountpoint is external (we tell CRIU about external mounts by mountpoint). The same happens if the container user moves an external bind mount to a different mountpoint (or binds it to another mountpoint and unmounts the target mountpoint) inside the container. I have already described this problem in checkpoint-restore/criu#1396 and proposed a possible solution. I am not sure it will be done soon, though; I would rather do it after the switch to mount-v2. |
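To illustrate the mismatch described above, here is a rough sketch of resolving a mount destination inside a rootfs while following symlinks, so the resolved mountpoint can be compared with the destination recorded in config.json. It is not the code crun or any other runtime uses; `resolve_in_rootfs` is a hypothetical helper, and a real runtime has to do this in a race-free way.

```python
import os


def resolve_in_rootfs(rootfs, destination, max_links=40):
    """Resolve destination inside rootfs, never letting symlinks escape it."""
    pending = [p for p in destination.split("/") if p]
    resolved = []  # components of the resolved path, relative to rootfs
    links = 0
    while pending:
        part = pending.pop(0)
        if part == ".":
            continue
        if part == "..":
            # ".." at the top of the rootfs stays at the top.
            if resolved:
                resolved.pop()
            continue
        candidate = os.path.join(rootfs, *resolved, part)
        if os.path.islink(candidate):
            links += 1
            if links > max_links:
                raise OSError("too many levels of symbolic links")
            target = os.readlink(candidate)
            if target.startswith("/"):
                resolved = []  # absolute targets are taken relative to rootfs
            pending = [p for p in target.split("/") if p] + pending
        else:
            resolved.append(part)
    return "/" + "/".join(resolved)
```

If the returned path differs from the configured destination, the mount will appear in the container at a mountpoint that config.json never mentions, which is the situation CRIU cannot currently match up.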
A friendly reminder that this issue had no activity for 30 days. |
@Snorch Anything happening with this? |
A friendly reminder that this issue had no activity for 30 days. |
No movement on this in the last month. I am not sure what has to change in Podman, or whether these are just issues with CRIU. |
I also think there is nothing Podman can do here. The information in |
A friendly reminder that this issue had no activity for 30 days. |
Since we have not heard further and we don't believe Podman is doing anything wrong, closing. |
Please run podman with "sudo podman system service --time=0 unix:///tmp/podman.sock"
|
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
podman checkpoint operation fails
Steps to reproduce the issue:
podman container checkpoint --leave-running --tcp-established tme-mas-01
Describe the results you received:
dump.log
Describe the results you expected:
Should have created a checkpoint
Additional information you deem important (e.g. issue happens only occasionally):
Consistently
Output of podman version:
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
Yes
Additional environment details (AWS, VirtualBox, physical, etc.):
VMware VM