New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/1.5] oci: use readonly mount to read user/group info #6503
Conversation
In linux kernel, the umount writable-mountpoint will try to do sync-fs to make sure that the dirty pages to the underlying filesystems. The many number of umount actions in the same time maybe introduce performance issue in IOPS limited disk. When CRI-plugin creates container, it will temp-mount rootfs to read that UID/GID info for entrypoint. Basically, the rootfs is writable snapshotter and then after read, umount will invoke sync-fs action. For example, using overlayfs on ext4 and use bcc-tools to monitor ext4_sync_fs call. ``` // uname -a Linux chaofan 5.13.0-27-generic containerd#29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux // open terminal 1 kubectl run --image=nginx --image-pull-policy=IfNotPresent nginx-pod // open terminal 2 /usr/share/bcc/tools/stackcount ext4_sync_fs -i 1 -v -P ext4_sync_fs sync_filesystem ovl_sync_fs __sync_filesystem sync_filesystem generic_shutdown_super kill_anon_super deactivate_locked_super deactivate_super cleanup_mnt __cleanup_mnt task_work_run exit_to_user_mode_prepare syscall_exit_to_user_mode do_syscall_64 entry_SYSCALL_64_after_hwframe syscall.Syscall.abi0 github.com/containerd/containerd/mount.unmount github.com/containerd/containerd/mount.UnmountAll github.com/containerd/containerd/mount.WithTempMount.func2 github.com/containerd/containerd/mount.WithTempMount github.com/containerd/containerd/oci.WithUserID.func1 github.com/containerd/containerd/oci.WithUser.func1 github.com/containerd/containerd/oci.ApplyOpts github.com/containerd/containerd.WithSpec.func1 github.com/containerd/containerd.(*Client).NewContainer github.com/containerd/containerd/pkg/cri/server.(*criService).CreateContainer github.com/containerd/containerd/pkg/cri/server.(*instrumentedService).CreateContainer k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler.func1 github.com/containerd/containerd/services/server.unaryNamespaceInterceptor github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1 github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1 go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1 github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1 k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler google.golang.org/grpc.(*Server).processUnaryRPC google.golang.org/grpc.(*Server).handleStream google.golang.org/grpc.(*Server).serveStreams.func1.2 runtime.goexit.abi0 containerd [34771] 1 ``` If there are comming several create requestes, umount actions might bring high IO pressure on the /var/lib/containerd's underlying disk. After checkout the kernel code[1], the kernel will not call __sync_filesystem if the mount is readonly. Based on this, containerd should use readonly mount to get UID/GID information. Reference: * [1] https://elixir.bootlin.com/linux/v5.13/source/fs/sync.c#L61 Closes: containerd#4604 Signed-off-by: Wei Fu <fuweid89@gmail.com> (cherry picked from commit 813a061) Signed-off-by: Qiutong Song <qiutongs@google.com>
Hi @qiutongs. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cc fuweid dmcgowan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Backport of #6478 to 1.5