In linux kernel, the umount writable-mountpoint will try to do sync-fs
to make sure that the dirty pages to the underlying filesystems. The many
number of umount actions in the same time maybe introduce performance
issue in IOPS limited disk.
When CRI-plugin creates container, it will temp-mount rootfs to read
that UID/GID info for entrypoint. Basically, the rootfs is writable
snapshotter and then after read, umount will invoke sync-fs action.
For example, using overlayfs on ext4 and use bcc-tools to monitor
ext4_sync_fs call.
```
// uname -a
Linux chaofan 5.13.0-27-generic containerd#29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
// open terminal 1
kubectl run --image=nginx --image-pull-policy=IfNotPresent nginx-pod
// open terminal 2
/usr/share/bcc/tools/stackcount ext4_sync_fs -i 1 -v -P
ext4_sync_fs
sync_filesystem
ovl_sync_fs
__sync_filesystem
sync_filesystem
generic_shutdown_super
kill_anon_super
deactivate_locked_super
deactivate_super
cleanup_mnt
__cleanup_mnt
task_work_run
exit_to_user_mode_prepare
syscall_exit_to_user_mode
do_syscall_64
entry_SYSCALL_64_after_hwframe
syscall.Syscall.abi0
github.com/containerd/containerd/mount.unmount
github.com/containerd/containerd/mount.UnmountAll
github.com/containerd/containerd/mount.WithTempMount.func2
github.com/containerd/containerd/mount.WithTempMount
github.com/containerd/containerd/oci.WithUserID.func1
github.com/containerd/containerd/oci.WithUser.func1
github.com/containerd/containerd/oci.ApplyOpts
github.com/containerd/containerd.WithSpec.func1
github.com/containerd/containerd.(*Client).NewContainer
github.com/containerd/containerd/pkg/cri/server.(*criService).CreateContainer
github.com/containerd/containerd/pkg/cri/server.(*instrumentedService).CreateContainer
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler.func1
github.com/containerd/containerd/services/server.unaryNamespaceInterceptor
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler
google.golang.org/grpc.(*Server).processUnaryRPC
google.golang.org/grpc.(*Server).handleStream
google.golang.org/grpc.(*Server).serveStreams.func1.2
runtime.goexit.abi0
containerd [34771]
1
```
If there are comming several create requestes, umount actions might
bring high IO pressure on the /var/lib/containerd's underlying disk.
After checkout the kernel code[1], the kernel will not call
__sync_filesystem if the mount is readonly. Based on this, containerd
should use readonly mount to get UID/GID information.
Reference:
* [1] https://elixir.bootlin.com/linux/v5.13/source/fs/sync.c#L61
Closes: containerd#4604
Signed-off-by: Wei Fu <fuweid89@gmail.com>
(cherry picked from commit 813a061)
Signed-off-by: Qiutong Song <qiutongs@google.com>