Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/1.5] oci: use readonly mount to read user/group info #6503

Merged
merged 1 commit into from Feb 3, 2022

Commits on Feb 3, 2022

  1. oci: use readonly mount to read user/group info

    In linux kernel, the umount writable-mountpoint will try to do sync-fs
    to make sure that the dirty pages to the underlying filesystems. The many
    number of umount actions in the same time maybe introduce performance
    issue in IOPS limited disk.
    
    When CRI-plugin creates container, it will temp-mount rootfs to read
    that UID/GID info for entrypoint. Basically, the rootfs is writable
    snapshotter and then after read, umount will invoke sync-fs action.
    
    For example, using overlayfs on ext4 and use bcc-tools to monitor
    ext4_sync_fs call.
    
    ```
    // uname -a
    Linux chaofan 5.13.0-27-generic containerd#29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    
    // open terminal 1
    kubectl run --image=nginx --image-pull-policy=IfNotPresent nginx-pod
    
    // open terminal 2
    /usr/share/bcc/tools/stackcount ext4_sync_fs -i 1 -v -P
    
      ext4_sync_fs
      sync_filesystem
      ovl_sync_fs
      __sync_filesystem
      sync_filesystem
      generic_shutdown_super
      kill_anon_super
      deactivate_locked_super
      deactivate_super
      cleanup_mnt
      __cleanup_mnt
      task_work_run
      exit_to_user_mode_prepare
      syscall_exit_to_user_mode
      do_syscall_64
      entry_SYSCALL_64_after_hwframe
      syscall.Syscall.abi0
      github.com/containerd/containerd/mount.unmount
      github.com/containerd/containerd/mount.UnmountAll
      github.com/containerd/containerd/mount.WithTempMount.func2
      github.com/containerd/containerd/mount.WithTempMount
      github.com/containerd/containerd/oci.WithUserID.func1
      github.com/containerd/containerd/oci.WithUser.func1
      github.com/containerd/containerd/oci.ApplyOpts
      github.com/containerd/containerd.WithSpec.func1
      github.com/containerd/containerd.(*Client).NewContainer
      github.com/containerd/containerd/pkg/cri/server.(*criService).CreateContainer
      github.com/containerd/containerd/pkg/cri/server.(*instrumentedService).CreateContainer
      k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler.func1
      github.com/containerd/containerd/services/server.unaryNamespaceInterceptor
      github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
      github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1
      github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
      go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc.UnaryServerInterceptor.func1
      github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
      github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
      k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_CreateContainer_Handler
      google.golang.org/grpc.(*Server).processUnaryRPC
      google.golang.org/grpc.(*Server).handleStream
      google.golang.org/grpc.(*Server).serveStreams.func1.2
      runtime.goexit.abi0
        containerd [34771]
        1
    ```
    
    If there are comming several create requestes, umount actions might
    bring high IO pressure on the /var/lib/containerd's underlying disk.
    
    After checkout the kernel code[1], the kernel will not call
    __sync_filesystem if the mount is readonly. Based on this, containerd
    should use readonly mount to get UID/GID information.
    
    Reference:
    
    * [1] https://elixir.bootlin.com/linux/v5.13/source/fs/sync.c#L61
    
    Closes: containerd#4604
    
    Signed-off-by: Wei Fu <fuweid89@gmail.com>
    (cherry picked from commit 813a061)
    Signed-off-by: Qiutong Song <qiutongs@google.com>
    fuweid authored and qiutongs committed Feb 3, 2022
    Copy the full SHA
    61be716 View commit details
    Browse the repository at this point in the history