Docker commands (rm/kill/inspect/...) hang on a container reported as running that has already exited #42894

Open
Sh4d1 opened this issue Sep 28, 2021 · 7 comments

Sh4d1 commented Sep 28, 2021

Description

Context:

A container launched with docker-compose under a restart: no policy, whose process exits with a status code of 0.
Here is the docker-compose file (docker-compose version 1.25.4, build 8d51620a), with some values anonymized as ***:

version: '3'
services:
  ***:
    image: ***
    container_name: ***
    restart: never
    network_mode: bridge
    hostname: ***
    command: ['***', '***']
    volumes:
      - ./data:/data
      - /etc/ssl/certs:/etc/ssl/certs:ro

    logging:
      driver: fluentd
      options:
        fluentd-address: localhost
        fluentd-async-connect: 'true'
        fluentd-buffer-limit: 2M

Looking at this file, restart: never is not a valid policy, yet docker-compose 1.25.4 accepts it without complaint, so I assume the default restart policy (no) is the one actually in use. This is fixed in a later docker-compose release.
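To illustrate the kind of check that would have caught restart: never, here is a minimal sketch. It assumes the four restart values documented for Compose files (no, always, on-failure, unless-stopped); the function name and the handling of a boolean False are hypothetical and are not taken from docker-compose itself. The on-failure retry-count form is deliberately not handled.

```python
# Restart values documented for Compose files (assumption: plain forms only).
VALID_RESTART_POLICIES = {"no", "always", "on-failure", "unless-stopped"}


def check_restart_policy(value):
    """Return the normalized policy, or raise ValueError if it is unknown.

    Hypothetical helper for illustration; not docker-compose's own code.
    """
    # YAML 1.1 parsers read an unquoted `no` as the boolean False,
    # so treat False as the string "no".
    if value is False:
        value = "no"
    if value not in VALID_RESTART_POLICIES:
        allowed = ", ".join(sorted(VALID_RESTART_POLICIES))
        raise ValueError(f"invalid restart policy {value!r}; expected one of: {allowed}")
    return value
```

With this check, check_restart_policy("never") raises ValueError instead of silently falling back to the default.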

Issue:

When trying to stop/kill/inspect/rm this container, every docker <action> <container_id> command hangs.

I've found #30927, which is fairly old, and #40817 (see below), but I don't have any hung runc processes.

The stuck container ID here is 1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602, and the process linked to this container no longer exists.
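One way to confirm that a recorded PID no longer maps to a live process is to send it signal 0, which performs the existence and permission checks without delivering a signal. The sketch below is only an illustration; pid_exists is a hypothetical helper name, and the PID to pass in would be the one Docker recorded for the container.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        # Signal 0 delivers nothing; the kernel only checks existence and permissions.
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no such process.
        return False
    except PermissionError:
        # EPERM: the process exists but belongs to another user.
        return True
    return True
```

A False result for the container's recorded PID, while Docker still reports the container as running, matches the state described in this report.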

What I've seen:

  • ctr -n moby task ls -> nothing
  • ctr -n moby c ls -> I can see the container
  • ctr -n moby containers info 1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602
{
    "ID": "1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602",
    "Labels": {
        "com.docker/engine.bundle.path": "/var/run/docker/containerd/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602"
    },
    "Image": "",
    "Runtime": {
        "Name": "io.containerd.runc.v2",
        "Options": {
            "type_url": "containerd.runc.v1.Options",
            "value": "MgRydW5jOhwvdmFyL3J1bi9kb2NrZXIvcnVudGltZS1ydW5j"
        }
    },
    "SnapshotKey": "",
    "Snapshotter": "",
    "CreatedAt": "2021-09-28T12:49:45.578665569Z",
    "UpdatedAt": "2021-09-28T12:49:45.578665569Z",
    "Extensions": null,
    "Spec": {
        "ociVersion": "1.0.2-dev",
        "process": {
            "user": {
                "uid": 101,
                "gid": 101,
                "additionalGids": [
                    101
                ]
            },
            "args": [
                "***",
                "***"
            ],
            "env": [
                "PATH=/opt/venv/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "HOSTNAME=***",
                "LANG=C.UTF-8"
            ],
            "cwd": "/",
            "capabilities": {
                "bounding": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                ],
                "inheritable": [
                    "CAP_CHOWN",
                    "CAP_DAC_OVERRIDE",
                    "CAP_FSETID",
                    "CAP_FOWNER",
                    "CAP_MKNOD",
                    "CAP_NET_RAW",
                    "CAP_SETGID",
                    "CAP_SETUID",
                    "CAP_SETFCAP",
                    "CAP_SETPCAP",
                    "CAP_NET_BIND_SERVICE",
                    "CAP_SYS_CHROOT",
                    "CAP_KILL",
                    "CAP_AUDIT_WRITE"
                ]
            },
            "apparmorProfile": "docker-default",
            "oomScoreAdj": 0
        },
        "root": {
            "path": "/var/lib/docker/overlay2/0ddc2295cbbe3affd23cfd474488f88d715ae4413f8f655830fbaf0274441002/merged"
        },
        "hostname": "***",
        "mounts": [
            {
                "destination": "/proc",
                "type": "proc",
                "source": "proc",
                "options": [
                    "nosuid",
                    "noexec",
                    "nodev"
                ]
            },
            {
                "destination": "/dev",
                "type": "tmpfs",
                "source": "tmpfs",
                "options": [
                    "nosuid",
                    "strictatime",
                    "mode=755",
                    "size=65536k"
                ]
            },
            {
                "destination": "/dev/pts",
                "type": "devpts",
                "source": "devpts",
                "options": [
                    "nosuid",
                    "noexec",
                    "newinstance",
                    "ptmxmode=0666",
                    "mode=0620",
                    "gid=5"
                ]
            },
            {
                "destination": "/sys",
                "type": "sysfs",
                "source": "sysfs",
                "options": [
                    "nosuid",
                    "noexec",
                    "nodev",
                    "ro"
                ]
            },
            {
                "destination": "/sys/fs/cgroup",
                "type": "cgroup",
                "source": "cgroup",
                "options": [
                    "ro",
                    "nosuid",
                    "noexec",
                    "nodev"
                ]
            },
            {
                "destination": "/dev/mqueue",
                "type": "mqueue",
                "source": "mqueue",
                "options": [
                    "nosuid",
                    "noexec",
                    "nodev"
                ]
            },
            {
                "destination": "/data",
                "type": "bind",
                "source": "***",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/etc/resolv.conf",
                "type": "bind",
                "source": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/resolv.conf",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/etc/hostname",
                "type": "bind",
                "source": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/hostname",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/etc/hosts",
                "type": "bind",
                "source": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/hosts",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/dev/shm",
                "type": "bind",
                "source": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/mounts/shm",
                "options": [
                    "rbind",
                    "rprivate"
                ]
            },
            {
                "destination": "/etc/ssl/certs",
                "type": "bind",
                "source": "/etc/ssl/certs",
                "options": [
                    "rbind",
                    "ro",
                    "rprivate"
                ]
            }
        ],
        "hooks": {
            "prestart": [
                {
                    "path": "/proc/5754/exe",
                    "args": [
                        "libnetwork-setkey",
                        "-exec-root=/var/run/docker",
                        "1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602",
                        "8d348b5ac848"
                    ]
                }
            ]
        },
        "linux": {
            "sysctl": {
                "net.ipv4.ip_unprivileged_port_start": "0"
            },
            "resources": {
                "devices": [
                    {
                        "allow": false,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 5,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 3,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 9,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 1,
                        "minor": 8,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 5,
                        "minor": 0,
                        "access": "rwm"
                    },
                    {
                        "allow": true,
                        "type": "c",
                        "major": 5,
                        "minor": 1,
                        "access": "rwm"
                    },
                    {
                        "allow": false,
                        "type": "c",
                        "major": 10,
                        "minor": 229,
                        "access": "rwm"
                    }
                ],
                "memory": {
                    "disableOOMKiller": false
                },
                "cpu": {
                    "shares": 0
                },
                "blockIO": {
                    "weight": 0
                }
            },
            "cgroupsPath": "/docker/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602",
            "namespaces": [
                {
                    "type": "mount"
                },
                {
                    "type": "network"
                },
                {
                    "type": "uts"
                },
                {
                    "type": "pid"
                },
                {
                    "type": "ipc"
                }
            ],
            "seccomp": {
                "defaultAction": "SCMP_ACT_ERRNO",
                "architectures": [
                    "SCMP_ARCH_X86_64",
                    "SCMP_ARCH_X86",
                    "SCMP_ARCH_X32"
                ],
                "syscalls": [
                    {
                        "names": [
                            "accept",
                            "accept4",
                            "access",
                            "adjtimex",
                            "alarm",
                            "bind",
                            "brk",
                            "capget",
                            "capset",
                            "chdir",
                            "chmod",
                            "chown",
                            "chown32",
                            "clock_adjtime",
                            "clock_adjtime64",
                            "clock_getres",
                            "clock_getres_time64",
                            "clock_gettime",
                            "clock_gettime64",
                            "clock_nanosleep",
                            "clock_nanosleep_time64",
                            "close",
                            "close_range",
                            "connect",
                            "copy_file_range",
                            "creat",
                            "dup",
                            "dup2",
                            "dup3",
                            "epoll_create",
                            "epoll_create1",
                            "epoll_ctl",
                            "epoll_ctl_old",
                            "epoll_pwait",
                            "epoll_pwait2",
                            "epoll_wait",
                            "epoll_wait_old",
                            "eventfd",
                            "eventfd2",
                            "execve",
                            "execveat",
                            "exit",
                            "exit_group",
                            "faccessat",
                            "faccessat2",
                            "fadvise64",
                            "fadvise64_64",
                            "fallocate",
                            "fanotify_mark",
                            "fchdir",
                            "fchmod",
                            "fchmodat",
                            "fchown",
                            "fchown32",
                            "fchownat",
                            "fcntl",
                            "fcntl64",
                            "fdatasync",
                            "fgetxattr",
                            "flistxattr",
                            "flock",
                            "fork",
                            "fremovexattr",
                            "fsetxattr",
                            "fstat",
                            "fstat64",
                            "fstatat64",
                            "fstatfs",
                            "fstatfs64",
                            "fsync",
                            "ftruncate",
                            "ftruncate64",
                            "futex",
                            "futex_time64",
                            "futimesat",
                            "getcpu",
                            "getcwd",
                            "getdents",
                            "getdents64",
                            "getegid",
                            "getegid32",
                            "geteuid",
                            "geteuid32",
                            "getgid",
                            "getgid32",
                            "getgroups",
                            "getgroups32",
                            "getitimer",
                            "getpeername",
                            "getpgid",
                            "getpgrp",
                            "getpid",
                            "getppid",
                            "getpriority",
                            "getrandom",
                            "getresgid",
                            "getresgid32",
                            "getresuid",
                            "getresuid32",
                            "getrlimit",
                            "get_robust_list",
                            "getrusage",
                            "getsid",
                            "getsockname",
                            "getsockopt",
                            "get_thread_area",
                            "gettid",
                            "gettimeofday",
                            "getuid",
                            "getuid32",
                            "getxattr",
                            "inotify_add_watch",
                            "inotify_init",
                            "inotify_init1",
                            "inotify_rm_watch",
                            "io_cancel",
                            "ioctl",
                            "io_destroy",
                            "io_getevents",
                            "io_pgetevents",
                            "io_pgetevents_time64",
                            "ioprio_get",
                            "ioprio_set",
                            "io_setup",
                            "io_submit",
                            "io_uring_enter",
                            "io_uring_register",
                            "io_uring_setup",
                            "ipc",
                            "kill",
                            "lchown",
                            "lchown32",
                            "lgetxattr",
                            "link",
                            "linkat",
                            "listen",
                            "listxattr",
                            "llistxattr",
                            "_llseek",
                            "lremovexattr",
                            "lseek",
                            "lsetxattr",
                            "lstat",
                            "lstat64",
                            "madvise",
                            "membarrier",
                            "memfd_create",
                            "mincore",
                            "mkdir",
                            "mkdirat",
                            "mknod",
                            "mknodat",
                            "mlock",
                            "mlock2",
                            "mlockall",
                            "mmap",
                            "mmap2",
                            "mprotect",
                            "mq_getsetattr",
                            "mq_notify",
                            "mq_open",
                            "mq_timedreceive",
                            "mq_timedreceive_time64",
                            "mq_timedsend",
                            "mq_timedsend_time64",
                            "mq_unlink",
                            "mremap",
                            "msgctl",
                            "msgget",
                            "msgrcv",
                            "msgsnd",
                            "msync",
                            "munlock",
                            "munlockall",
                            "munmap",
                            "nanosleep",
                            "newfstatat",
                            "_newselect",
                            "open",
                            "openat",
                            "openat2",
                            "pause",
                            "pidfd_open",
                            "pidfd_send_signal",
                            "pipe",
                            "pipe2",
                            "poll",
                            "ppoll",
                            "ppoll_time64",
                            "prctl",
                            "pread64",
                            "preadv",
                            "preadv2",
                            "prlimit64",
                            "pselect6",
                            "pselect6_time64",
                            "pwrite64",
                            "pwritev",
                            "pwritev2",
                            "read",
                            "readahead",
                            "readlink",
                            "readlinkat",
                            "readv",
                            "recv",
                            "recvfrom",
                            "recvmmsg",
                            "recvmmsg_time64",
                            "recvmsg",
                            "remap_file_pages",
                            "removexattr",
                            "rename",
                            "renameat",
                            "renameat2",
                            "restart_syscall",
                            "rmdir",
                            "rseq",
                            "rt_sigaction",
                            "rt_sigpending",
                            "rt_sigprocmask",
                            "rt_sigqueueinfo",
                            "rt_sigreturn",
                            "rt_sigsuspend",
                            "rt_sigtimedwait",
                            "rt_sigtimedwait_time64",
                            "rt_tgsigqueueinfo",
                            "sched_getaffinity",
                            "sched_getattr",
                            "sched_getparam",
                            "sched_get_priority_max",
                            "sched_get_priority_min",
                            "sched_getscheduler",
                            "sched_rr_get_interval",
                            "sched_rr_get_interval_time64",
                            "sched_setaffinity",
                            "sched_setattr",
                            "sched_setparam",
                            "sched_setscheduler",
                            "sched_yield",
                            "seccomp",
                            "select",
                            "semctl",
                            "semget",
                            "semop",
                            "semtimedop",
                            "semtimedop_time64",
                            "send",
                            "sendfile",
                            "sendfile64",
                            "sendmmsg",
                            "sendmsg",
                            "sendto",
                            "setfsgid",
                            "setfsgid32",
                            "setfsuid",
                            "setfsuid32",
                            "setgid",
                            "setgid32",
                            "setgroups",
                            "setgroups32",
                            "setitimer",
                            "setpgid",
                            "setpriority",
                            "setregid",
                            "setregid32",
                            "setresgid",
                            "setresgid32",
                            "setresuid",
                            "setresuid32",
                            "setreuid",
                            "setreuid32",
                            "setrlimit",
                            "set_robust_list",
                            "setsid",
                            "setsockopt",
                            "set_thread_area",
                            "set_tid_address",
                            "setuid",
                            "setuid32",
                            "setxattr",
                            "shmat",
                            "shmctl",
                            "shmdt",
                            "shmget",
                            "shutdown",
                            "sigaltstack",
                            "signalfd",
                            "signalfd4",
                            "sigprocmask",
                            "sigreturn",
                            "socket",
                            "socketcall",
                            "socketpair",
                            "splice",
                            "stat",
                            "stat64",
                            "statfs",
                            "statfs64",
                            "statx",
                            "symlink",
                            "symlinkat",
                            "sync",
                            "sync_file_range",
                            "syncfs",
                            "sysinfo",
                            "tee",
                            "tgkill",
                            "time",
                            "timer_create",
                            "timer_delete",
                            "timer_getoverrun",
                            "timer_gettime",
                            "timer_gettime64",
                            "timer_settime",
                            "timer_settime64",
                            "timerfd_create",
                            "timerfd_gettime",
                            "timerfd_gettime64",
                            "timerfd_settime",
                            "timerfd_settime64",
                            "times",
                            "tkill",
                            "truncate",
                            "truncate64",
                            "ugetrlimit",
                            "umask",
                            "uname",
                            "unlink",
                            "unlinkat",
                            "utime",
                            "utimensat",
                            "utimensat_time64",
                            "utimes",
                            "vfork",
                            "vmsplice",
                            "wait4",
                            "waitid",
                            "waitpid",
                            "write",
                            "writev"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    },
                    {
                        "names": [
                            "ptrace"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 0,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 8,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 131072,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 131080,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "personality"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 4294967295,
                                "op": "SCMP_CMP_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "arch_prctl"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    },
                    {
                        "names": [
                            "modify_ldt"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    },
                    {
                        "names": [
                            "clone"
                        ],
                        "action": "SCMP_ACT_ALLOW",
                        "args": [
                            {
                                "index": 0,
                                "value": 2114060288,
                                "op": "SCMP_CMP_MASKED_EQ"
                            }
                        ]
                    },
                    {
                        "names": [
                            "chroot"
                        ],
                        "action": "SCMP_ACT_ALLOW"
                    }
                ]
            },
            "maskedPaths": [
                "/proc/asound",
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "readonlyPaths": [
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        }
    }
}
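The Runtime.Options.value field in the output above is a base64-encoded protobuf message. As a rough sketch, it can be decoded without the containerd protobuf definitions by reading only length-delimited fields. The helper names are hypothetical, and the interpretation of the field numbers (6 as the runtime binary name, 7 as the runtime root) is my reading of the containerd runc options message, so treat it as an assumption.

```python
import base64


def read_varint(buf, pos):
    """Decode one protobuf varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_delimited_fields(buf):
    """Return {field_number: bytes}; only wire type 2 (length-delimited) is handled."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = read_varint(buf, pos)
        fields[field_number] = buf[pos:pos + length]
        pos += length
    return fields


# Value copied from the Runtime.Options block above.
value = "MgRydW5jOhwvdmFyL3J1bi9kb2NrZXIvcnVudGltZS1ydW5j"
decoded = length_delimited_fields(base64.b64decode(value))
for number, payload in sorted(decoded.items()):
    print(number, payload.decode())
# prints:
# 6 runc
# 7 /var/run/docker/runtime-runc
```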

  • find / -name "*1c71de80*"
/sys/kernel/slab/sock_inode_cache/cgroup/sock_inode_cache(3068:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/kmalloc-rcl-96/cgroup/kmalloc-rcl-96(3068:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/kmalloc-rcl-96/cgroup/kmalloc-rcl-96(1347:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/radix_tree_node/cgroup/radix_tree_node(1347:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/dentry/cgroup/dentry(1347:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/dentry/cgroup/dentry(3068:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/dentry/cgroup/dentry(1597:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(1347:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(3068:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/kmalloc-rcl-64/cgroup/kmalloc-rcl-64(1597:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/ovl_inode/cgroup/ovl_inode(3068:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/ext4_inode_cache/cgroup/ext4_inode_cache(1347:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/kernel/slab/:A-0001152/cgroup/signal_cache(3068:1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)
/sys/fs/cgroup/pids/system.slice/var-lib-docker-containers-1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602-mounts-shm.mount
/sys/fs/cgroup/blkio/system.slice/var-lib-docker-containers-1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602-mounts-shm.mount
/sys/fs/cgroup/devices/system.slice/var-lib-docker-containers-1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602-mounts-shm.mount
/sys/fs/cgroup/memory/system.slice/var-lib-docker-containers-1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602-mounts-shm.mount
/sys/fs/cgroup/cpu,cpuacct/system.slice/var-lib-docker-containers-1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602-mounts-shm.mount
/sys/fs/cgroup/systemd/system.slice/var-lib-docker-containers-1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602-mounts-shm.mount
/sys/fs/cgroup/unified/system.slice/var-lib-docker-containers-1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602-mounts-shm.mount
/var/lib/docker/image/overlay2/layerdb/mounts/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602
/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602
/run/docker/containerd/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602
  • In /var/run/docker/containerd/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/ there are two named pipes: init-stdout and init-stderr
  • tree /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602
/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602
├── checkpoints
├── config.v2.json
├── container-cached.log
├── hostconfig.json
├── hostname
├── hosts
├── mounts
│   └── shm
├── resolv.conf
└── resolv.conf.hash

3 directories, 7 files
  • cat /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/config.v2.json
{
  "StreamConfig": {},
  "State": {
    "Running": true,
    "Paused": false,
    "Restarting": false,
    "OOMKilled": false,
    "RemovalInProgress": false,
    "Dead": false,
    "Pid": 6108,
    "ExitCode": 0,
    "Error": "",
    "StartedAt": "2021-09-28T12:49:45.945753185Z",
    "FinishedAt": "2021-09-28T12:49:44.593011751Z",
    "Health": null
  },
  "ID": "1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602",
  "Created": "2021-09-07T12:27:41.549873025Z",
  "Managed": false,
  "Path": "***",
  "Args": [
    "***"
  ],
  "Config": {
    "Hostname": "***",
    "Domainname": "",
    "User": "***",
    "AttachStdin": false,
    "AttachStdout": false,
    "AttachStderr": false,
    "Tty": false,
    "OpenStdin": false,
    "StdinOnce": false,
    "Env": [
      "PATH=/opt/venv/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
      "LANG=C.UTF-8"
    ],
    "Cmd": [
      "***",
      "***"
    ],
    "Image": "***",
    "Volumes": {
      "/data": {},
      "/etc/ssl/certs": {}
    },
    "WorkingDir": "",
    "Entrypoint": null,
    "OnBuild": null,
    "Labels": {
      "com.docker.compose.config-hash": "318cec37934e0aa3669251c5d0c65762842db6aa7221a07df941f247529fb92d",
      "com.docker.compose.container-number": "1",
      "com.docker.compose.oneoff": "False",
      "com.docker.compose.project": "***",
      "com.docker.compose.project.config_files": "***",
      "com.docker.compose.project.working_dir": "***",
      "com.docker.compose.service": "***",
      "com.docker.compose.version": "1.25.4"
    }
  },
  "Image": "sha256:88efb1b4c07c8e691f504779ba534e7d079171a67a30ce4d90b6e894833e8da4",
  "NetworkSettings": {
    "Bridge": "",
    "SandboxID": "06dbfe1d245992abf3f075ecd893b1b6a44957519cbee8b77e3acaca579dc625",
    "HairpinMode": false,
    "LinkLocalIPv6Address": "",
    "LinkLocalIPv6PrefixLen": 0,
    "Networks": {
      "bridge": {
        "IPAMConfig": null,
        "Links": null,
        "Aliases": null,
        "NetworkID": "8b71e7a01854df19d6bf23ecbf76c1379317f39d34965d0c1992df62b40ed2e7",
        "EndpointID": "4395f7f3b45e21fd5b7516a771265d6ce81d9e93b01fd1c0d30767f642e98c6a",
        "Gateway": "100.64.0.1",
        "IPAddress": "100.64.0.4",
        "IPPrefixLen": 24,
        "IPv6Gateway": "",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        "MacAddress": "02:42:64:40:00:04",
        "DriverOpts": null,
        "IPAMOperational": false
      }
    },
    "Service": null,
    "Ports": {},
    "SandboxKey": "/var/run/docker/netns/06dbfe1d2459",
    "SecondaryIPAddresses": null,
    "SecondaryIPv6Addresses": null,
    "IsAnonymousEndpoint": false,
    "HasSwarmEndpoint": false
  },
  "LogPath": "",
  "Name": "***",
  "Driver": "overlay2",
  "OS": "linux",
  "MountLabel": "",
  "ProcessLabel": "",
  "RestartCount": 0,
  "HasBeenStartedBefore": true,
  "HasBeenManuallyStopped": false,
  "MountPoints": {
    "/data": {
      "Source": "***",
      "Destination": "/data",
      "RW": true,
      "Name": "",
      "Driver": "",
      "Type": "bind",
      "Relabel": "rw",
      "Propagation": "rprivate",
      "Spec": {
        "Type": "bind",
        "Source": "***",
        "Target": "/data"
      },
      "SkipMountpointCreation": false
    },
    "/etc/ssl/certs": {
      "Source": "/etc/ssl/certs",
      "Destination": "/etc/ssl/certs",
      "RW": false,
      "Name": "",
      "Driver": "",
      "Type": "bind",
      "Relabel": "ro",
      "Propagation": "rprivate",
      "Spec": {
        "Type": "bind",
        "Source": "/etc/ssl/certs",
        "Target": "/etc/ssl/certs",
        "ReadOnly": true
      },
      "SkipMountpointCreation": false
    }
  },
  "SecretReferences": null,
  "ConfigReferences": null,
  "AppArmorProfile": "docker-default",
  "HostnamePath": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/hostname",
  "HostsPath": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/hosts",
  "ShmPath": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/mounts/shm",
  "ResolvConfPath": "/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/resolv.conf",
  "SeccompProfile": "",
  "NoNewPrivileges": false,
  "LocalLogCacheMeta": {
    "HaveNotifyEnabled": true
  }
}
  • file /var/run/docker/netns/06dbfe1d2459
/var/run/docker/netns/06dbfe1d2459: empty
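
The mismatch shown above (config.v2.json records "Running": true with "Pid": 6108, while no such process exists) can be checked mechanically. Below is a minimal sketch in Go, assuming a Linux host with /proc mounted. The names containerState, pidAlive and isGhost are made up for illustration and are not part of Docker. Since a PID can be reused by an unrelated process, a negative result is not proof that the container is healthy.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
)

// containerState mirrors only the "State" fields of config.v2.json that
// this check needs.
type containerState struct {
	Running bool `json:"Running"`
	Pid     int  `json:"Pid"`
}

// containerConfig is a partial view of config.v2.json.
type containerConfig struct {
	ID    string         `json:"ID"`
	State containerState `json:"State"`
}

// pidAlive reports whether /proc/<pid> exists (Linux only).
func pidAlive(pid int) bool {
	_, err := os.Stat(filepath.Join("/proc", strconv.Itoa(pid)))
	return err == nil
}

// isGhost reports whether the recorded state says "running" while the
// recorded PID is not alive according to the alive callback.
func isGhost(raw []byte, alive func(int) bool) (bool, error) {
	var cfg containerConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	return cfg.State.Running && !alive(cfg.State.Pid), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: ghostcheck <path to config.v2.json>")
		return
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	ghost, err := isGhost(raw, pidAlive)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("ghost:", ghost)
}
```

Passing the liveness check as a callback keeps the comparison itself independent of /proc, so it can be exercised without a real container.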

Then I tried:

  • ctr -n moby c kill 1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602 -> ok, the container no longer shows up in ctr c ls, but docker rm still hangs

  • rm -Rf /var/run/docker/containerd/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/ -> ok but docker rm still hangs

  • strace docker rm -f 1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602

[...]
sched_yield()                           = 0
futex(0x5577942964f8, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x557794296ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
epoll_pwait(4, [], 128, 0, NULL, 2)     = 0
futex(0xc000100150, FUTEX_WAKE_PRIVATE, 1) = 1
getpid()                                = 17018
tgkill(17018, 17022, SIGURG)            = 0
getpid()                                = 17018
tgkill(17018, 17022, SIGURG)            = 0
getpid()                                = 17018
tgkill(17018, 17022, SIGURG)            = 0
getpid()                                = 17018
tgkill(17018, 17022, SIGURG)            = 0
getpid()                                = 17018
tgkill(17018, 17022, SIGURG)            = 0
mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f58a0397000
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=17018, si_uid=0} ---
rt_sigreturn({mask=[]})                 = 140018621968384
futex(0x557794296618, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc00007c950, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x557794296ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x557794296ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x557794296ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x557794296ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x557794296ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x557794296ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x557794296ed0, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x557794296ed0, FUTEX_WAIT_PRIVATE, 0, NULL^Cstrace: Process 17018 detached
 <detached ...>

Then I applied a known workaround (I had to get rid of the issue):

  • systemctl stop docker -> took a while; in the logs:
Sep 28 15:53:44 *** dockerd[5754]: time="2021-09-28T15:53:44.924697565Z" level=info msg="Processing signal 'terminated'"
Sep 28 15:53:44 *** systemd[1]: Stopping Docker Application Container Engine...
Sep 28 15:53:45 *** dockerd[5754]: time="2021-09-28T15:53:45.081016651Z" level=info msg="ignoring event" container=70ba0a8a25d74f51cc4d3dbb4cc0661c48e6fde9aa12f65812d28dc571341647 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 28 15:53:47 *** dockerd[5754]: time="2021-09-28T15:53:47.571515956Z" level=info msg="ignoring event" container=b0db8fbfb7abfce045ee4d3c63feb5b2440d77c403817ef2ff184fb7ba6a77ce module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 28 15:53:54 *** dockerd[5754]: time="2021-09-28T15:53:54.950135393Z" level=info msg="Container cab32eeb09062fc4bda966f221f8b41d100c4d125cec7c723400967eece04565 failed to exit within 10 seconds of signal 15 - using the force"
Sep 28 15:53:54 *** dockerd[5754]: time="2021-09-28T15:53:54.970739162Z" level=info msg="Container 2c4de63fc67a3c29187c57f29eb6e22aa4f76642724cb80cc8b5aafbfec6deba failed to exit within 10 seconds of signal 15 - using the force"
Sep 28 15:53:55 *** dockerd[5754]: time="2021-09-28T15:53:55.085602746Z" level=info msg="ignoring event" container=2c4de63fc67a3c29187c57f29eb6e22aa4f76642724cb80cc8b5aafbfec6deba module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 28 15:53:55 *** dockerd[5754]: time="2021-09-28T15:53:55.098458549Z" level=info msg="ignoring event" container=cab32eeb09062fc4bda966f221f8b41d100c4d125cec7c723400967eece04565 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 28 15:53:59 *** dockerd[5754]: time="2021-09-28T15:53:59.929478076Z" level=error msg="Force shutdown daemon"
Sep 28 15:53:59 *** dockerd[5754]: time="2021-09-28T15:53:59.929583339Z" level=info msg="Daemon shutdown complete"
Sep 28 15:53:59 *** systemd[1]: docker.service: Succeeded.
Sep 28 15:53:59 *** systemd[1]: Stopped Docker Application Container Engine.
Sep 28 15:53:45 *** containerd[733]: time="2021-09-28T15:53:45.081003758Z" level=info msg="shim disconnected" id=70ba0a8a25d74f51cc4d3dbb4cc0661c48e6fde9aa12f65812d28dc571341647
Sep 28 15:53:45 *** containerd[733]: time="2021-09-28T15:53:45.081113276Z" level=error msg="copy shim log" error="read /proc/self/fd/24: file already closed"
Sep 28 15:53:47 *** containerd[733]: time="2021-09-28T15:53:47.573421210Z" level=info msg="shim disconnected" id=b0db8fbfb7abfce045ee4d3c63feb5b2440d77c403817ef2ff184fb7ba6a77ce
Sep 28 15:53:47 *** containerd[733]: time="2021-09-28T15:53:47.573531256Z" level=error msg="copy shim log" error="read /proc/self/fd/12: file already closed"
Sep 28 15:53:55 *** containerd[733]: time="2021-09-28T15:53:55.085920466Z" level=info msg="shim disconnected" id=2c4de63fc67a3c29187c57f29eb6e22aa4f76642724cb80cc8b5aafbfec6deba
Sep 28 15:53:55 *** containerd[733]: time="2021-09-28T15:53:55.086035828Z" level=error msg="copy shim log" error="read /proc/self/fd/15: file already closed"
Sep 28 15:53:55 *** containerd[733]: time="2021-09-28T15:53:55.099001865Z" level=info msg="shim disconnected" id=cab32eeb09062fc4bda966f221f8b41d100c4d125cec7c723400967eece04565
Sep 28 15:53:55 *** containerd[733]: time="2021-09-28T15:53:55.099122101Z" level=error msg="copy shim log" error="read /proc/self/fd/20: file already closed"

                                    
  • rm -Rf /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602
rm: cannot remove '/var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/mounts/shm': Device or resource busy
  • umount /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/mounts/shm -> ok
  • rm -Rf /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602 -> ok
  • systemctl start docker
Sep 28 15:57:40 *** dockerd[17375]: time="2021-09-28T15:57:40.517315214Z" level=info msg="Loading containers: start."
Sep 28 15:57:40 *** dockerd[17375]: time="2021-09-28T15:57:40.517551932Z" level=error msg="failed to load container" container=2d36abfab7fa136d8d36359c2cae30314d83ddadc37985d25531a5f0a1529779 error="open /var/lib/docker/containers/2d36abfab7fa136d8d36359c2cae30314d83ddadc37985d25531a5f0a1529779/config.v2.json: no such file or directory"
Sep 28 15:57:40 *** dockerd[17375]: time="2021-09-28T15:57:40.518271443Z" level=error msg="failed to load container" container=1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602 error="open /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/config.v2.json: no such file or directory"
Sep 28 15:57:40 *** dockerd[17375]: time="2021-09-28T15:57:40.878044379Z" level=info msg="Removing stale sandbox 06dbfe1d245992abf3f075ecd893b1b6a44957519cbee8b77e3acaca579dc625 (1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602)"
Sep 28 15:57:40 *** dockerd[17375]: time="2021-09-28T15:57:40.896801126Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 8b71e7a01854df19d6bf23ecbf76c1379317f39d34965d0c1992df62b40ed2e7 4395f7f3b45e21fd5b7516a771265d6ce81d9e93b01fd1c0d30767f642e98c6a], retrying...."
Sep 28 15:57:41 *** dockerd[17375]: time="2021-09-28T15:57:41.845363492Z" level=info msg="Loading containers: done."
Sep 28 15:57:41 *** dockerd[17375]: time="2021-09-28T15:57:41.885190387Z" level=info msg="Docker daemon" commit=75249d8 graphdriver(s)=overlay2 version=20.10.8
Sep 28 15:57:41 *** dockerd[17375]: time="2021-09-28T15:57:41.885304842Z" level=info msg="Daemon has completed initialization"
Sep 28 15:57:41 *** dockerd[17375]: time="2021-09-28T15:57:41.931954248Z" level=info msg="API listen on /var/run/docker.sock"
Sep 28 15:57:41 *** systemd[1]: Started Docker Application Container Engine.

Steps to reproduce the issue:

Seems pretty random and kind of rare 😅

Output of docker version:

Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:54:27 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:33 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.6.1-docker)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 8
  Running: 5
  Paused: 0
  Stopped: 3
 Images: 9
 Server Version: 20.10.8
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e25210fe30a0a703442421b0f60afac609f950a3
 runc version: v1.0.1-0-g4144b63
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-73-generic
 Operating System: Ubuntu 20.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 22.81GiB
 ID: XAWD:7ZZL:Q2TZ:NTTV:ZILV:S335:B3PR:BJRD:76XP:7KVY:CHPS:OGPS
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: ***:3128
 HTTPS Proxy: ***:3128
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Default Address Pools:
   Base: 100.64.0.0/15, Size: 24

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.):
VMs with libvirt on a physical hypervisor

@zq-david-wang
Contributor

I had a similar issue: some containers shown as running in docker ps were already gone (no PID found). It seems that the internal state maintained by dockerd had become inconsistent with the real state. This mostly happened when the system was under memory/IO pressure, and restarting docker restored the state.
Maybe you should check for OOM-kill/hang errors in the kernel log.

@akerouanton
Member

Hello @Sh4d1, when your CLI commands hang, could you generate a stack trace of dockerd? It would help maintainers/contributors figure out where and why the daemon is stuck. You can find how to create a stack trace here and how to retrieve it here.
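
For reference, on Linux dockerd dumps its goroutine stacks when it receives SIGUSR1. The sketch below sends that signal from Go. The helper names are made up for illustration, and the dump location (/var/run/docker) and the goroutine-stacks-*.log file-name pattern are assumptions based on the example in Docker's documentation, so the daemon log should be treated as the authoritative source for the actual path.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"syscall"
)

// stackDumpGlob returns a glob for dump files under the daemon's exec root.
// The file-name pattern is an assumption; the daemon log prints the real path.
func stackDumpGlob(execRoot string) string {
	return filepath.Join(execRoot, "goroutine-stacks-*.log")
}

// requestStackDump sends SIGUSR1 to pid. The caller must make sure pid really
// is dockerd: for most other processes SIGUSR1 terminates the process.
func requestStackDump(pid int) error {
	return syscall.Kill(pid, syscall.SIGUSR1)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: stackdump <dockerd-pid>")
		return
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid pid:", err)
		os.Exit(1)
	}
	if err := requestStackDump(pid); err != nil {
		fmt.Fprintln(os.Stderr, "signal failed:", err)
		os.Exit(1)
	}
	fmt.Println("sent SIGUSR1; check the daemon log, or look for", stackDumpGlob("/var/run/docker"))
}
```

The shell equivalent is a one-liner, kill -SIGUSR1 $(pidof dockerd), run as root while the CLI command is still hanging.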

@Sh4d1
Contributor Author

Sh4d1 commented Oct 24, 2021

👋 ah good to know! I'll attach it here if it happens again, thanks!

@akerouanton
Member

akerouanton commented Oct 24, 2021

I looked a bit more at your description: given that the containerd task is gone but the containerd container and the netns are still there, I believe Docker is stuck somewhere here (daemon.Cleanup() calls containerd to delete its container and removes the netns):

moby/daemon/monitor.go

Lines 27 to 63 in 4283e93

func (daemon *Daemon) handleContainerExit(c *container.Container, e *libcontainerdtypes.EventInfo) error {
	c.Lock()
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	ec, et, err := daemon.containerd.DeleteTask(ctx, c.ID)
	cancel()
	if err != nil {
		logrus.WithError(err).WithField("container", c.ID).Warnf("failed to delete container from containerd")
	}
	ctx, cancel = context.WithTimeout(context.Background(), 2*time.Second)
	c.StreamConfig.Wait(ctx)
	cancel()
	c.Reset(false)
	exitStatus := container.ExitStatus{
		ExitCode: int(ec),
		ExitedAt: et,
	}
	if e != nil {
		exitStatus.ExitCode = int(e.ExitCode)
		exitStatus.ExitedAt = e.ExitedAt
		exitStatus.OOMKilled = e.OOMKilled
		if e.Error != nil {
			c.SetError(e.Error)
		}
	}
	restart, wait, err := c.RestartManager().ShouldRestart(ec, daemon.IsShuttingDown() || c.HasBeenManuallyStopped, time.Since(c.StartedAt))
	// cancel healthcheck here, they will be automatically
	// restarted if/when the container is started again
	daemon.stopHealthchecks(c)
	attributes := map[string]string{
		"exitCode": strconv.Itoa(int(ec)),
	}
	daemon.Cleanup(c)
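
In the excerpt, c.Lock() is taken at the top and no unlock appears before daemon.Cleanup(c). If cleanup blocks, anything else that needs the same container lock would presumably wait too, which would be consistent with docker inspect/rm/kill all hanging. The following is a toy sketch of that effect only, not moby code; the container type, lockWithin and demo are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// container stands in for the daemon's container object, guarded by one mutex.
type container struct {
	mu sync.Mutex
}

// lockWithin tries to take the container lock and gives up after d.
// (In the hang described in this issue, the CLI commands simply keep waiting.)
func lockWithin(c *container, d time.Duration) bool {
	got := make(chan struct{})
	go func() {
		c.mu.Lock()
		close(got)
		c.mu.Unlock()
	}()
	select {
	case <-got:
		return true
	case <-time.After(d):
		return false
	}
}

// demo holds the lock the way an exit handler stuck in cleanup would, then
// releases it, and reports whether a second caller got the lock each time.
func demo() (whileStuck, afterRelease bool) {
	c := &container{}
	c.mu.Lock() // exit handler takes the lock, then blocks in cleanup
	whileStuck = lockWithin(c, 50*time.Millisecond)
	c.mu.Unlock() // cleanup finally returns
	afterRelease = lockWithin(c, time.Second)
	return whileStuck, afterRelease
}

func main() {
	whileStuck, afterRelease := demo()
	fmt.Println("lock acquired while handler is stuck:", whileStuck)
	fmt.Println("lock acquired after handler returns:", afterRelease)
}
```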

I see you're using fluentd in async mode. Do you know if the fluentd server was still running when you tried to stop/kill/rm the container? There's a bug that prevents the fluentd logger from stopping: it gets blocked in an exponential-backoff retry loop when there are logs to send but the fluentd server is down. This bug produces the same symptoms (e.g. hanging docker commands).
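
To illustrate the kind of loop involved, here is a generic sketch, not the fluent-logger-golang code, of an exponential-backoff retry that also watches a context. Without the ctx.Done() branch, a caller asking the logger to stop would have to wait for the retries to end, which never happens while the server stays down. All names here (retryWithBackoff, errServerDown, demo) are hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errServerDown is a stand-in for a failed connection attempt.
var errServerDown = errors.New("fluentd server unreachable")

// retryWithBackoff calls send until it succeeds, doubling the wait between
// attempts up to maxWait. It returns early when ctx is cancelled, so a
// shutdown request is not stuck behind the retries.
func retryWithBackoff(ctx context.Context, send func() error, wait, maxWait time.Duration) (int, error) {
	attempts := 0
	for {
		attempts++
		if err := send(); err == nil {
			return attempts, nil
		}
		select {
		case <-ctx.Done():
			return attempts, ctx.Err()
		case <-time.After(wait):
		}
		wait *= 2
		if wait > maxWait {
			wait = maxWait
		}
	}
}

// demo retries against a server that never comes back and stops once the
// 100ms deadline expires.
func demo() (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	alwaysDown := func() error { return errServerDown }
	return retryWithBackoff(ctx, alwaysDown, 10*time.Millisecond, 40*time.Millisecond)
}

func main() {
	attempts, err := demo()
	fmt.Printf("gave up after %d attempts: %v\n", attempts, err)
}
```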

@Sh4d1
Contributor Author

Sh4d1 commented Oct 26, 2021

Indeed, I suspected fluentd at first but found no clue. IIRC there were some issues with fluentd! I'll wait for it to happen again and get the stack trace then. Thanks!

@sparrc
Contributor

sparrc commented Oct 26, 2022

Hello from AWS ECS, we believe we have also seen this issue, and as the original opener mentioned, it seems rare and hard to reproduce.

We have also noted the relationship to the fluentd log driver, and we have some reason to believe that recent fixes in the fluent-logger-golang library may have fixed it. These fixes were pulled into moby master and backported to docker 20.10.13 here: #43147

Has anyone seen this issue using docker 20.10.13+ ?

@vnovy

vnovy commented Mar 10, 2023

I see this or a similar issue with the following Docker logging config:
"log-driver": "fluentd", "log-opts": { "mode": "non-blocking", "fluentd-async": "false", "fluentd-address": "tcp://x.x.x.x:24224", "tag": "docker.{{.ID}}", "fluentd-sub-second-precision": "true" }
Docker version: 23.0.1
It seems to be 100% reproducible:
1/ start the stack with fluentd running
2/ stop fluentd
3/ wait a moment
4/ rm the stack
5/ containers that tried to log something after fluentd was stopped are ghosts
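
To check whether a host uses the combination from this report (fluentd driver with "fluentd-async" set to "false"), something like the sketch below could be run against the daemon configuration. It assumes the settings live in a daemon.json-style file with string-valued "log-opts"; daemonConfig and matchesReportedConfig are made-up names. A match only means the config resembles the one reported here, not that the host is affected.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// daemonConfig is a partial view of a daemon.json-style configuration.
type daemonConfig struct {
	LogDriver string            `json:"log-driver"`
	LogOpts   map[string]string `json:"log-opts"`
}

// matchesReportedConfig reports whether raw selects the fluentd log driver
// with "fluentd-async" explicitly set to "false".
func matchesReportedConfig(raw []byte) (bool, error) {
	var cfg daemonConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	return cfg.LogDriver == "fluentd" && cfg.LogOpts["fluentd-async"] == "false", nil
}

func main() {
	// Sample input modelled on the config quoted in this comment.
	sample := []byte(`{
		"log-driver": "fluentd",
		"log-opts": {"mode": "non-blocking", "fluentd-async": "false"}
	}`)
	match, err := matchesReportedConfig(sample)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("matches reported config:", match)
}
```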

"fluentd-async": "true", problem does not appear
