Docker commands (rm/kill/inspect/...) hangs on a said running but already exited container #42894
Comments
I had a similar issue, some running container shows up in
👋 ah good to know! I'll attach it here if it happens again, thanks!
I looked a bit more at your description: given that the containerd task is gone but the containerd container and the netns are still there, I believe Docker is stuck somewhere here (lines 27 to 63 in 4283e93).
I see you're using fluentd in async mode. Do you know if the fluentd server was still running when you tried to stop/kill/rm the container? There's a bug that prevents the fluentd logger from stopping, because it's blocked in an exponential backoff retry loop when there are logs to send but the fluentd server is down. This bug manifests the same symptoms (e.g. hanging docker commands).
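To illustrate the failure mode described above, here is a minimal sketch of a retry loop with exponential backoff whose wait can be interrupted by a shutdown request. It is not the code of the fluentd log driver or of fluent-logger-golang; `RetryingSender`, `flush`, `close`, and the delay parameters are hypothetical names chosen for the example. The point is only that if the backoff wait ignores the close signal, anything waiting for the logger to stop hangs with it.

```python
import threading


class RetryingSender:
    """Hypothetical log sender that retries with exponential backoff."""

    def __init__(self, send, base_delay=0.1, max_delay=5.0):
        self._send = send              # callable(record) -> True on success
        self._base_delay = base_delay  # first wait between attempts, in seconds
        self._max_delay = max_delay    # upper bound for the backoff wait
        self._closed = threading.Event()

    def close(self):
        # Wake up any retry loop that is currently waiting.
        self._closed.set()

    def flush(self, record):
        """Return True if the record was sent, False if close() interrupted."""
        delay = self._base_delay
        while not self._closed.is_set():
            if self._send(record):
                return True
            # Event.wait returns True as soon as close() has been called, so
            # the backoff wait never outlives a shutdown request.
            if self._closed.wait(delay):
                break
            delay = min(delay * 2, self._max_delay)
        return False
```

A loop that used a plain sleep instead of waiting on the close event would keep retrying for as long as the server stays down, which matches the hanging behaviour described in this comment.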
Indeed, I suspected fluentd at first, but found no clue. IIRC there were some issues with fluentd! I'm going to wait for it to happen again and get the stack trace then! Thanks!
Hello from AWS ECS, we believe we have also seen this issue, and as the original opener mentioned, it seems rare and hard to reproduce. We have also noted the relationship to the fluentd log driver, and we have some reason to believe that recent fixes in the fluent-logger-golang library may have fixed it. These fixes were pulled into moby master and backported to docker 20.10.13 here: #43147. Has anyone seen this issue using docker 20.10.13+?
I see this or similar issue with following docker logging config "fluentd-async": "true", problem does not appear
Description
Context:
A running container (launched with docker-compose) with a `restart: no` policy, and a process that exits with a status code of 0.
Here is the docker-compose file (docker-compose version 1.25.4, build 8d51620a) (just anonymized some info with `***`):
When seeing this, `restart: never` is not a valid policy, yet docker-compose does not mind, so I guess it's the `no` default restart policy that is in use (fixed in a later docker-compose release).
Issue:
When trying to stop/kill/inspect/rm this container, all the `docker <action> <container_id>` commands hang.
I've found #30927, which is kind of old, and #40817 (see below, but I don't have any hung runc processes).
The stuck container ID here is 1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602, and the process linked to this container is nonexistent.
What I've seen:
`ctr -n moby task ls` -> nothing
`ctr -n moby c ls` -> I can see the container
`ctr -n moby containers info 1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602`
`find / -name "*1c71de80*"`
`/var/run/docker/containerd/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/`: there are two named pipes -> `init-stdout` and `init-stderr`
`tree /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602`
`cat /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/config.v2.json`
`file /var/run/docker/netns/06dbfe1d2459`
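The mismatch noted above (a container still listed by `ctr -n moby c ls` while `ctr -n moby task ls` shows nothing) can be checked mechanically. The following is a rough sketch, assuming both listings are whitespace-separated tables with a single header row and the ID in the first column; `first_column_ids` and `containers_without_task` are hypothetical helper names, not part of any Docker or containerd tooling.

```python
def first_column_ids(listing):
    """Return the first column of every non-empty row, skipping the header row.

    Assumption: the listing is a whitespace-separated table with exactly one
    header line, as text captured from a listing command.
    """
    rows = [line.split() for line in listing.splitlines() if line.strip()]
    return {row[0] for row in rows[1:]}


def containers_without_task(containers_listing, tasks_listing):
    """Return the sorted IDs that appear as containers but have no task."""
    orphans = first_column_ids(containers_listing) - first_column_ids(tasks_listing)
    return sorted(orphans)
```

The two arguments would be the captured text output of the two listing commands. A container that is expected to be running but shows up in the result is a candidate for the state described in this issue; created or exited containers also have no task, so the result needs to be cross-checked against what `docker ps` claims.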
Then I tried:
`ctr -n moby c kill 1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602` -> ok, container is gone with `ctr c ls` but `docker rm` still hangs
`rm -Rf /var/run/docker/containerd/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/` -> ok but `docker rm` still hangs
`strace docker rm -f 1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602`
Then I fixed it with a known fix (had to fix the issue):
`systemctl stop docker` -> a bit long, in the logs:
`rm -Rf /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602`
`umount /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602/mounts/shm` -> ok
`rm -Rf /var/lib/docker/containers/1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602` -> ok
`systemctl start docker`
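The workaround above can be written down as an ordered, dry-run command plan. This is only a sketch of the sequence reported here, not a recommended or official procedure: `cleanup_plan` is a hypothetical helper, the `/var/lib/docker` root is taken from the paths in this report, and the plan assumes the shm unmount has to come before the directory removal, which is the order of the two steps that returned "ok". It only builds and prints the commands; it does not run them.

```python
import re
import shlex


def cleanup_plan(container_id, docker_root="/var/lib/docker"):
    """Build (but do not run) the commands used in the workaround above."""
    # Refuse anything that is not a full 64-character hex container ID, so a
    # typo can never turn the rm step into the removal of an unrelated path.
    if not re.fullmatch(r"[0-9a-f]{64}", container_id):
        raise ValueError("expected a full 64-character hex container ID")
    container_dir = f"{docker_root}/containers/{container_id}"
    return [
        ["systemctl", "stop", "docker"],
        ["umount", f"{container_dir}/mounts/shm"],
        ["rm", "-Rf", container_dir],
        ["systemctl", "start", "docker"],
    ]


if __name__ == "__main__":
    stuck = "1c71de80ad77d3b39579833bd61b80ae51c313ad7d629e1f658f3d6aeeb28602"
    for command in cleanup_plan(stuck):
        print(shlex.join(command))  # dry run: print only
```

Stopping the daemon affects every container on the host, so printing the plan first and running each step by hand keeps the operator in control.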
Steps to reproduce the issue:
Seems pretty random and kind of rare 😅
Output of `docker version`:
Output of `docker info`:
Additional environment details (AWS, VirtualBox, physical, etc.):
VMs with libvirt on a physical hypervisor