Stopped container is shown in docker ps and is unresponsive due to blocked attached output reader #41827
Comments
I debugged this a bit to find the cause; this is what I found:
So that call to The reason why the lock of By looking at `dockerd` stack traces (and descriptions), it seems that #40455 and #38501 (comment) might be related to the same issue.
I would be happy to provide a PR that fixes it, but I am not 100% sure what the correct approach is. Still, I made a draft PR with one of the possible ways I could come up with (and a couple of others in comments): #41828. The basic idea behind all of them is: add a cleanup method to
Until moby/moby#41827 is resolved this code causes chaos to unfold on machines and causes servers to be non-terminatable. This logic was initially changed for logical purposes, but this io.Copy logic works perfectly fine (even if not immediately intuitive).
I believe we have the same problem, or maybe the issue described in #41828 (comment). I attached my goroutine stacks, taken when Docker stopped responding: goroutine-stacks-2021-07-06T060002Z.log. I'm currently unsure how to reproduce this; it happens "randomly" every 10-30 days on one server. On that server a lot of containers are stopped and started each minute; I am not sure whether this is related.
Description
When a container logs a lot and there is a client attached to it that does not read its output, that container will eventually block on write as its output buffer fills up; this was discussed in #22502. However, if such a container stops, either because its main process exits by itself or because it is killed with `docker kill` or by a death signal, the container will remain in `docker ps` and all operations on it (`docker exec`, `docker kill` etc.) will get stuck. It will remain in that state until the attached client goes away (detaches or exits). If the client remains attached, the container will stay in this "zombie" state for an arbitrarily long time.
Steps to reproduce the issue:
The easiest way to reproduce is to use the `docker attach` command (another way might be to use the steps from #22502).
1. `docker run --name testing --rm -it ubuntu yes testing > /dev/null`
2. `docker attach --no-stdin testing | /bin/sleep infinity` - `sleep` does not read its `stdin`, so attach will block and won't read output from the container.
3. `docker kill testing` (or kill the `yes` process in the container with `kill -9`). Note that if `docker kill` is used, it will get stuck.
4. `htop` or `ps` can be used to verify that the process `yes testing` is not running any more, as well as the corresponding `containerd-shim`.
5. `docker ps` will still show the container as running.
6. `docker inspect testing`, `docker exec -it testing bash`, `docker stop testing`, `docker kill testing` and `docker rm -f testing` will all do nothing and just get stuck.
7. Terminate the attached client (`sleep` for example) - then container shutdown will complete, it will disappear from `docker ps` and all operations on it will unblock. At this point an error like `stream copy error: read /proc/self/fd/24: file already closed` will be printed to the Docker daemon log - this happens here because the read end with the container's output coming from `containerd` has already been closed by `Wait`
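The two effects seen in these steps can be imitated outside Docker. The following is a minimal Go sketch, not the daemon's code: an `io.Copy` into a pipe that nobody reads stays blocked until the reading side goes away, and a read on an already closed `os.Pipe` end fails with "file already closed". The names `endless`, `copyToSilentClient`, `readAfterClose` and `isFileClosed` are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"time"
)

// endless stands in for a container that logs without pause (like `yes`).
type endless struct{}

func (endless) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 'y'
	}
	return len(p), nil
}

// copyToSilentClient copies endless output into a pipe whose reader never
// reads. It reports whether the copy was still blocked after wait, and the
// error the copy returned once the reader was closed.
func copyToSilentClient(wait time.Duration) (blocked bool, copyErr error) {
	clientR, clientW := io.Pipe() // clientR plays the attached client
	done := make(chan error, 1)
	go func() {
		_, err := io.Copy(clientW, endless{})
		done <- err
	}()

	select {
	case err := <-done:
		return false, err
	case <-time.After(wait):
		blocked = true // the writer is stuck because nobody reads
	}

	clientR.Close() // the client "goes away"; the blocked write now fails
	return blocked, <-done
}

// readAfterClose reads from an os.Pipe whose read end is already closed.
func readAfterClose() error {
	r, w, err := os.Pipe()
	if err != nil {
		return err
	}
	defer w.Close()
	r.Close()
	_, err = r.Read(make([]byte, 1))
	return err
}

// isFileClosed reports whether err wraps os.ErrClosed.
func isFileClosed(err error) bool { return errors.Is(err, os.ErrClosed) }

func main() {
	blocked, err := copyToSilentClient(200 * time.Millisecond)
	fmt.Println("copy blocked while client attached:", blocked)
	fmt.Println("copy error after client left:", err)
	fmt.Println("read after close:", readAfterClose())
}
```

In this sketch the copy only returns once the reader is closed, which mirrors step 7: nothing moves until the attached client exits.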
Describe the results you received:
A container that died (its main process and its `containerd-shim` exited) is still shown in `docker ps` and operations on it get stuck forever without reporting an error.
Describe the results you expected:
After the container is stopped and its main process exits (together with `containerd-shim`), it is not shown in `docker ps` and all operations on it return immediately with an error stating that this container is not running/does not exist.
Additional information you deem important:
We originally faced this issue with docker-compose, which attaches to a container to read its logs - see docker/compose#6018
Output of `docker version`:
It also reproduces with docker built from the `master` git branch.
Output of `docker info`:
Additional environment details (AWS, VirtualBox, physical, etc.):
We were able to reproduce it both on a physical machine and on a VM.
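One way to get the expected behaviour described above (exit handling that finishes instead of hanging) is to bound the wait on output streams with a timeout. The Go sketch below only illustrates that general idea under stated assumptions; it is not taken from #41828 and does not use moby's API. `waitStreams`, `exitWithStuckCopier` and `timedOut` are hypothetical names, and the stuck copier is simulated with a channel.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// waitStreams waits for the stream copiers tracked by wg, but gives up when
// ctx is done. Hypothetical helper, not part of moby.
func waitStreams(ctx context.Context, wg *sync.WaitGroup) error {
	done := make(chan struct{})
	go func() {
		wg.Wait() // this goroutine lingers for as long as a copier is stuck
		close(done)
	}()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// exitWithStuckCopier simulates container exit handling while one output
// copier is blocked on a client that never reads.
func exitWithStuckCopier(timeout time.Duration) error {
	var wg sync.WaitGroup
	release := make(chan struct{})
	defer close(release) // lets the simulated copier finish on return

	wg.Add(1)
	go func() {
		defer wg.Done()
		<-release // stands in for a write that never completes
	}()

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return waitStreams(ctx, &wg)
}

// timedOut reports whether err is a context deadline error.
func timedOut(err error) bool { return errors.Is(err, context.DeadlineExceeded) }

func main() {
	err := exitWithStuckCopier(100 * time.Millisecond)
	fmt.Println("wait result:", err, "timed out:", timedOut(err))
}
```

The trade-off in a design like this is that output still in flight may be dropped when the timeout fires, so a real fix would have to decide how long waiting for a slow client is acceptable.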