Open file handle to the log file makes all container operations to hang #3620
Comments
Hmm, hard to tell. I see you're not running the latest patch release of Docker 20.10, and there's been at least one fix related to logging in a later patch release (moby/moby#43165).

One other aspect that could relate to this is that the files used by the JSON-file logging driver were designed to be accessed exclusively by the docker daemon; they are effectively only an internal storage mechanism to facilitate the […]

Unfortunately, there are various tools (and I think Kubernetes itself included) that access those files. At times this has resulted in those files being locked, or in them ending up in multiple mount namespaces (cAdvisor being a known actor in that), which can lead to rather ugly situations, sometimes only resolvable by a reboot if mount namespaces get nested into other mount namespaces.
We found out what's causing the issue, in case it sparks an idea on your end.
That explains how the issue happens, not why.
So the error message comes from the OCI runtime (runc): https://github.com/opencontainers/runc/blob/main/libcontainer/init_linux.go#L113-L136

Environment variables are indeed not allowed to contain […] I know at some point we added validation earlier in the stack to match what POSIX defines, but we had to revert that validation because the specification of "what's allowed" is a bit of a grey area and leaves a lot up to the implementation. Environment variables that should be invalid are actually accepted, and some frameworks depend on the invalid names, so we decided to leave it to Linux to decide what it accepts, without trying to match its validation. Perhaps we could add back a check for this specific case though, as it's unlikely […]

As to the "why" for the log file: that's a good question. It looks like the container state may be somewhere in between (container created, but the OCI runtime failing on it). Usually that would mean it should catch that it failed, but perhaps there's some race condition here.

If you have a test environment to see if it still reproduces on the latest patch release and the latest containerd/runc versions, that may still be useful.

I did a quick try with a plain […]
I also failed to reproduce it with plain docker.
I have been able to get the null byte error using the docker client library:
But, to my big surprise, this does not trigger the issue.
So there must be something else as well.
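For illustration, the kind of early check suggested earlier in the thread (rejecting the null byte case before the request reaches the OCI runtime) might look roughly like the sketch below. This is not the validation that Docker or runc performs; the function name `invalid_env_entries` and the exact set of rules (missing `=`, empty name, null byte) are assumptions chosen for the example.

```python
from typing import List, Tuple


def invalid_env_entries(env: List[str]) -> List[Tuple[str, str]]:
    """Return (entry, reason) pairs for entries that look unusable.

    Hypothetical helper: the rules below are assumptions for this
    sketch, not a copy of any validation in Docker or runc.
    """
    problems = []
    for entry in env:
        # Split "NAME=value" at the first "=" only; values may contain "=".
        name, sep, value = entry.partition("=")
        if not sep:
            problems.append((entry, "missing '='"))
        elif not name:
            problems.append((entry, "empty name"))
        elif "\x00" in name or "\x00" in value:
            # The case discussed in this thread: a null byte in the entry.
            problems.append((entry, "contains null byte"))
    return problems
```

Running such a check on the environment list before creating the container would surface the bad entry on the client side, although, as noted above, it would not by itself explain the hang.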
Description
Some containers cause docker commands to hang. Example:
This container is in Created state, so it never ran AFAIK. Any operation on it hangs:
dockerd holds a file handle on the log file for that container:
and I suspect that is the reason why everything is stuck.
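One way to check for such an open handle without extra tooling is to walk the process's file descriptor links under `/proc`. The sketch below is Linux-only and assumes the caller has permission to read `/proc/<pid>/fd` for the target process (usually root for dockerd); the function name `open_fds_matching` is hypothetical.

```python
import os
from typing import List, Tuple


def open_fds_matching(pid: str, needle: str) -> List[Tuple[int, str]]:
    """List (fd, target) pairs of a process whose target path contains needle.

    Hypothetical helper; relies on the Linux /proc/<pid>/fd symlinks.
    """
    fd_dir = f"/proc/{pid}/fd"
    matches = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if needle in target:
            matches.append((int(name), target))
    return sorted(matches)
```

Called with the dockerd PID and the container ID as `needle`, this would list any descriptor still pointing at that container's files, including a `-json.log` file if the JSON-file logging driver is in use.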
systemctl restart docker makes the container and the FD disappear, and the problem is solved.

Steps to reproduce the issue:
Unfortunately I don't know how I get into this situation.
The container is created by Kubernetes and that's all I know.
Output of docker version:

Output of docker info:

Additional environment details (AWS, VirtualBox, physical, etc.):
This is an AWS EC2 instance.