Remove container fails on 1.11 due to root filesystem busy when any container mounts host /var/run - regression #21969
Comments
I think your link is backwards: c48439a...dd51e85 |
Oops, git CLI is fine with it reversed but not github - fixed. |
@vikstrous is this the same as #21704? |
@thaJeztah I can't repro #21704 any more, so that makes me suspect that this is not the same. If you guys manage to track this one down, it might shed some light on whether or not they are the same. |
It appears this issue correlates with older kernels. Every system I've reproduced it on runs an older kernel, and the systems where it doesn't happen run much newer ones. Failures seen on:
|
I'm not sure if this is related, but based on our observations this error appears in the same set of kernels that exhibit this behavior:
|
I just upgraded my kernel and am seeing this bug constantly. Linux clone3 3.16.0-70-generic #90~14.04.1-Ubuntu SMP Wed Apr 6 22:56:34 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
Same here Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt25-2 (2016-04-08) x86_64 GNU/Linux |
@pdevine @ugurarpaci do you have any way to reproduce this? Is your use case similar to @dhiltgen with additional volume mounts? Are you also using aufs (or dm?). |
I think @dhiltgen may have the most reproducible work around. It's very intermittent for me. |
It appears that the scenario I'm hitting is related to /var/run being mounted in any container on these systems. On these older kernels, if any container mounts /var/run from the host, then other containers can't be removed. |
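One way to check whether a mount namespace still holds mounts below a given directory is to scan mountinfo-formatted text. The following is only a minimal sketch: the helper name `mounts_under` is hypothetical, and it assumes the input follows the `/proc/[pid]/mountinfo` layout from proc(5), where the fifth whitespace-separated field is the mount point.

```python
from pathlib import PurePosixPath


def mounts_under(mountinfo_text, prefix):
    """Return mount points at or below `prefix` found in mountinfo-formatted text.

    Hypothetical helper for illustration; not part of Docker.
    """
    base = PurePosixPath(prefix)
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Per proc(5), the fifth field (index 4) is the mount point.
        mount_point = PurePosixPath(fields[4])
        if mount_point == base or base in mount_point.parents:
            found.append(str(mount_point))
    return found
```

To use it, read `/proc/<pid>/mountinfo` for a process inside the container that binds the host /var/run and pass the text in. On hosts where /var/run is a symlink to /run, the prefix likely needs to be spelled the way it appears in mountinfo.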
@justincormack the first occurrence of the problem could be about the /var/run mount, but that is the convention we have been using for months, so I ignored it. I therefore thought it could be related to the kernel version and migrated the containers to a new VM with an updated kernel (Debian 3.16.7). I tried to reproduce the problem again and I could. Here is the case: |
Found something related to this issue that got fixed in 3.19 kernels, which could explain why the issue doesn't happen on newer kernels. |
@anusha-ragunathan For sure, I will try that. In my experience this looks more like a metadata problem. I have tried version 1.11.0 on different kernel versions, 3.2 and newer (3.16 for the latest scenario). The problem occurs randomly: the daemon complains about aufs layers (which, interestingly, do not exist on the filesystem), and that blocks rm and therefore rmi operations. After a docker daemon restart, everything becomes shiny. I will try to collect more data about my scenario anyway =) |
@ugurarpaci problems on kernel 3.2 are expected; docker does not run on kernels older than 3.10 |
A simple experiment confirms that the 3.19 kernel handles file removal robustly, which is what fixes the reported issue.
On Debian 8 (which ships with 3.16.0-4-amd64 by default):
On Ubuntu 15.04 (which ships with 3.19.0-58-generic by default): |
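To tell quickly whether a host falls into the affected range, the kernel release string can be compared against 3.19. This is a rough sketch only: the function names are hypothetical, and the 3.19 threshold is taken from the observations in this thread. Distribution kernels may carry backported fixes, so the version number alone is not conclusive.

```python
import re


def kernel_version(release):
    """Extract (major, minor) from a `uname -r` style string such as '3.16.0-4-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def predates_fix(release, fixed_in=(3, 19)):
    """Return True if the kernel is older than `fixed_in`.

    The default of 3.19 is an assumption based on this thread, not a verified boundary.
    """
    return kernel_version(release) < fixed_in
```

On a live system the release string can be obtained with `platform.release()` from the standard library and passed to `predates_fix`.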
This avoid an extra bind mount within /var/run/docker/libcontainerd This should resolve situations where a container having the host /var/run bound prevents other containers from being cleanly removed (e.g. moby#21969). Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
Seeing similar issue, but with a container that did not have any mounts. It had a restart policy of |
@pheuter That sounds like a different issue. Thanks! |
@cpuguy83 gotcha, will do! |
@mlaventure should this be resolved by #22256? |
Yep. Closing since this is resolved now by not mounting the container's rootfs into /var/run |
This avoid an extra bind mount within /var/run/docker/libcontainerd This should resolve situations where a container having the host /var/run bound prevents other containers from being cleanly removed (e.g. moby#21969). Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com> (cherry picked from commit 3135874)
@thaJeztah @cpuguy83 Will nested bind mounts still be an issue for paths other than /var/run on kernels < 3.19? I anticipate it will but just want to be clear. |
/var/lib will also be an issue. |
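A pre-flight check along these lines could flag bind mounts that are likely to trigger the problem on older kernels. It is a sketch under stated assumptions: the helper name is hypothetical, and the directory list is assumed from this thread (/var/run/docker/libcontainerd from the commit message above) plus Docker's default data root, /var/lib/docker. A host path is flagged when it equals or contains one of those directories.

```python
from pathlib import PurePosixPath

# Assumed locations where the daemon keeps container mounts; adjust for your setup.
DOCKER_MOUNT_DIRS = ("/var/run/docker/libcontainerd", "/var/lib/docker")


def risky_bind_sources(sources, docker_dirs=DOCKER_MOUNT_DIRS):
    """Return the host paths that equal or contain one of `docker_dirs`.

    Hypothetical helper for illustration; not part of Docker.
    """
    risky = []
    for source in sources:
        src = PurePosixPath(source)
        for docker_dir in docker_dirs:
            target = PurePosixPath(docker_dir)
            # Binding an ancestor directory also exposes the mounts beneath it.
            if src == target or src in target.parents:
                risky.append(source)
                break
    return risky
```

For example, passing the host side of each `-v` argument would flag /var/run and /var/lib, but not a single file such as /var/run/docker.sock. Whether a flagged mount actually causes removal failures still depends on the kernel and the Docker version.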
@anusha-ragunathan Thanks for the clarification |
Something changed between commits c48439a...dd51e85 in 1.11 development where the daemon now fails removal of containers in some circumstances. I haven't managed to figure out exactly what is unique about our use-case that triggers the failure yet. Here's what I do know:
docker stop and docker rm by hand works without failure.
I've been attempting a git bisect on the docker/docker tree to find the exact commit that broke it, but I'm having some challenges as the containerd integration was going through churn during this timeframe, so many commits aren't yielding a testable setup for me.
Examples from the client's perspective:
What you see in the daemon log:
I'll continue my investigation and update this issue as I uncover more details.