Docker fails to remove containers "driver overlay failed to remove root filesystem: readdirent: no such file or directory" #14474
Signed-off-by: Zhang Kun <zkazure@gmail.com>
We hit this bug (or something closely related) pretty frequently with the compose test suite on jenkins: "readdirent: no such file or directory" Other related failures
We run the full suite against docker 1.7.1 and 1.8.1, but all of these failures happened only during the 1.8.1 run. |
We're also hitting this. Docker version 1.7.1, build 786b29d, kernel 3.18.3-031803-generic. Could this bug be related to the following note from the CRIU project (http://criu.org/Docker)? "Note: The OverlayFS filesystem was merged into the upstream Linux kernel 3.18 and is now Docker's preferred filesystem (instead of AUFS). However, there is a bug in OverlayFS that reports the wrong mnt_id in /proc/<pid>/fdinfo/<fd> and the wrong symlink target path for /proc/<pid>/<fd>. Fortunately, these bugs have been fixed in the kernel v4.2-rc2. See below for instructions on how to apply the relevant patches." They also have a number of other recommended kernel patches. |
I'm also hitting this with Docker 1.8.1 and a 4.1 kernel. Relevant
|
We went up to 1.8.1 and kernel 4.2 and, touch wood, it seems like the problem is solved or at least vastly mitigated. |
@joelacrisp Thank you for the information! I'll figure out a way to get 4.2 on my boxes and see if that fixes things as well. |
If they're ubuntu there is a back-ported 4.2 kernel from the official repos. |
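Several reports above hinge on whether the host kernel is older than 4.2. A minimal sketch for checking that from a release string such as the ones quoted in this thread follows; the function names are made up for illustration, and whether 4.2 actually resolves the problem is disputed elsewhere in this thread.

```python
import platform
import re


def kernel_version(release):
    """Extract (major, minor) from a release string like '4.1.6-201.fc22.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(release, major, minor):
    """Return True if the release is at or above major.minor."""
    return kernel_version(release) >= (major, minor)


# Usage on the running host (platform.release() returns the same string
# that `uname -r` prints on Linux):
#     kernel_at_least(platform.release(), 4, 2)
```

This only compares the major and minor numbers, so it says nothing about distribution kernels that carry back-ported patches.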
So this is a kernel bug, not a docker bug? |
I am still hitting this with a 4.2.0 kernel, and I tried docker 1.8.2 as well. In my case, reverting to docker 1.7.1 makes the bug go away.
|
@pwnall thanks for your report. What is your backing filesystem, e.g. ext4 or xfs? Your environment is Fedora 22; have you tested this case on another OS? |
@xiaods Here's my environment. My backend is ext4.
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.6-201.fc22.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 4
Total Memory: 7.797 GiB
If there's interest in fixing this, I can spend some time narrowing things down, e.g. building docker from source and bisecting the commits between 1.7.1 and 1.8.1. |
@xiaods Also, I have a Vagrantfile + Ansible scripts that reliably build a VM where this issue reproduces. |
Kernel Version: 4.1.6-201.fc22.x86_64. I don't know whether kernel 4.2 can resolve it. @pwnall do you have an environment for testing kernel 4.2 + docker 1.8.1?
I also ran into this issue while operating some containers. |
@xiaods Yup, I have Ansible playbooks for deploying docker 1.8.2 from fedora 22 testing and kernel 4.2 from fedora 23. In this case, I ran into some SELinux issues first. After setting SELinux to permissive mode, I'm back to the bug here. FWIW, I tried wiping /var/lib/docker before doing my test, and it doesn't change anything.
[vagrant@localhost ~]$ sudo docker info
Containers: 6
Images: 18
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.0-300.fc23.x86_64
Operating System: Fedora 22 (Twenty Two)
CPUs: 1
Total Memory: 993.4 MiB
Name: localhost.localdomain
ID: GPHJ:3XDJ:WMEQ:CJ67:NPL5:VPTZ:G2TI:GPGP:4GJS:DZIM:FJRC:5DVE
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
[vagrant@localhost ~]$ sudo docker version
Client:
Version: 1.8.2-fc22
API version: 1.20
Package Version: docker-1.8.2-1.gitf1db8f2.fc22.x86_64
Go version: go1.4.2
Git commit: f1db8f2/1.8.2
Built:
OS/Arch: linux/amd64
Server:
Version: 1.8.2-fc22
API version: 1.20
Package Version:
Go version: go1.4.2
Git commit: f1db8f2/1.8.2
Built:
OS/Arch: linux/amd64 |
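Since most reports in this thread are compared by storage driver, backing filesystem and kernel, here is a rough sketch that pulls those fields out of `docker info` text like the output above. It assumes the plain "Key: value" layout shown in this thread; the function name is hypothetical and newer Docker releases may format the output differently.

```python
def parse_docker_info(text):
    """Parse 'Key: value' lines from `docker info` output into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and WARNING lines, which are not key/value pairs.
        if not line or line.startswith("WARNING"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Excerpt of the output posted above.
SAMPLE = """Containers: 6
Images: 18
Storage Driver: overlay
Backing Filesystem: extfs
Kernel Version: 4.2.0-300.fc23.x86_64
Operating System: Fedora 22 (Twenty Two)
WARNING: bridge-nf-call-iptables is disabled
"""

info = parse_docker_info(SAMPLE)
print(info["Storage Driver"], info["Backing Filesystem"], info["Kernel Version"])
```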
+1
|
Has anyone had any luck removing these dead containers without having to upgrade/downgrade their Docker/kernel version? Seeing this on ...
|
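For finding the containers that are stuck in the dead state, something like the sketch below may help. It relies on the `status=dead` filter and the `--format` option of `docker ps`, which may not exist in very old Docker releases; the helper names are made up here. It only lists the containers and retries `docker rm`, so it does not work around the underlying bug.

```python
import subprocess

# Lists the IDs of all containers in the "dead" state, one per line.
LIST_DEAD_CMD = ["docker", "ps", "-a", "--filter", "status=dead",
                 "--format", "{{.ID}}"]


def parse_ids(output):
    """Turn the command output into a list of non-empty container IDs."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_dead_containers():
    """Run `docker ps` and return the IDs of dead containers."""
    result = subprocess.run(LIST_DEAD_CMD, check=True,
                            capture_output=True, text=True)
    return parse_ids(result.stdout)


def try_remove(container_id):
    """Attempt `docker rm`; return (succeeded, stderr text)."""
    result = subprocess.run(["docker", "rm", container_id],
                            capture_output=True, text=True)
    return result.returncode == 0, result.stderr.strip()


# Usage (needs a running Docker daemon):
#     for cid in list_dead_containers():
#         print(cid, try_remove(cid))
```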
I'm seeing this as well on CoreOS:
Then I tried |
@pwnall do you have a reproducible case of for this bug? I am trying to reproduce with ...
|
@dmcgowan thank you very much for looking into this! I'm trying to build a VM with updated software now. I'll make a new post once I have a repro. |
@dmcgowan is this issue fixed or not? |
@tjlee still looking for a reproducible case. There was one fix #18907 which we expected to solve the original report but there may be other causes or issues users are running into. If you are able to reliably reproduce then please send your |
@dmcgowan Before tests:
After tests:
We run this configuration about 30 times per day then we set up another agent. But sometimes it fails with error mentioned earlier and blocks all the process. FYI:
|
Same here in a Jenkins slave that starts and stops a lot of compositions in parallel on the same machine.
|
For anyone encountering this issue on docker for mac, as implied by the error message, my /var/lib/docker folder just did not exist. Solution was simply |
I can confirm that creating the directory on the mac solves the issue. |
@JustinLivi @jeffdupont you created the directory on the mac ? I'm not sure how that could help, because the daemon runs inside a HyperKit VM, so on a different filesystem. |
Yes I was surprised as well that this solution worked. I know nearly
nothing about the internals of what could have caused the issue, but the
error message to me implied that it was a volumes issue where a host
directory was expected to exist. Creating the directory on the host machine
solved it for me.
|
Didn't make sense to me either. Only tried it because it was the last
comment on the thread lol
|
close it. |
I am having the same problem on CentOS7 + docker 1.13. So far, my work around is to restart docker engine before do I am using /usr/local/docker for docker graph. I just saw @JustinLivi 's comment about |
OK
|
First, let me just say I consider this an issue we need to fix with Docker. Best thing to do in the short term is to track down anything that might be doing some as stated above. The issue you mention though is not related to the issue posted here. |
I am consistently seeing this issue when using Related issue: kubernetes/minikube#1130 |
@philipn I'm not sure how minikube works, but recently saw someone get this error while trying to share image dirs between two docker daemons. |
@cpuguy83 That's a good lead, thanks for that. It looks like |
So what happens here is Docker calls @philipn Is this specifically happening with overlay for you? |
@cpuguy83 Unfortunately, the only storage drivers available on the minikube VM are overlay and vfs. I tried out vfs but it was prohibitive to testing (e.g. a build took 20 hours on our dev stack). |
@soichih having the same issue on CentOS7 too. Did you figure it out? Cheers. |
Another CentOS 7 user here. docker version Client: Server: |
@matejzero |
I noticed that if I do: then I don't get an error. If I rebuild the container with docker-compose: it gives me an error. |
It seems it still happens. $ docker -v $ docker rm -f 83326 $ docker ps -a $ uname -a $ lsb_release -a /var/lib/docker# ls |
Closing since the OP's issue is fixed by https://github.com/moby/moby/pull/31012/files#diff-723dfe6d49672e6220c7b87d40f7fdd6R24 and is in 17.06. |
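The error in the title suggests that a directory entry disappeared while the container's root filesystem was being walked for removal. Purely as an illustration of that idea (this is not the code from the linked pull request, and the function name, retry count and delay are made up for this sketch), a removal loop can treat "no such file or directory" as "already removed" and retry:

```python
import os
import shutil
import time


def remove_all_tolerant(path, attempts=5, delay=0.1):
    """Recursively remove path, tolerating entries that vanish mid-walk.

    Returns True once path no longer exists, False if every attempt was
    interrupted by a disappearing entry. Other errors are not caught.
    """
    for _ in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Either path itself is already gone, or something inside it
            # was removed concurrently while the tree was being walked.
            if not os.path.lexists(path):
                return True
            time.sleep(delay)
    return False
```

Calling it on a path that does not exist returns True, which matches the view that a missing directory needs no further cleanup.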
Indeed, it looks like there has been only partial cleanup of this container: