Error starting containers in 1.7.1 - could not find bridge docker0: no such network interface #14738
These are the results of launching a stopped container via the CLI:
|
Same problem here on two servers.
BUG REPORT INFORMATION
Description of problem: Randomly, Docker cannot start a container with an error message similar to this one:
Retrying the start enough times works around the issue, until the next time.
Environment details (AWS, VirtualBox, physical, etc.): Physical server in both cases.
How reproducible: Randomly; it seems to happen more often as the number of running containers grows.
Steps to Reproduce:
Actual Results: At one point a container cannot be started with the above error.
Expected Results: Container should start.
Additional info: Didn't happen on 1.7.0. Also happens on 1.8.0-dev.
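Since retrying eventually succeeds, a blunt client-side workaround is to retry the start a few times. A minimal sketch shelling out to the docker CLI (the container name and retry count are placeholders, not from this thread):

```go
package main

import (
	"fmt"
	"os/exec"
	"time"
)

// startWithRetry retries `docker start` a few times, since the
// "could not find bridge docker0" error in 1.7.1 is transient.
func startWithRetry(container string, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		out, e := exec.Command("docker", "start", container).CombinedOutput()
		if e == nil {
			return nil
		}
		err = fmt.Errorf("attempt %d: %v: %s", i+1, e, out)
		time.Sleep(500 * time.Millisecond)
	}
	return err
}

func main() {
	// "web-1" is a placeholder container name.
	if err := startWithRetry("web-1", 5); err != nil {
		fmt.Println("giving up:", err)
	}
}
```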
|
@BenHall @ggtools I tried the same test on my single-CPU VirtualBox Ubuntu VM on my laptop and on a DO 2-CPU Ubuntu VM, started many containers in both, and could not reproduce this problem. What exact configuration, in terms of CPU etc., are you running? It is also surprising that this kind of issue is popping up, because there is no way the daemon would have started if the docker0 bridge were not present and the daemon could not create it. |
Hello, as requested. This is a physical box; here are the CPU / network details. Looking at the Weave ticket and my Docker use-case, it might be something to do with load / the number of containers running (weaveworks/weave#1188)
|
Server A
Server B
|
I just had the same problem with 1.7.1. After downgrading to 1.7.0 the problem disappeared. Looking at the changes between 1.7.0 and 1.7.1 (0baf609...786b29d), I believe commit 34815f3 is likely to blame, as it is the only bridge-related code that changed between the releases. |
I'm also observing the same problem with 1.7.1. docker0 definitely existed, and I was creating containers through the API. The containers created eventually die of their own accord, and are removed through the API. docker0 had 17 adapters according to brctl, and ifconfig confirmed those adapters were still around, but there were only 8 running containers. After a restart of the docker daemon, and with no running containers, docker0 still had 7 veth devices in it. I rebooted, and started the same containers that were running originally, and brctl showed 8 adapters. So it looks like we're also leaking veth devices. Here is my system info:
Here are my create options:
|
@bprodoehl When you say you restarted the docker daemon, how did you restart it? Did you kill it, or did you just use |
@mrjana |
I also appear to be leaking a lot of interfaces - https://gist.github.com/BenHall/4a4e42575dd29d7b669b The box only has 9 containers running. |
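For anyone wanting to check for the same leak, here is a small sketch comparing host veth interfaces against running containers. It assumes Docker's default `veth*` naming for the host side of container links and shells out to `docker ps`:

```go
package main

import (
	"fmt"
	"net"
	"os/exec"
	"strings"
)

func main() {
	// Count host interfaces whose name starts with "veth" --
	// the naming Docker uses for the host side of container links.
	ifaces, err := net.Interfaces()
	if err != nil {
		panic(err)
	}
	veths := 0
	for _, i := range ifaces {
		if strings.HasPrefix(i.Name, "veth") {
			veths++
		}
	}

	// Count running containers via the CLI.
	out, err := exec.Command("docker", "ps", "-q").Output()
	if err != nil {
		panic(err)
	}
	containers := 0
	if s := strings.TrimSpace(string(out)); s != "" {
		containers = len(strings.Split(s, "\n"))
	}

	fmt.Printf("veth interfaces: %d, running containers: %d\n", veths, containers)
	if veths > containers {
		fmt.Println("possible leaked veth devices:", veths-containers)
	}
}
```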
I'm experiencing the same problem. |
@BenHall @ggtools @pospispa we added a few fixes in 1.7.1 to solve the CentOS/RHEL 6.6 issues reported under #14024, by replacing some of the unsupported netlink calls with ioctl. Since we are unable to reproduce the issue, and the issue is inconsistent and basic (the existing docker0 bridge interface is not returned by the netlink call in some cases), we feel it could be a kernel issue exposed by these changes. I added a quick fix in the 1.7.1 branch to confirm the above theory. Would you be willing to test a docker binary (based on 1.7.1) which contains a possible fix (https://gist.githubusercontent.com/mavenugo/b68e24be97eeaf9d0eef/raw/ba48905331ed367589d00e91beb1ff817ab73d69/gistify818322.java) for this issue? |
@mavenugo sure, where's the build located? The issue increased in frequency until we rebooted the server. |
@BenHall Thanks. Uploaded it to Box: https://app.box.com/s/74nbptdb58ff00krilwjxqkvrcg2pek1 |
I had the same issue, and after replacing my docker 1.7.1 with the above executable, I can now bring up new containers reliably without rebooting the server. Will continue to monitor.
# docker info
Containers: 15
Images: 259
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 289
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-0.bpo.4-amd64
Operating System: Debian GNU/Linux 7 (wheezy)
CPUs: 2
Total Memory: 1.958 GiB
Name: ######
ID: T3EC:GIZT:AJWN:NEVX:DUBS:HGHQ:FE74:HMUZ:HT4I:AANJ:IRWN:3V6X
WARNING: No memory limit support
WARNING: No swap limit support
# uname -a
Linux bean.raywalker.it 3.16.0-0.bpo.4-amd64 #1 SMP Debian 3.16.7-ckt11-1~bpo70+1 (2015-06-08) x86_64 GNU/Linux |
I've deployed it and will let you know. Will this fix be in 1.8? |
Still having an issue, but it looks like a different error. I created the containers as a separate action, which didn't error; this error occurred when I attempted to start the container.
|
@BenHall Thanks for the confirmation. This is a different issue, and it is strange: moby/libnetwork#350 was added specifically to address this. @mrjana, do you have any idea? |
Works for me with 1.7.1, build 786b29d-dirty |
@ggtools thanks for the confirmation. As I mentioned, the fix (https://gist.githubusercontent.com/mavenugo/b68e24be97eeaf9d0eef/raw/ba48905331ed367589d00e91beb1ff817ab73d69/gistify818322.java) is nothing more than trying the netlink API first to create and program the bridge and, in case of failure, falling back to an ioctl call. |
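In other words, the shape of the fix is roughly the following. This is only a sketch, not the actual libnetwork patch; it assumes the vishvananda/netlink Go library, uses the legacy SIOCBRADDBR ioctl as the fallback path, and must run as root:

```go
package main

import (
	"unsafe"

	"github.com/vishvananda/netlink"
	"golang.org/x/sys/unix"
)

// createBridge tries the netlink API first and falls back to the
// legacy SIOCBRADDBR ioctl if netlink fails (e.g. on older kernels).
func createBridge(name string) error {
	la := netlink.NewLinkAttrs()
	la.Name = name
	if err := netlink.LinkAdd(&netlink.Bridge{LinkAttrs: la}); err == nil {
		return nil
	}
	return ioctlCreateBridge(name)
}

// ioctlCreateBridge creates a bridge the pre-netlink way: an ioctl
// on an AF_INET socket with the bridge name as the argument.
func ioctlCreateBridge(name string) error {
	fd, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM, 0)
	if err != nil {
		return err
	}
	defer unix.Close(fd)
	var ifName [unix.IFNAMSIZ]byte
	copy(ifName[:], name)
	_, _, errno := unix.Syscall(unix.SYS_IOCTL, uintptr(fd),
		unix.SIOCBRADDBR, uintptr(unsafe.Pointer(&ifName[0])))
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	// "testbr0" is a placeholder bridge name for testing.
	if err := createBridge("testbr0"); err != nil {
		panic(err)
	}
}
```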
@mavenugo yes, I noticed; and as you may have noticed, if this is a kernel bug it will affect both 3.13.0 and 3.16.0, at least the Ubuntu flavor. |
Just for added confirmation: I was seeing this same error intermittently when trying to execute about 10 containers quickly (within less than 500 ms) through the API running 1.7.1. So far, build 786b29d-dirty has fixed it for me. I even upped the executions by an order of magnitude, and so far so good. |
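For reference, that kind of race is easy to provoke with a small harness that fires off a batch of container runs nearly simultaneously. A sketch shelling out to the CLI (the image and count are placeholders, not the original reporter's setup):

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

func main() {
	// Fire off ~10 container runs at nearly the same moment, which is
	// roughly the pattern that triggered the error on 1.7.1.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			out, err := exec.Command("docker", "run", "--rm", "busybox", "true").CombinedOutput()
			if err != nil {
				fmt.Printf("run %d failed: %v: %s\n", n, err, out)
			}
		}(i)
	}
	wg.Wait()
}
```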
@joestubbs thanks for the additional confirmation. We will try and get this in for 1.8.0. |
@BenHall @mountkin helped fix a possible leak issue, moby/libnetwork#419. That fix plus mine might help most of us here (including your issue). I would like to get these issues resolved asap and make them part of the 1.8 RC, which you can try out-of-hours and give feedback on. WDYT? |
@mavenugo Sure, sounds good. Can you link me to the build which you would like me to deploy... |
@BenHall if it helps, I can provide another private image with mine and moby/libnetwork#419 in place, which you can try (after cleaning up the existing leaked veths). That would help a great deal. Can you make yourself available in the #docker-network IRC channel so that we can debug this live? |
As seen in moby/moby#14738, there is general instability in later kernels under race conditions when ioctl calls are used in parallel with netlink calls for various operations. (We are yet to narrow down the exact root cause in the kernel.) For those older kernels which don't support some of the netlink APIs, we can fall back to using ioctl calls. Hence bringing back the original code that used netlink (moby/libnetwork#349). Also, there was an existing bug in bridge creation using netlink, which was setting the bridge MAC during bridge creation. That operation is not supported by the netlink library (and doesn't throw an error either). Included a fix for that condition by setting the bridge MAC after creating the bridge. Signed-off-by: Madhu Venugopal <madhu@docker.com>
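The MAC-address part of that fix amounts to setting the hardware address in a second netlink call, after the bridge already exists, instead of during creation. A sketch of that sequencing, assuming the vishvananda/netlink Go library (not the actual libnetwork code; the bridge name and MAC are placeholders):

```go
package main

import (
	"net"

	"github.com/vishvananda/netlink"
)

func main() {
	// Create the bridge first; per the commit message above, setting
	// the MAC as part of the creation call is not supported by the
	// netlink library and fails silently.
	la := netlink.NewLinkAttrs()
	la.Name = "testbr0"
	br := &netlink.Bridge{LinkAttrs: la}
	if err := netlink.LinkAdd(br); err != nil {
		panic(err)
	}

	// Then set the MAC explicitly on the existing link.
	hw, _ := net.ParseMAC("02:42:ac:11:00:01") // placeholder address
	if err := netlink.LinkSetHardwareAddr(br, hw); err != nil {
		panic(err)
	}
}
```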
Fixed in #15185 |
@BenHall both the veth leak and |
Hello,
Since upgrading to 1.7.1, I'm getting two new errors when I launch containers. The containers are being launched via the API; no code changed between the releases, and these errors didn't occur in 1.7.0.
Sometimes the containers launch successfully; other times they fail with the error below. The docker0 interface does exist.
Any suggestions?
Ben
The network settings for the container look like this:
Machine information: