Error initializing network controller: list bridge addresses failed: no available network #41525

Closed
jhaprins opened this issue Oct 4, 2020 · 3 comments · Fixed by #42598


jhaprins commented Oct 4, 2020

Docker version 19.03.13, build 4484c46d9d

Hello,

Today a colleague asked me whether I had changed something on the network, because his Docker setup was suddenly giving him a lot of problems. At first I did not know what he was talking about, but after some questions it became clear that he could not start his Docker environment while his VPN connection to the office was up. The error message he received was: "Error initializing network controller: list bridge addresses failed: no available network". This was very strange, because the network he had configured in his daemon.json looked like this:
{
  "default-address-pools": [
    {"base": "10.10.0.0/16", "size": 24}
  ]
}
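
As a rough illustration of what this pool describes: the base network 10.10.0.0/16 is carved into subnets of prefix length 24, one per Docker network. A minimal sketch using Python's standard `ipaddress` module (this only shows the arithmetic, it is not Docker's own allocator):

```python
import ipaddress

# The pool from the daemon configuration above.
base = ipaddress.ip_network("10.10.0.0/16")

# "size": 24 means each allocated network is a /24 taken from the base.
subnets = list(base.subnets(new_prefix=24))

print(len(subnets))   # 256
print(subnets[0])     # 10.10.0.0/24
print(subnets[-1])    # 10.10.255.0/24
```

The first subnet, 10.10.0.0/24, is the one that shows up on docker0 in the routing table further down.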

In our corporate network we have a lot of RFC1918 networks: a few in the 10.0.0.0/8 range and a lot in the 172.16.0.0/12 and 192.168.0.0/16 ranges. None of them collides with the range above, and even if one did, everything was local to his workstation, where he was developing and testing some monitoring systems. He is completely free to use whatever network he wants locally, as long as he doesn't interfere with the corporate network. On the VPN router I have a default set of routes for the RFC1918 networks pointing towards the corporate routers, so everyone can reach the internal corporate networks without having to worry about anything. The firewalls take care of the rest.

I started debugging the error message and did some Google searches and I found a lot of people complaining about exactly this same problem. Some example tickets:
docker/for-linux#123
#35121
#33925 (this ticket, currently open, so I leave a comment here.)

At first the error didn't make any sense to me because:

  • a network is available
  • the configured network is not directly connected, so Docker has no basis for deciding that it should not use it.
  • even if an overlapping network is used somewhere else, a more specific route would be configured locally and this should prevent any routing issues.

But then I thought of something. What if the Docker code that searches for free networks takes the local routing table and checks the configured network against EVERY route in it, and gives this error as soon as the configured network matches or overlaps one of those routes? At first I thought this couldn't be true, because the check would then always fail: a default route of 0.0.0.0/0 always matches. But what if the default route is filtered out in the code for exactly this reason? Then the hypothesis could be true.
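
The hypothesis can be written down as a small model. This is a sketch of the behaviour being guessed at here, not Docker's actual code; the function name `first_free_subnet` and the way routes are passed in as plain prefixes are assumptions made for illustration.

```python
import ipaddress

def first_free_subnet(pool, size, routes):
    """Model of the HYPOTHESIZED check (hypothetical helper, not Docker's code).

    Returns the first subnet of `pool` with prefix length `size` that does not
    overlap any route, ignoring the default route. Returns None if none is free.
    """
    pool_net = ipaddress.ip_network(pool)
    route_nets = [ipaddress.ip_network(r) for r in routes]
    # The default route (prefix length 0) overlaps everything, so skip it.
    route_nets = [r for r in route_nets if r.prefixlen != 0]
    for candidate in pool_net.subnets(new_prefix=size):
        if not any(candidate.overlaps(r) for r in route_nets):
            return candidate
    return None

# Routing table without VPN: default route plus the home LAN.
no_vpn = ["0.0.0.0/0", "192.168.178.0/24"]
# The same table with the three RFC1918 routes pushed by the VPN.
with_vpn = no_vpn + ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

print(first_free_subnet("10.10.0.0/16", 24, no_vpn))    # 10.10.0.0/24
print(first_free_subnet("10.10.0.0/16", 24, with_vpn))  # None
```

Under this model, the 10.0.0.0/8 route from the VPN overlaps every /24 in the pool, which would explain a "no available network" result.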

I started testing locally on my own system. First I set up a working baseline from which to reproduce the error:

  • Set up my Docker daemon with the same configuration.
  • Kept my normal local routing table, without VPN.
  • Started Docker, and this worked fine.

The resulting routing table:
default via 192.168.178.1 dev enp62s0u1u1 proto static metric 1024
10.10.0.0/24 dev docker0 proto kernel scope link src 10.10.0.1 linkdown
192.168.178.0/24 dev enp62s0u1u1 proto kernel scope link src 192.168.178.74 metric 100

Then I started my VPN. The result was 3 extra routes:
10.0.0.0/8 via 192.168.2.1 dev tap0 proto static metric 50
172.16.0.0/12 via 192.168.2.1 dev tap0 proto static metric 50
192.168.0.0/16 via 192.168.2.1 dev tap0 proto static metric 50

I then stopped my Docker daemon and tried to start it again, and indeed I received the same error. So I could reproduce the problem. Now for my hypothesis: does the code check EVERY route in the routing table, filtering out only the default route?

To test this I did the following:
I removed the default route and replaced it with two more specific routes that together cover the whole IPv4 address space:
0.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1
128.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1

My routing table then looks like this:
0.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1
128.0.0.0/1 via 192.168.178.1 dev enp62s0u1u1
192.168.178.0/24 dev enp62s0u1u1 proto kernel scope link src 192.168.178.74 metric 100

The only difference between this state and the clean state of my system is that there is no default route, but two routes that together are equivalent to it. Now I tried to start the Docker daemon again. If the daemon starts fine, my hypothesis is wrong and I have to continue my search. If the daemon fails, my hypothesis must be correct, because the default route is the only difference in my local configuration.

And indeed, I received the same error again. Now I'm sure there is absolutely no reason to give this error because:

  • I don't have the 10.10.0.0/16 network anywhere in my home network
  • I have a routing table that only routes for 192.168.178.0/24 and the internet

This also proves my hypothesis that every route in the routing table is being checked against the configured network, filtering out the default route. If any route matches the configured network, the configuration is rejected.
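
The arithmetic behind the split-route experiment can be checked with Python's standard `ipaddress` module. This is only a sanity check of the prefixes involved, under the assumption that the daemon compares prefixes for overlap; it says nothing about how Docker implements the comparison.

```python
import ipaddress

pool = ipaddress.ip_network("10.10.0.0/16")
lower_half = ipaddress.ip_network("0.0.0.0/1")
upper_half = ipaddress.ip_network("128.0.0.0/1")
default = ipaddress.ip_network("0.0.0.0/0")

# The two /1 routes together are equivalent to the default route.
merged = list(ipaddress.collapse_addresses([lower_half, upper_half]))
print(merged == [default])          # True

# The configured pool falls inside the lower half only.
print(pool.subnet_of(lower_half))   # True
print(pool.overlaps(upper_half))    # False
```

So the 0.0.0.0/1 route is a non-default route that contains the whole pool, which is consistent with the failure seen once the real default route was replaced.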

This is a bug in the docker code. The code should be changed to only match routes with "scope link" because these routes are directly connected and would be a problem when you start a docker daemon with an overlapping network configuration. Any route that is not "scope link" should be ignored because those routes could be:

  • Injected by DHCP
  • Injected by a routing protocol
  • Injected by a VPN config.
  • Less specific behind a router somewhere remote

There is one corner case where you could give a warning or maybe an error: when there is an equal or more specific route that is not "scope link", because this could result in routing issues to other systems. But even then, I would make it configurable, because it could very well be that this is intentional.
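
The proposed rule can be sketched as follows. This is an illustration of the suggestion made above, not a patch and not Docker's code; the function `classify`, the verdict strings, and the representation of a route as a `(prefix, scope_link)` pair are all assumptions, and the 10.10.5.0/24 route in the second example is hypothetical.

```python
import ipaddress

def classify(candidate, routes):
    """Apply the proposed rule to one configured network (hypothetical helper).

    routes: iterable of (prefix, scope_link) pairs.
    Returns "conflict" for an overlapping scope-link route, "warn" for an
    overlapping non-link route that is equal or more specific, else "ok".
    """
    cand = ipaddress.ip_network(candidate)
    verdict = "ok"
    for prefix, scope_link in routes:
        route = ipaddress.ip_network(prefix)
        if not cand.overlaps(route):
            continue
        if scope_link:
            # Directly connected and overlapping: a real problem.
            return "conflict"
        if route.prefixlen >= cand.prefixlen:
            # Equal or more specific remote route: possible routing issue.
            verdict = "warn"
        # Less specific non-link routes (VPN, DHCP, routing protocol) are ignored.
    return verdict

vpn_table = [
    ("10.0.0.0/8", False),
    ("172.16.0.0/12", False),
    ("192.168.0.0/16", False),
    ("192.168.178.0/24", True),
]

print(classify("10.10.0.0/16", vpn_table))                             # ok
print(classify("10.10.0.0/16", vpn_table + [("10.10.5.0/24", False)])) # warn
print(classify("10.10.0.0/16", vpn_table + [("10.10.0.0/24", True)]))  # conflict
```

With this rule the VPN routing table from the reproduction above would no longer block the 10.10.0.0/16 pool, while a directly connected overlapping network still would.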

I'm not a developer but a network and systems engineer, so I am not able at the moment to provide a patch for this problem, but one of my colleagues thought that he had already found some parts of the code. So maybe ........

The version I have tested this with is: Docker version 19.03.13, build 4484c46d9d

Cheers,
Jan Hugo Prins

Originally posted by @jhaprins in #33925 (comment)

@rasmuspeders1

I have also run into this issue.
I am running Ubuntu 20.04 with the included Docker 19.03.

I have configured a custom subnet for the docker_gwbridge with "default-address-pools": [{"base":"xx.xx.xx.xx/24","size":24}] in daemon.json

This works fine on the first startup of Docker, when the docker_gwbridge interface is not present yet.

But when the server is restarted it breaks.

The routing entry for docker_gwbridge is present on bootup before docker is started.
I can work around this issue by using ip route del xx.xx.xx.xx/24 before starting docker and then re-adding the route after docker is started.

To me this looks like the feature of running with a custom subnet on docker_gwbridge is basically broken.

@akerouanton
Member

akerouanton commented Oct 12, 2023

I'm reopening this issue as the fix proposed in #42598 doesn't do what the PR submitter thought it would. See #46630.

@akerouanton
Member

We discussed this issue again during the libnet maintainers call today, and we all agree that the current set of heuristics is imperfect but good enough for the time being. However, we might revisit these heuristics, or the ability for users to influence them, at a later time. Thus, I'm going to close it.
