Intermittent masquerading failure caused by floating docker bridge MAC #14908

Closed
awh opened this issue Jul 23, 2015 · 5 comments

@awh

awh commented Jul 23, 2015

Environment

$ docker version
Client version: 1.7.0
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 0baf609
OS/Arch (client): linux/amd64
Server version: 1.7.0
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 0baf609
OS/Arch (server): linux/amd64
DOCKER_OPTS="-H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375 --userland-proxy=false"

Problem

Whilst investigating weaveworks/weave#1171 I have uncovered a subtle problem involving the interplay between Linux bridge semantics, the MASQUERADE rules installed by docker -p, and netfilter connection tracking. Refer to this comment for a detailed description, but essentially the failure to explicitly set the docker0 MAC address can, under certain circumstances, cause abnormal deletion of conntrack flows from the host, ultimately resulting in connections to published services being reset intermittently.

Symptoms

The following symptoms may be observed randomly on docker hosts at times when containers are being started:

  • Loss of TCP and UDP traffic involving ports published to the host
  • Spurious TCP resets (RST) causing connection flapping
  • Egress from the host of IP traffic with unroutable source addresses

Solution

Explicitly setting the docker bridge hardware address disables the floating MAC behaviour and presents a stable default gateway MAC to application containers, eliminating the circumstance that can cause the host kernel to drop conntrack flows prematurely.
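
As a rough illustration of the idea, the sketch below prints an `ip link` command that re-applies a bridge's current MAC address to itself. It assumes that writing the address through `ip link set ... address` is enough for the kernel to treat it as explicitly set; that is not necessarily how docker or weave implement the fix. The function name `pin_bridge_mac_cmd` and the `SYSFS_NET` override are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# Hypothetical helper (not part of docker or weave): print the command that
# would pin a bridge's MAC to the address it currently holds.
# SYSFS_NET can be overridden so the function can be tried against a fake tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

pin_bridge_mac_cmd() {
    bridge="${1:-docker0}"
    addr_file="$SYSFS_NET/$bridge/address"

    if [ ! -r "$addr_file" ]; then
        echo "cannot read $addr_file" >&2
        return 1
    fi

    mac="$(cat "$addr_file")"

    # Assumption: re-applying the current address marks it as explicitly set,
    # so the bridge no longer inherits its MAC from attached ports.
    echo "ip link set dev $bridge address $mac"
}
```

Printing the command rather than running it keeps the sketch side-effect free; to apply it, something like `pin_bridge_mac_cmd docker0 | sh` would need to run as root.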

@awh
Author

awh commented Jul 28, 2015

@rade has determined that this bug affects 1.7.0 only:

"The plot thickens... docker 1.6.2 sets the bridge MAC with ioctl. So does 1.7.1. But 1.7.0 uses some netlink magic. Which evidently doesn't work. Though I see no evidence that docker were aware of the issue we are seeing; they went back to ioctl for different reasons."

@mavenugo
Contributor

@awh Thanks for bringing this to our attention. We brought back the ioctl calls for a few cases due to #14024. But the ioctl calls are causing other issues : #14738 and we are investigating the kernel issues relating to these. We will keep this issue in mind when we resolve #14738 in 1.8.0.

@awh
Author

awh commented Jul 28, 2015

"We will keep this issue in mind when we resolve #14738 in 1.8.0."

@mavenugo thank you - in the meantime we're going to work around the problem for our users by implementing weaveworks/weave#1229. Any resolution to #14738 must yield the following:

$ cat /sys/class/net/docker0/addr_assign_type
3

to avoid this issue recurring; the following blog post explains the root cause in detail.
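
For anyone wanting to script that check, here is a minimal sketch that reads `addr_assign_type` and labels the value using the meanings given in the kernel's sysfs documentation (0 permanent, 1 random, 2 stolen from another device, 3 explicitly set). The function name `check_addr_assign_type` and the `SYSFS_NET` override are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical helper: report how an interface's MAC address was assigned.
# Exit status is 0 only when the value is 3 (address explicitly set).
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

check_addr_assign_type() {
    iface="${1:-docker0}"
    type_file="$SYSFS_NET/$iface/addr_assign_type"

    if [ ! -r "$type_file" ]; then
        echo "cannot read $type_file" >&2
        return 2
    fi

    value="$(cat "$type_file")"
    case "$value" in
        0) echo "$iface: 0 (permanent address)" ;;
        1) echo "$iface: 1 (randomly generated)" ;;
        2) echo "$iface: 2 (stolen from another device)" ;;
        3) echo "$iface: 3 (explicitly set)" ;;
        *) echo "$iface: $value (unrecognised)" ;;
    esac

    # Succeed only in the state this issue asks for.
    [ "$value" = "3" ]
}
```

A call such as `check_addr_assign_type docker0 || echo "bridge MAC may float"` could then be dropped into a host health check.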

@thaJeztah
Member

@mavenugo @awh not really read into this, but is this also resolved by #15185 ?

@mavenugo
Contributor

@thaJeztah Yep, this is resolved via #15185. Closing it.
