Initial ipv6 / iptables work #2147
Conversation
Does this make sense before Docker supports IPv6?
type Protocol bool
please use int or byte rather than bool - bool will just encourage implicit truth testing.
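The suggested change might look something like this in Go (the constant names and String method are illustrative, not from the PR):

```go
package main

import "fmt"

// Protocol identifies an IP protocol family. A small integer type
// avoids implicit truth testing ("if proto {...}") and leaves room
// for more values later, unlike bool.
type Protocol byte

const (
	IPv4 Protocol = iota
	IPv6
)

func (p Protocol) String() string {
	if p == IPv6 {
		return "ipv6"
	}
	return "ipv4"
}

func main() {
	fmt.Println(IPv4, IPv6)
}
```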
I don't see any obvious problems here, but there are a lot of unknowns (to me, anyway). Disabling masquerade sounds OK, but I have no idea what that does to egress in v6 environments. (e.g. will the fabric egress traffic with a source IP that isn't "officially known"? I know GCE won't, for v4; it has no v6 support.) I don't know v6 well, so I can't visually vet the iptables changes without testing. Do we need v6 for service portals? Those IPs never hit the wire anyway...
Here's my patch for Docker IPv6 support: moby/moby#8896 Hopefully soon! But I'm trying to do this concurrently, in case we want more features in Docker itself.

The idea is that we won't actually route IPv6 outside of our k8s cluster on EC2 (unless EC2 adds support for "real" IPv6 in a week or two ;-). Instead, we'll do something like NAT to get to the IPv4 internet from IPv6. NAT for IPv6 means either running an HTTP proxy (for really locked-down environments) or running something like NAT64 (I tried this with TAYGA and it worked, though I hope we can find something even better!).

If we're on a machine which actually supports real IPv6 (so e.g. the host has a /64), then egress (and ingress!) will work. You often have to respond to neighbor-discovery requests, which you do like this: http://www.ipsidixit.net/2010/03/24/239/ I imagine we'll add this into Docker for "real" IPv6.

For inbound traffic on EC2, we will have to continue to listen on an IPv4 address, so that EC2 can talk to it. The most likely scenario is that we have nginx/haproxy running in a pod, which then forwards to the correct backend services (over IPv6).

For service portals, IPv4 vs IPv6 doesn't really matter, but I think IPv6 not only gives us more IP addresses, it also means that the pods could be IPv6-only. I think it's also likely that we end up with each Docker instance having a 172.16.x.x IPv4 address and a routable IPv6 address.

Early, but I think this is a good way to explore the option (?). Let me know if you'd rather switch to a different medium.
I see -- you are working the entire stack :) I'd love to make sure that this works on top of GCE, which also, unhappily, doesn't currently have IPv6 support. If it'll work on EC2 we can probably make it work on GCE too.
Fair enough. I don't see any problems with it, but I have no way to test it or to know that it stays working. Maybe we can get some e2e support for IPv6.
I'm definitely pushing forwards on multiple fronts here ;-)... I agree this needs e2e tests! I was able to launch a Docker instance on a k8s-minion on EC2 with an IPv6 address using
We'd be happy to get your EC2 scripts checked in.
Awesome - I will tidy up the EC2 scripts a little and get them pushed to a branch / PR.
It uses a set (via a map) of allocated IPs
LGTM, merging.
Initial ipv6 / iptables work
// Try randomly first
for i := 0; i < ipa.randomAttempts; i++ {
	ip := ipa.createRandomIp()
Is the random aspect of this important or just easy?
Keeping a full bitmap is out of the question for e.g. a /64, which is what motivated the change.
This should be more efficient than a linear scan, if we expect the address space to be sparsely populated. But there are also correctness aspects:
I've had problems with IP address reuse in the past:
- where ARP or the ipv6 equivalents got confused (surmountable with unsolicited ARP and equivalents)
- where the kernel cgroups or bridge got confused (particularly with IPv6; the symptom was that attempting to assign the IPv6 address to the LXC instance would just fail, but after a few instance restarts / time-delay it would eventually work. I don't know if this still happens, or whether I was just doing something wrong.)
Also, it seems a little risky to assign an IP address immediately to the next requester, in case that is a different tenant. Having an LRU queue would probably be better.
Of course, these are real problems, and randomizing just buries them in the long tails. We can change randomAttempts to 0, or just remove the randomizing code, to see whether any of these problems still occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was specifically asking whether random is what matters, or whether "don't reuse" is what matters. I agree with the latter; the former causes me a very small bit of angst wrt static addresses for cluster services like DNS.
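The "don't reuse" property could also come from the LRU queue mentioned above, without any randomness. A minimal sketch (illustrative only; the PR uses the randomized approach instead):

```go
package main

import "fmt"

// lruAllocator hands out the least-recently-released address first,
// maximizing the time before any address is reused by a new tenant.
type lruAllocator struct {
	free []string // queue: front = released longest ago
}

func (a *lruAllocator) Allocate() (string, error) {
	if len(a.free) == 0 {
		return "", fmt.Errorf("no free IPs")
	}
	ip := a.free[0]
	a.free = a.free[1:]
	return ip, nil
}

// Release appends to the back of the queue, so a just-freed address
// is the last candidate for reuse.
func (a *lruAllocator) Release(ip string) {
	a.free = append(a.free, ip)
}

func main() {
	a := &lruAllocator{free: []string{"10.0.0.1", "10.0.0.2"}}
	first, _ := a.Allocate()
	a.Release(first)
	second, _ := a.Allocate()
	fmt.Println(first, second) // prints "10.0.0.1 10.0.0.2"
}
```

Note this still requires tracking every address ever handed out, which is exactly what a /64 makes impractical; hence the randomized compromise.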
Ah - gotcha! Yes, random is just a cheap-and-cheerful way of implementing (probable) don't-reuse.
The only other thing is that random also avoids trivially disclosing how many other instances are running, which is important in some shared environments.
More for discussion than for actual merging (for now).
I think IPv6 could solve the address allocation problem that seems to be holding k8s back on EC2. EC2 "tolerates" IPv6 for internal networking by using protocol 41 encapsulation.
I'm currently working on getting cluster/kube-up.sh to work, but in the meantime feedback on this idea would be helpful!