Proposal: Native Docker Multi-Host Networking #8951
Comments
This sounds good. What I am not seeing is the API and performance. How does one go about setting this up? How much does it hurt performance? One of the things we are trying to do in GCE is drive container network perf -> native. veth is awful from a perf perspective. We're working on networking (what you call underlay) without veth and a vbridge at all. |
I like the idea of underlay networking in Docker. The first question is: how much can be bundled by default? Does an ovs+vxlan solution make sense as a default, in replacement of veth + regular bridge? Or should they be reserved for opt-in plugins? @thockin do you have opinions on the best system mechanism to use? |
What exactly do you mean by "system mechanism" ? |
vxlan vs pcap/userland encapsulation vs nat with netfilter vs veth/bridge vs macvlan... use ovs by default vs. keep it out of the core.. Things like that. |
Ah. My experience is somewhat limited. Google has made good use of OVS internally. veth pair performance is awful and unlikely to get better. I have not played with macvlan, but I understand it is ~wire speed, but a bit awkward to use. We have a patch cooking that fills the need for macvlan-like perf without actually being VLAN (more like old-skool eth0:0 aliases). If we're going to pick a default, I don't think OVS is the worst choice - it can't be worse perf than veth. But it's maybe more dependency heavy? Not sure. |
@thockin @shykes Thanks for the comments. OVS provides the flexibility of using VXLAN for overlay deployments or native network integration for underlay deployments without sacrificing performance or scale. I haven't done much work with macvlan to give an answer on how it stacks up to an overall solution that includes functionality, manageability, performance, scale and network operations. We believe that Native Docker networking solution should be flexible enough to accommodate L2, L3 and Overlay network architectures. |
Hi Madhu, Dave and Team: Definitely a wholesome view of the problem. Thanks for putting it out there. Few questions and comments (on both proposals [0] and [1], as they tie into each other quite a bit): Comments and Questions on proposal on Native-Docker Multi-Host Networking: [a] OVS Integration: The proposal to natively instantiate ovs from docker is good.
[b] [c] [e] [f] Comments and Questions on proposal on ‘Network Drivers’: [g] Multiple-vNICs inside a container: Do the APIs proposed here (CreatePort) handle creation of multiple vNICs inside a container? [h] Update to Network configuration: Say a bridge is added with a VXLAN-VNID or a VLAN; would your suggestion be to call ‘InitBridge’, or would this be done during PortCreate() if the VLAN/tunnel/other-parameters-needed-for-port-create do not exist? [j] Driver API performance/scale requirements: It would be good to state an upfront design target for scale/performance. As always, will be happy to collaborate on this with you and other developers. Cheers, |
@thockin on the macvlan performance, are there any published figures? from an underlay integration standpoint, I'd imagine that having a bridge would be much easier to manage as you could trunk all vlans to the vswitch and place the container port in the appropriate vlan.... otherwise with a load of mac addresses loose on your underlay you'd need to configure your underlay edge switches to apply a vlan based on a mac address (which won't be known in advance). I feel like i'm missing something though so please feel free to correct me if i haven't quite grokked the macvlan use case |
@jainvipin thanks for the mega feedback. I think the answer to a lot of your questions lies in these simple statements. I firmly believe that all network configuration should be done natively, as a part of Docker. Orchestration systems populating netns and/or bridge details on the host, then asking Docker to plumb this in to the container, doesn't seem right to me. I'd much rather see orchestration systems converge on, or create a driver in this framework (or one like it) that does the necessary configuration in Docker itself. For multi-host, the Network Driver API will be extended to support the required primitives for programming the dataplane. This could take the form of OF datapath programming in the case of OVS, but it could also be adding plain old ip routes in the kernel. This is really up to the driver. To that end, all of the improvements we're suggesting here for multi-host are designed to be agnostic to the backend used to deliver them. |
The caveat here is that Docker can not be everything to everyone. Having networking be externalized with a clean plugin interface (i.e. exec) …
|
@dave-tucker There are trade-offs to pulling everything (management, data-plane, and control-plane) into docker. While you highlighted the advantages (and I agree with some, as indicated in my comment), I was noting a few disadvantages (versioning/compatibility, inefficiency, docker performance, etc.) so we can weigh it better. This is based on my understanding of things from reading the proposal (no experimentation yet). In contrast, if we can incorporate a small change (#8216) in docker, it can perhaps give the scheduler/orchestrator/controller a good way to spawn containers while allowing them to do networking-related things themselves, and not have to move all networking natively inside docker – IMHO a good balance for the pain point, and it does not make docker very heavy. 'docker run' has about 20-25 options now, and some of them provide further options (e.g. ‘-a’, or ‘—security-opt’). I don’t think it will remain at 25 in the near/short term; it will likely grow rapidly into a flat unstructured set. The growth would come from valid use-cases (networking or non-networking), but must we consider solving that problem here in this proposal? I think libswarm can work with either of the two models, where an orchestrator has to play the role of spawning ‘swarmd’ with appropriate network glue points. |
What about weave (https://github.com/zettio/weave)? Weave provides a very convenient SDN solution for Docker from my point of view. And it provides encryption out of the box, which is a true plus. And it is the only solution with out-of-the-box encryption we have found on the open source market so far. Nevertheless weave's impact on network performance in HTTP-based and REST-like protocols is substantial: about 30% performance loss for small message sizes (< 1,000 bytes) and up to 70% performance loss for big message sizes (> 200,000 bytes). Performance losses were measured for the indicators time per request, transfer rate and requests per second, using apachebench against a simple ping-pong system exchanging data over a HTTP-based REST-like protocol. We are writing a paper for the next CLOSER conference to present our performance results. There are some options to optimize weave performance (e.g. not containerizing the weave router should bring a 10% to 15% performance plus according to our data). |
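For reference, the measurement setup described can be sketched roughly like this (the container addresses, port and endpoint path are illustrative, not taken from the paper):

```shell
# Benchmark the same HTTP ping-pong service twice with apachebench:
# once over the plain docker0 bridge, once over the weave network,
# then compare requests/sec and transfer rate between the two runs.
ab -n 10000 -c 10 http://172.17.0.2:8080/ping   # direct bridge path
ab -n 10000 -c 10 http://10.2.1.2:8080/ping     # same service via weave
```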
@thockin absolutely we will need to couple this with a plugin architecture. See #8968 for first steps in that direction :) At the same time, Docker will always have a default. Ideally that default should be enough for 80% of use cases , with plugins as a solution for the rest. When I ask about ovs as a viable default, it's in the context of this "batteries included but removable" model. |
Ping @erikh |
@dave-tucker, @mavenugo and @nerdalert (and indeed @ everyone else): It's really exciting to see this proposal for Docker! The lack of multi-host networking has been a glaring gap in Docker's solution for a while now. I just want to quickly propose an alternative, lighter-weight model that my colleagues and I have been working on. The OVS approach proposed here is great if it's necessary to put containers in layer 2 broadcast domains, but it's not immediately clear to me that this will be necessary for the majority of containerized workloads. An alternative approach is to pursue network virtualization at Layer 3. A good reference example is Project Calico. This approach uses BGP and ACLs to route traffic between endpoints (in this case containers). This is a much lighter-weight approach, so long as you can accept certain limitations: IP only, and no IP address overlap. Both of these feel like extremely reasonable limitations for a default Docker case. We've prototyped Calico's approach with Docker, and it works perfectly, so the approach is simple to implement for Docker. Docker is in a unique position to take advantage of lighter-weight approaches to virtual networking because it doesn't have the legacy weight of hypervisor approaches. It would be a shame to simply follow the path laid by hypervisors without evaluating alternative approaches. (NB: I spotted #8952 and will comment there as well, I'd like the Calico approach to be viable for integration with Docker regardless of whether it's the default.) |
I have some simple opinions here but they may be misguided, so please feel free to correct my assumptions. Sorry if this seems overly simplistic but plenty of this is very new to me, so I’ll focus on how I think this should fit into docker instead. I’m not entirely sure what you wanted me to weigh in on @shykes, so I’m trying to cover everything from a design angle. I’ll weigh in on the nitty-gritty of the architecture after some more experimentation with openvswitch (you know, when I have a clue :). After some consideration, I think weave, or something like it, should be the default networking system in docker. While this may ruffle some feathers, we absolutely have to support the simple use case. I think it’s safe to say developers don’t care about openvswitch, they care that they can start postgres and rails and they just work together. Weave brings this capability without a lot of dependencies at the cost of performance, and it’s very possible to embed directly into docker, with some collaborative work between us and the zettio team. That said, openvswitch should definitely be available and first-class for production use (weave does not appear at a glance to be made for especially demanding workloads) and ops professionals will appreciate the necessary complexity with the bonus flexibility. The socketplane guys seem extremely skilled and knowledgeable with openvswitch and we should fully leverage that, standing on the shoulders of giants. In general, I am all for anything that gets rid of this iptables/veth mess we have now. The code is very brittle and racy, with tons of problems, and basically makes life for ops a lot harder than it needs to be even in trivial deployments. At the end of the day, if ops teams can’t scale docker because of a poor network implementation it simply won’t get adopted in a lot of institutions. 
The downside to all of this is if we execute on the above, that we have two first-class network solutions, both of which have to be meticulously maintained regularly, and devs and ops may have an impedance mismatch between dev and prod. I think that’s an acceptable trade for “it just works” on the dev side, as painful as it might end up being for docker maintainers. Ops can always create a staging environment (as they should) if they need to test network capabilities between alternatives, or help devs configure openvswitch if that’s absolutely necessary. I would like to take plugin discussion to the relevant pull requests instead of here, I think it’s distracting from the discussion. Additionally, the people behind the work on the plugin system are not specifically focused on networking, but on a wider goal, so the best place to have that discussion is there. I hope this was useful. :) -Erik |
@thockin @jainvipin @shykes I just want to bring your attention to the point that this proposal tries to bring in a solid foundation for network plumbing, and in no way precludes higher-order orchestrators from adding more value on top. I think adding more details on the API and integration will help clarify some of these concerns. From the past, we have some deep scars from approaches that let non-native solutions dictate the basic plumbing model, leading to crippled default behavior and fracturing the community. |
@Lukasa Please refer to a couple of important points in this proposal that address exactly this: "Our experience leads us towards using similar consistency protocol such as a tenant aware BGP in order to achieve the worry free environment developers and operators desire. This also presents an evolvable architecture if a tighter coupling into the native network is of value in the future." "By extending L3 to the true edge of the network in the vSwitch it enables a proven network scale while still retaining the ability to perform disaggregated network services on the edge. Extending gateway protocols to the host will play a significant role in scaling a tight coupling to the network architecture." Please refer to #8952, which provides the details on how a driver / plugin can help in choosing an appropriate networking backend. I believe that is the right place to bring the discussion on including an alternative backend that fits best in certain scenarios. This proposal is to explore all the multi-host networking options and the Native Docker integration of those features. |
@erikh Thanks for weighing in. Is there anything specific in the proposal that leads you to believe it will make the application developer's life more complex? We wanted to provide a wholesome view of the network operations & choices in a multi-host production deployment, and hence the proposal description became network-operations heavy. I just wanted to assure you that it will in no way expose any complexity to the application developers. One of the primary goals of Docker is to provide a seamless and consistent mechanism from dev to production. Any impedance mismatch between dev and production should be discouraged. +1 to "I think it’s safe to say developers don’t care about openvswitch, they care that they can start postgres and rails and they just work together." This proposal is to bring multi-host networking Native to Docker, Transparent to Developers and Friendly to Operations. |
+1 I reckon that architecturally there are three layers here...
Crucially, 2) must make as few assumptions as possible about what docker networking looks like, such as to not artificially constrain/exclude different approaches. As a strawman for 2), how about wiring a @dave-tucker Is this broadly compatible with your thinking on #8952? |
I would like to see a simple but secure standard network solution (e.g. preventing arp spoofing. The current default config is vulnerable to this.). It should be easy to replace by something more comprehensive. And there should be an API that you can connect to your network management solution. |
I'd like to see this as a composable external tool that works well when wrapped up as a Docker plugin, but doesn't assume anything about the containers it is working with. There's no reason why this needs to be specific to Docker. This also will require service discovery and cluster communication to work effectively, which should be a pluggable layer. |
@erikh "developers don't care about openvswitch" - I agree. Our solution is designed to be totally transparent to developers such that they can deploy their rails or postgres containers safe in the knowledge that the plumbing will be taken care of. The other point of note here is that the backend doesn't have to be Open vSwitch - it could be whatever so long as it honours the API. You could theoretically have multi-host networking using this control plane, but linux bridge, iptables and whatever in the backend. We prefer OVS, the only downside being that we require "openvswitch" to be installed on the host, but we've wrapped up all the userland elements in a docker container - the kernel module is available in 3.7+ |
Hi @MalteJ, Thanks for the feedback.
|
Wanted to drop in and mention an alternative to VxLAN: GUE -> an in-kernel, L3 encap solution recently (soon to be?) merged into Linux: torvalds/linux@6106253 |
@maceip agreed with you. It seems to me that an efficient and minimal approach to networking in Docker would be using VXLAN + DOVE extensions or, even better, GUE. I'm inclined to think that OVS is too much for containers but I might be just biased. |
Given my limited experience, I don't see a compelling reason to do anything in L2 (ovs/vxlan). Is there an argument explaining why people want this? Generic UDP Encapsulation (GUE) seems to provide a simple, performant solution to this network overlay problem, and scales across various environments/providers. |
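For the curious, a GUE-encapsulated tunnel between two Docker hosts could be sketched with recent iproute2 along these lines (assumes a kernel with FOU/GUE support, i.e. 3.18+; the addresses, subnet and UDP port are illustrative):

```shell
# Receive GUE-encapsulated traffic on UDP port 5555.
ip fou add port 5555 gue
# IPIP tunnel to the peer host, carried inside GUE.
ip link add gue1 type ipip remote 10.0.0.2 local 10.0.0.1 \
    encap gue encap-sport auto encap-dport 5555
ip link set gue1 up
# Route the peer host's container subnet over the tunnel.
ip route add 10.244.2.0/24 dev gue1
```

The peer host mirrors this with the local/remote addresses swapped.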
@maceip @c4milo isn't GUE super new and poorly supported in the wild? Regarding vxlan+dove, I believe OVS can be used to manage it. Do you think we would be better off hitting the kernel directly? I can see the benefits of not carrying the entire footprint of OVS if we only use a small part of it - but that should be weighed against the difficulty of writing and maintaining new code. We faced a similar tradeoff between continuing to wrap lxc, or carrying our own implementation with libcontainer. Definitely not a no-brainer either way. |
We're reopening this after some discussion with @mavenugo pointing out that our proposal is not a solution for everything in here -- and it should be much closer. We want this in docker and we don't want to communicate otherwise. So, until we can at least mostly incorporate this proposal into our new extension architecture, we will leave it open and solicit comments. |
@c4milo following is the docker-network IRC log between us regarding reopening the proposal. madhu: erikh: backjlack thanks for all the great work |
@mavenugo nice, thank you, it makes more sense now :) |
Related to VxLAN and the network "overlay": the stumbling block to implementation/deployment was always the requirement for multicast to be enabled in the network... which is rare. Last year Cumulus Networks and MetaCloud open sourced VXFLD to implement VxLAN with unicast and UDP. They also submitted it for consideration as a standard. MetaCloud has since been acquired by Cisco Systems. VXFLD consists of 2 components that work together to solve the BUM (Broadcast, Unknown unicast & Multicast) problem with VxLAN by using unicast instead of the traditional multicast. The 2 components are called VXSND and VXRD. VXSND provides: VXRD provides: the source for VXFLD is on Github: https://github.com/CumulusNetworks/vxfld Be sure to read the two github VXFLD directory .RST files as they describe in more detail the two daemons for VXFLD ... VXRD and VXSND. I thought I'd mention VXFLD as it could potentially solve part of your proposal and... the code already exists. If you use debian or ubuntu, Cumulus also has pre-packaged 3 .deb files for VXFLD: http://repo.cumulusnetworks.com/pool/CumulusLinux-2.2/main/vxfld-common_1.0-cl2.2~1_all.deb http://repo.cumulusnetworks.com/pool/CumulusLinux-2.2/main/vxfld-vxrd_1.0-cl2.2~1_all.deb and |
I'd like to chime in on this. I've been trying to put together a few arguments for and against doing this transparently to the user, and coming from a telco/"purist SDN" background it's hard to strike a middle ground between ease of use for small deployments and the kind of infrastructure we need to have it scale up into (and integrate with) datacenter solutions. (I'm rather partial to the OpenVSwitch approach, really, but I understand how weave and pipework can be appealing to a lot of people) So here are my notes: This is just a high-level overview of how software-defined networking might work in a Docker/Swarm/Compose environment, written largely from a devops/IaaS perspective but with a fair degree of background on datacenter/telco networking infrastructure, which is fast converging towards full SDN. There are two sides to the SDN story:
This document will focus largely on the first scenario and a set of user stories, with hints towards the second one at the bottom. Offhand, there are two possible approaches from an end-user perspective:
This is largely described in http://www.slideshare.net/adrienblind/docker-networking-basics-using-software-defined-networks already, and is what pipework was designed to do.

Arguments for Keeping Things Simple (Sticking to Port Mapping)

Docker's primary networking abstraction is essentially port mapping/linking, with links exposed as environment variables to the containers involved - that makes application configuration very easy, as well as lessening CLI complexity. Steering substantially away from that will shift the balance towards "full" networking, which is not necessarily the best way to go when you're focused on applications/processes rather than VMs. Some IaaS providers (like Azure) provide a single network interface by default (which is then NATed to a public IP or tied to a load balancer, etc.), so the underlying transport shouldn't require extra network interfaces to work.

Arguments for Increasing Complexity (Creating Networks)

Docker does not exist in a vacuum. Docker containers invariably have to talk to services hosted in more conventional infrastructure, and Docker is increasingly being used (or at least proposed) by network/datacenter vendors as a way to package and deploy fairly low-level functionality (like traffic inspection, shaping, even routing) using solutions like OpenVSwitch and custom bridges. Furthermore, containers can already see each other internally to a host - each is provided with a 172.17.0.0/16 IP address, which is accessible from other containers. Allowing users to define networks and bind containers to networks rather than solely ports may greatly simplify establishing connectivity between sets of containers.
Middle Ground

However, using Linux kernel plumbing (or OpenVSwitch) to provide Docker containers with what amount to fully-functional network interfaces implies a number of additional considerations (like messing with brctl) that may have unforeseen (and dangerous) consequences in terms of security, not to mention the need to eventually deal with routing and ACLs (which are currently largely the host's concern). On the other hand, there is an obvious need to restrict container (outbound) traffic to some extent, and a number of additional benefits that stem from providing limited visibility onto a network segment, internal or otherwise.

Minimal Requirements:

There are a few requirements that seem fairly obvious:
Improvements (Step 1):
Further Improvements (Step 2):
Likely Approaches (none favored at this point):
|
You need to pre-provision each docker0 with a different subnet range. Also read this: http://blog.oddbit.com/2014/08/11/four-ways-to-connect-a-docker/
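Concretely, that pre-provisioning might look like this, using the daemon's --bip flag to pin each host's docker0 to its own range (the subnet values are just an example; daemon syntax as of the Docker versions current in this thread):

```shell
# Host 1: containers get addresses from 10.244.1.0/24
docker -d --bip=10.244.1.1/24
# Host 2: containers get addresses from 10.244.2.0/24
docker -d --bip=10.244.2.1/24
```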
|
@mk-qi : You can use "arping", which is essentially a utility to discover if an IP is already in use within a network. That's the way you can make sure docker does not use the same set of IPs when it's "over" multiple Hosts. |
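A sketch of that check (interface and candidate address are illustrative); iputils arping's -D flag runs duplicate address detection and exits non-zero if any host answers for the IP:

```shell
CANDIDATE=10.244.1.50
# -D: duplicate address detection; exit 0 means nobody answered.
if arping -D -I eth0 -c 2 "$CANDIDATE" >/dev/null 2>&1; then
    echo "$CANDIDATE is free, safe to assign"
else
    echo "$CANDIDATE is already in use, pick another"
fi
```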
@thockin sorry, I didn't draw the picture clearly. In fact eth0 is a slave of docker0, and as I said before, I can ping them from each other... @shykes I saw your fork https://github.com/shykes/docker/tree/extensions/extensions/simplebridge and it looks like it pings an IP before really assigning it, but I am not sure; I don't know whether you could give more information. |
@fzansari thanks for the reply. Static IP allocation is OK; in fact we had been using pipework + macvlan (+ dhcp) for some small running clusters. But when running many containers it is very painful to manage IPs. Of course we can write tools, but I think hacking docker to directly solve the IP conflict problem would make things much simpler, if that is possible. |
Having just implemented keepalived internally I think there would be an enormous benefit from simply implementing an interoperable vrrp protocol. It would allow docker to "play nice" without forcing it on every machine in the network. For example: Host 1 (ip address 10.0.0.1):
Host 2 (ip address 10.0.0.2: backup service)
Supporting vrrp gives a very clean failover story and allows you to simply assign an IP to a service. It would take a lot to flesh out the details but I do think it would be an amazing change. |
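For illustration, the master side of the two-host setup above could be expressed as a minimal keepalived configuration (the VIP, interface, router id and priorities are assumptions; the backup host would use state BACKUP and a lower priority):

```shell
cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance docker_service {
    state MASTER          # host 1; host 2 would say BACKUP
    interface eth0
    virtual_router_id 51
    priority 150          # backup uses e.g. 100
    advert_int 1
    virtual_ipaddress {
        10.0.0.100/24     # the service IP that fails over
    }
}
EOF
service keepalived restart
```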
Closing since multi-host networking, plugins, etc are all in since docker 1.9 |
Native Docker Multi-Host Networking
TL;DR Practical SDN for Docker
Authors: @dave-tucker, @mavenugo and @nerdalert.
Background
Application virtualization will have a significant impact on the future of data center networks. Compute virtualization has driven the edge of the network into the server and more specifically the virtual switch. The compute workload efficiencies derived from Docker containers will dramatically increase the density of network requirements in the server. Scaling this density will require reliable network fundamentals, while also ensuring the developer has as much or as little interaction with the network as is desired.
A tightly coupled and native integration with Docker will ensure there is base functionality capable of integrating into the vast majority of data center network architectures today, helping to reduce the barriers to Docker adoption. Just as important for the diverse user base is making Docker networking dead simple to integrate, provision and troubleshoot.
The first step is a Native Docker Networking solution that can handle Multi-Host environments today, scales to production requirements, and works well with existing network deployments and operations.
Problem Statement
Though there are a few existing multi-host networking solutions, they are currently designed more as over-the-top solutions on top of Docker that either:
The core of this proposal is to bring multi-host networking as a native part of Docker that handles most of the use-cases, scales and works well with the existing production network and operations. With this provided as a native Docker solution, every orchestration system can enjoy the benefits alike.
There are three ways to approach multi-host networking in docker:
NAT-based
The first option (NAT-based) works by hiding the containers behind a Docker Host IP address. The TCP port exposed by a given Docker container is mapped to a unique port on the Host machine.
Since the mapped host port has to be unique, containers using well-known port numbers are therefore forced to use ephemeral ports. This adds complexity in network operations, network visibility, troubleshooting and deployment.
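As a concrete sketch of that remapping (the image name and container IP are made up): publishing a DNS container when host port 53 is already taken forces an ephemeral port, which Docker then wires up with a netfilter DNAT rule roughly like the one shown:

```shell
# Run a DNS server, remapped to host port 5300 because 53 is in use.
docker run -d -p 5300:53/udp --name dns1 example/dnsd
# Docker implements the mapping with a DNAT rule along these lines:
iptables -t nat -A DOCKER -p udp --dport 5300 -j DNAT \
    --to-destination 172.17.0.2:53
```

Every downstream device (load-balancer, firewall, IDS) now has to know about port 5300 instead of the well-known 53.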
For example, the configuration of a front-end load-balancer for a DNS service hosted in a Docker cluster.
Service Address:
Servers:
If you have firewalls or IDS/IPS devices behind the load-balancer, these also need to know that the DNS service is being hosted on these devices and port numbers.
IP-based
The second option (IP-based) works by assigning unique IP-Addresses to the containers and thus avoiding the need to do Port-mapping, and solving issues with downstream load-balancers and firewalls by using well-known ports in pre-determined subnets.
However, this exposes different sets of issues.

- … docker run?
- … docker run or another API?

Proposal
We are proposing a Native Multi-Host networking solution to Docker that handles various production-grade deployment scenarios and use cases.
The power of Docker is its simplicity, yet it scales to the demands of hyper-scale deployments. The same cannot be said today for the native networking solution in Docker. This proposal aims to bridge that gap. The intent is to implement a production-ready, reliable multi-host networking solution that is native to Docker, while remaining laser focused on the user-friendly developer environment that is at the heart of the Docker transformation.
The new edge of the network is the vSwitch. The virtual port density that application virtualization will drive is an even larger multiplier than the explosion of virtual ports created by OS virtualization. This will create port density far beyond anything to date. In order to scale, the network cannot be seen as merely the existing physical spine/leaf 2-tier architecture; it must also incorporate the virtual edge. Having Docker natively incorporate clear, scalable architectures will avoid the all too common problem of the network blocking innovation.
Solution Components
1. Programmable vSwitch
To implement this solution we require a programmable vSwitch.
This will allow us to configure the necessary bridges, ports and tunnels to support a wide range of networking use cases.
Our initial focus will be to develop an API to implement the primitives required of the vSwitch for multi-host networking with a focus on delivering an implementation for Open vSwitch first.
This link, WHY-OVS, covers the rationale for choosing OVS and why it is important to the Docker ecosystem and virtual networking as a whole. Open vSwitch has a mature Kernel Data-Plane (upstream since 3.7) with a rich set of features that addresses the requirements of multi-host networking. In addition to the data-plane performance and functionality, Open vSwitch also has an integrated management-plane called OVSDB that abstracts the Switch as a Database for applications to make use of.
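A minimal sketch of the plumbing this implies (the bridge and port names are illustrative, and the commands assume openvswitch is installed on the host):

```shell
# Create an integration bridge and attach a container-facing port
# (e.g. one end of a container's veth pair).
ovs-vsctl add-br br-int
ovs-vsctl add-port br-int veth-c1

# OVSDB exposes the same state as queryable tables:
ovs-vsctl list bridge br-int
ovsdb-client dump
```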
With this proposal the native implementation in Docker will:
2. Network Integration
The various scenarios that we will deal with in this proposal range from the existing Port-Mapping solution, to VXLAN-based Overlays, to Native underlay network integration. There are real deployment scenarios for each of these use-cases.
Facilitate the common application HA scenario of a service needing a 1:1 NAT mapping between the container’s back-end ip-address and a front-end IP address from a routable address pool. Alternatively, the containers can also be reachable globally depending on the user's IP addressing strategy.
3. Flexible Addressing / IP Address Management (IPAM)
In a multi-host environment, IP Addressing Strategy becomes crucial. Some of the Use-cases, as we will see, will also require reasonable IPAM in place. This discussion will also lead to the production-grade scale requirements of Layer2 vs Layer3 networks.
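As a toy illustration of the kind of per-host carving such an IPAM layer might do (the 10.244.0.0/16 pool and the host-index scheme are assumptions for the example, not part of the proposal):

```shell
# Carve one /24 per host out of a cluster-wide 10.244.0.0/16 pool.
# The host index would come from whatever discovery layer is in place.
subnet_for_host() {
    echo "10.244.$1.0/24"
}

subnet_for_host 1   # prints 10.244.1.0/24
subnet_for_host 2   # prints 10.244.2.0/24
```

Each host's containers then draw addresses from its own /24, so no coordination is needed at allocation time beyond handing out the index.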
4. Host Discovery
Though it is obvious, it is important to mention the Host Discovery requirement that is inherent in any Multi-host solution. We believe that such a Host/Service Discovery mechanism is a generic requirement, not specific to Multi-Host networking needs, and as such we are backing the Docker Clustering proposal for this purpose.
5. Multi-Tenancy
Another important consideration is to provide the architectural white-space for Multi-Tenancy solutions that may either be introduced in Docker Natively or by external orchestration systems.
Single Host Network Deployment Scenarios
This is the native Single-Host Docker Networking model as of today. This is the most basic scenario, which the solution we are proposing must address seamlessly. This scenario brings in the basic Open vSwitch integration into Docker, which we can build on top of for the Multi-Host scenarios that follow.
Figure - 1
This scenario adds a Flexible Addressing scheme to the basic single-host use-case where we can provide IP addressing from one of many different sources
Figure - 2
Multi Host Network Deployment Scenarios
The following scenarios enable backend Docker containers to communicate with one another across multiple hosts. This fulfills the need for high-availability applications to survive beyond a single node failure.
For environments which need to abstract the physical network, overlay networks need to create a virtual datapath using supported tunneling encapsulations (VXLAN, GRE, etc). It is just as important for these networks to be as reliable and consistent as the underlying network. Our experience leads us towards using similar consistency protocol such as a tenant aware BGP in order to achieve the worry free environment developers and operators desire. This also presents an evolvable architecture if a tighter coupling into the native network is of value in the future.
The overlay datapath is provisioned between tunnel endpoints residing in the Docker host which gives the appearance of all hosts within a given provider segment being directly connected to one another as depicted in the following Diagram 3.
Figure - 3
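Such a tunnel endpoint could be provisioned on each host roughly as follows (the underlay addresses and the VNI are illustrative):

```shell
# Host A (underlay address 10.0.0.1): VXLAN tunnel port towards host B.
ovs-vsctl add-br br-int
ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan \
    options:remote_ip=10.0.0.2 options:key=42
# Host B (10.0.0.2) mirrors this with options:remote_ip=10.0.0.1.
```

Containers attached to br-int on either host then appear to share one segment, as in the diagram.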
As a new container comes online, its prefix is updated in the routing protocol, announcing its location via a tunnel endpoint. As the other Docker hosts receive the update, they install forwarding entries into OVS pointing at the tunnel endpoint behind which the container resides. When the container is deprovisioned, a similar process occurs and the other Docker hosts remove the forwarding entry for the deprovisioned container.
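The bookkeeping described above amounts to installing and removing per-container forwarding state as announcements arrive; shown here as plain kernel host routes over a tunnel device rather than OVS flow programming (device name and addresses are illustrative):

```shell
# A peer announced container 10.244.2.7 behind its tunnel endpoint:
ip route add 10.244.2.7/32 dev tun0
# The peer withdrew the prefix when the container was deprovisioned:
ip route del 10.244.2.7/32 dev tun0
```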
Underlay Network Integration
The backend can also simply be bridged into a network's broadcast domain, relying on upstream networking to provide reachability. Traditional L2 bridging has significant scaling issues, but it is still very common in data centers with flat VLAN architectures to facilitate live workload migrations of VMs.
This model is fairly critical for DC architectures that require a tight coupling of network and compute as opposed to a ships in the night design of overlays abstracting the physical network.
The underlay network integration can be designed with some specific network architecture in mind and hence we see models like Google Compute where every host is assigned a dedicated Subnet & each pod gets an ip-address from that subnet.
Figure - 4 - One Dedicated Static Subnet per Host
The entire backend container space can be advertised into the underlying network for IP reachability. IPv6 is becoming attractive for many in this scenario due to v4 constraints.
By extending L3 to the true edge of the network in the vSwitch it enables a proven network scale while still retaining the ability to perform disaggregated network services on the edge. Extending gateway protocols to the host will play a significant role in scaling a tight coupling to the network architecture.
Alternatively, Underlay integration can also provide Flexible addressing combined with /32 host-updates to the network in order to provide the subnet flexibility.
Figure - 5
Summary
Implementing the above solution provides flexible, scalable multi-host networking as a native part of Docker. This implementation adds a strong networking foundation that is intent on providing an evolvable network architecture for the future.