"host not found" for service in same overlay network #26523

Closed
michaelkrog opened this issue Sep 13, 2016 · 44 comments

@michaelkrog

michaelkrog commented Sep 13, 2016

Description

I am trying to set up Docker in Swarm mode with 1 manager and 2 workers. They run in DigitalOcean's cloud, and the nodes communicate via private networking.

I am consistently having issues when 2 services connected to the same overlay network try to communicate. Sometimes the resolved IP does not reach some instances of a service, and at other times the host name for a service is not resolvable at all.

Steps to reproduce the issue:

  1. Set up Docker in Swarm mode with 1 manager and 2 workers in DigitalOcean's cloud, with the nodes communicating via private networking.
  2. Set up 2 services based on e.g. nginx (see the sketch below).
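
A rough sketch of such a setup (the placeholders in angle brackets and the nginx images are illustrative; my actual services are codezoo/previsto-proxy and codezoo/previsto-site):

# on the manager, advertising the private-network address
$ docker swarm init --advertise-addr <private ip>

# on each worker, join with the token printed by the manager
$ docker swarm join --token <worker token> <manager private ip>:2377

# back on the manager: one overlay network and two services attached to it
$ docker network create --driver overlay previsto
$ docker service create --name previsto-site --network previsto nginx
$ docker service create --name proxy --network previsto --publish 80:80 nginx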

Describe the results you received:
The swarm is apparently working correctly:

$ docker node ls
ID                           HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
5yy45mhvjwe6y8p19h91kzh1w    engine2   Ready   Active        
759bq2mdovhtcdklvditk3kd4 *  engine1   Ready   Active        Leader
cnzv7b5zcoiecubzjz4vcgk0z    engine3   Ready   Active   

Also, my services seem to be running just fine:

$ docker service ls
ID            NAME           REPLICAS  IMAGE                          COMMAND
179qx1ab00z0  previsto-site  1/1       codezoo/previsto-site:latest   
7wx5t6ftj2xu  proxy          1/1       codezoo/previsto-proxy:latest  

But if I ping previsto-site from within proxy I get this:

$ docker ps
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS               NAMES
97ee49ca4a1b        codezoo/previsto-proxy:latest   "nginx -g 'daemon off"   23 minutes ago      Up 23 minutes       80/tcp, 443/tcp     proxy.1.5p428z4n9jg2nuai88bqsf8jn

$ docker exec -ti proxy.1.5p428z4n9jg2nuai88bqsf8jn ping previsto-site
ping: unknown host

However, if I scale down the previsto-site service to 0 and scale it up to 1 again, then I can resolve the host name.

$ docker service scale previsto-site=0
previsto-site scaled to 0

$ docker service scale previsto-site=1
previsto-site scaled to 1

$ docker exec -ti proxy.1.5p428z4n9jg2nuai88bqsf8jn ping previsto-site
PING previsto-site (10.0.0.2): 56 data bytes
92 bytes from 97ee49ca4a1b (10.0.0.6): Destination Host Unreachable
92 bytes from 97ee49ca4a1b (10.0.0.6): Destination Host Unreachable
92 bytes from 97ee49ca4a1b (10.0.0.6): Destination Host Unreachable

Describe the results you expected:
I would expect the DNS resolving to work consistently.

Additional information you deem important (e.g. issue happens only occasionally):

Output of docker version:

Client:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:33:38 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.1
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   23cf638
 Built:        Thu Aug 18 05:33:38 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 8
 Running: 1
 Paused: 0
 Stopped: 7
Images: 7
Server Version: 1.12.1
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 26
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: active
 NodeID: 759bq2mdovhtcdklvditk3kd4
 Is Manager: true
 ClusterID: evem90a6vnmgzpgvyp0kcimrc
 Managers: 1
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.129.23.9
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-36-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 992.5 MiB
Name: engine1
ID: K6VW:TLMS:3G4J:2GJR:VUJQ:ZOJD:7BJD:QGFW:N7C4:2BGT:MXZO:PXXW
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

Additional environment details (AWS, VirtualBox, physical, etc.):
Digital Ocean, Ubuntu 16.04

@nyanloutre

I have the same issue, but sometimes not on all nodes. For example:

trehiou@cds-stage-ms4 ~> docker service ps spark-proxy
ID                         NAME               IMAGE                   NODE           DESIRED STATE  CURRENT STATE          ERROR
4m3ro0qazlbd1k23sqrgg1fzz  spark-proxy.1      nyanloutre/spark-proxy  cds-stage-ms3  Running        Running 7 seconds ago  
243bs4bok79isx9d8ssuxe9qg   \_ spark-proxy.1  nyanloutre/spark-proxy  cds-stage-ms1  Shutdown       Failed 13 seconds ago  "task: non-zero exit (1)"

On the first host (cds-stage-ms1) the task failed because it couldn't resolve the name of another service.

@mavenugo
Contributor

mavenugo commented Sep 13, 2016

@michaelkrog @nyanloutre the DNS seems to be resolving properly (to the virtual IP). But pinging the virtual IP doesn't work, and that is expected: the load balancing via IPVS doesn't support ICMP. Instead of ping, you can use other L4 tools (such as nc, etc.). Please let us know if that works.
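
For example, a quick L4 check against the service name (a sketch, assuming the target service listens on port 80 as nginx does, and that nc is available in the image):

$ docker exec -ti proxy.1.5p428z4n9jg2nuai88bqsf8jn nc -zv -w 3 previsto-site 80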

@michaelkrog
Author

Thanks @mavenugo.

I discovered the issue because the proxy service, based on nginx, had the same result. Every 2nd request resulted in 'no route to host' output in the log.

I had to remove the 2 workers to get my "swarm" working and that resolved the issues I was having. I will try to reproduce in a new environment.

@mavenugo
Contributor

@michaelkrog okay, got it. There were a bunch of fixes that went in after 1.12.1; maybe this is fixed by one of them. If you can try a docker daemon from master (https://master.dockerproject.org/) and confirm, that will help.

@garthk

garthk commented Oct 5, 2016

… and another #25266?

@nyanloutre

For me it has been working properly since 1.12.1-RC1.

@thaJeztah
Member

@michaelkrog we released 1.12.2, which contains a lot of fixes in this area, and this issue may be resolved; could you give 1.12.2 a try and see if it's resolved for you?

@michaelkrog
Author

michaelkrog commented Oct 20, 2016

So I finally managed to recreate my setup and upgrade it to 1.12.2 – and it works! 👍

Awesome guys!

@thaJeztah
Member

Thanks @michaelkrog!

@thaJeztah thaJeztah added this to the 1.12.2 milestone Oct 20, 2016
@michaelkrog
Author

michaelkrog commented Oct 20, 2016

But then again... after 8 hours, problems have started to occur again.

I had scaled my proxy service to 3 instances and my previsto-site to 3 instances as well. Suddenly some requests fail when they come from a proxy instance on the manager node:

2016/10/20 14:05:03 [error] 6#6: *539 upstream timed out (110: Connection timed out) while connecting to upstream, client: 10.255.0.3, server: previsto.com, request: "GET /da/ HTTP/1.1", upstream: "http://10.0.0.2:80/da/", host: "previsto.com"

Rescaling the proxy service (3 -> 1 -> 3) fixes it for now, but I fear it will occur again soon.
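
For reference, the rescaling is just:

$ docker service scale proxy=1
$ docker service scale proxy=3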

Edit
Very soon indeed. After 2 minutes it started again. I will scale my proxy service to 1 instance only for a longer period to see if that helps.

@michaelkrog michaelkrog reopened this Oct 20, 2016
@michaelkrog
Author

michaelkrog commented Oct 20, 2016

And now, approximately 15 hours after setting up my cluster, the proxy service is not able to connect to any instances of my previsto-site service anymore – no matter which node the instances reside on.

Scaling the services up/down no longer fixes the issue.

The only working solution is to remove all worker nodes again and keep only one node.

EDIT
Seeing this using journalctl:

Oct 20 10:12:13 engine1 dockerd[7823]: time="2016-10-20T10:12:13.297609451-04:00" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=9jiz57fo78jcjkfma9dln5439
Oct 20 10:12:13 engine1 dockerd[7823]: time="2016-10-20T10:12:13.301047552-04:00" level=error msg="container status unavailable" error="context canceled" module=taskmanager task.id=6v1hvl8iuelubit7prgtj10xg
Oct 20 10:15:14 engine1 dockerd[7823]: time="2016-10-20T10:15:14.156809629-04:00" level=error msg="Error getting node cnzv7b5zcoiecubzjz4vcgk0z: node cnzv7b5zcoiecubzjz4vcgk0z not found"
Oct 20 10:15:14 engine1 dockerd[7823]: time="2016-10-20T10:15:14.160676112-04:00" level=error msg="Handler for GET /v1.24/nodes/cnzv7b5zcoiecubzjz4vcgk0z returned error: node cnzv7b5zcoiecubzjz4vcgk0z not found"
Oct 20 16:59:50 engine1 dockerd[7823]: time="2016-10-20T16:59:50.654970388-04:00" level=error msg="Error getting node cnzv7b5zcoiecubzjz4vcgk0z: node cnzv7b5zcoiecubzjz4vcgk0z not found"
Oct 20 16:59:50 engine1 dockerd[7823]: time="2016-10-20T16:59:50.656496164-04:00" level=error msg="Handler for GET /v1.24/nodes/cnzv7b5zcoiecubzjz4vcgk0z returned error: node cnzv7b5zcoiecubzjz4vcgk0z not found"
Oct 20 17:02:10 engine1 dockerd[7823]: time="2016-10-20T17:02:10-04:00" level=info msg="Firewalld running: false"
Oct 20 17:02:14 engine1 dockerd[7823]: time="2016-10-20T17:02:14.435990223-04:00" level=error msg="fatal task error" error="network previsto not found" module=taskmanager task.id=ez8ktfbyej76czpo4jlvbqrjh
Oct 20 17:02:14 engine1 dockerd[7823]: time="2016-10-20T17:02:14.608927573-04:00" level=error msg="network previsto remove failed: network previsto not found" module=taskmanager task.id=18u7pwgdys82q3vknh4fhbfz2
Oct 20 17:02:14 engine1 dockerd[7823]: time="2016-10-20T17:02:14.610194394-04:00" level=error msg="remove task failed" error="network previsto not found" module=taskmanager task.id=18u7pwgdys82q3vknh4fhbfz2

@thaJeztah
Member

ping @mrjana

@michaelkrog
Author

michaelkrog commented Oct 22, 2016

One thing I haven't mentioned is that I have been using an encrypted overlay network all along, following this procedure.

$ docker network create --opt encrypted --driver overlay previsto

I have now created a new unencrypted network, set up the services there, added the 2 worker nodes again, and scaled the services to 3 instances each. It has run flawlessly for 30 minutes now. I'll report any issues I hit here.

@thaJeztah
Member

@michaelkrog could you try running the check-config script to see if anything is missing? https://github.com/docker/docker/blob/master/contrib/check-config.sh

@michaelkrog
Author

michaelkrog commented Oct 22, 2016

Sure @thaJeztah

This is the output from all 3 instances:

warning: /proc/config.gz does not exist, searching other paths for kernel config ...
info: reading kernel config from /boot/config-4.4.0-36-generic ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled and tools installed
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled (as module)
- CONFIG_BRIDGE: enabled (as module)
- CONFIG_BRIDGE_NETFILTER: enabled (as module)
- CONFIG_NF_NAT_IPV4: enabled (as module)
- CONFIG_IP_NF_FILTER: enabled (as module)
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled (as module)
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled (as module)
- CONFIG_IP_NF_NAT: enabled (as module)
- CONFIG_NF_NAT: enabled (as module)
- CONFIG_NF_NAT_NEEDED: enabled
- CONFIG_POSIX_MQUEUE: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_CGROUP_PIDS: enabled
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: missing
    (note that cgroup swap accounting is not enabled in your kernel config, you can enable it by setting boot option "swapaccount=1")
- CONFIG_MEMCG_KMEM: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: enabled
- CONFIG_NET_CLS_CGROUP: enabled (as module)
- CONFIG_CGROUP_NET_PRIO: enabled
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_RT_GROUP_SCHED: missing
- CONFIG_IP_VS: enabled (as module)
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_RR: enabled (as module)
- CONFIG_EXT3_FS: missing
- CONFIG_EXT3_FS_XATTR: missing
- CONFIG_EXT3_FS_POSIX_ACL: missing
- CONFIG_EXT3_FS_SECURITY: missing
    (enable these ext3 configs if you are using ext3 as backing filesystem)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled (as module)
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled (as module)
      - CONFIG_CRYPTO_GCM: enabled (as module)
      - CONFIG_CRYPTO_SEQIV: enabled (as module)
      - CONFIG_CRYPTO_GHASH: enabled (as module)
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled (as module)
      - CONFIG_XFRM_ALGO: enabled (as module)
      - CONFIG_INET_ESP: enabled (as module)
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled (as module)
  - "ipvlan":
    - CONFIG_IPVLAN: enabled (as module)
  - "macvlan":
    - CONFIG_MACVLAN: enabled (as module)
    - CONFIG_DUMMY: enabled (as module)
- Storage Drivers:
  - "aufs":
    - CONFIG_AUFS_FS: enabled (as module)
  - "btrfs":
    - CONFIG_BTRFS_FS: enabled (as module)
    - CONFIG_BTRFS_FS_POSIX_ACL: enabled
  - "devicemapper":
    - CONFIG_BLK_DEV_DM: enabled
    - CONFIG_DM_THIN_PROVISIONING: enabled (as module)
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled (as module)
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Edit
I should add that it has now run for 15 hours without issues after switching to an unencrypted overlay network.

Edit
Now 27 hours on unencrypted overlay network and still going strong!

@thaJeztah
Member

ping @mrjana any thoughts? ^^

@coryleeio

After bootstrapping doing:

docker network create --driver overlay --opt encrypted foobar
docker service create --network foobar --name a --constraint node.hostname==A nginx
docker service create --network foobar --name b --constraint node.hostname==B nginx
docker service create --network foobar --name c --constraint node.hostname==C nginx

then having the containers each curl one another
(curl a on all three nodes, curl b on all three nodes, etc.)

typically results in none of them being able to communicate over the network. I have seen it work occasionally, but typically all three containers cannot reach each other.

If I drop the encrypted flag it works fine; unfortunately I need the encryption for my use case x.x

This is possibly related:
#27541

@coryleeio

I'm doing this in AWS, on a standard Ubuntu 14.10 AMI, with Docker 1.12.2 on all nodes.

@thaJeztah
Member

@coryleeio could it be related to #27425 ?

@mrjana
Contributor

mrjana commented Oct 24, 2016

ping @aboch

@coryleeio

@thaJeztah possibly, if the documented ports are indeed incorrect. I am only opening:
TCP 2377
TCP/UDP 4789
TCP/UDP 7946

However, I do have it working at present with encrypted networking enabled and only those ports open between the instances.

I just bootstrapped the swarm a few times and kept redeploying the containers until it worked.
I'm going to open up port 50 and see if that makes it more predictable, and I will report my results.

@thaJeztah
Member

@coryleeio note that it's not port 50, but protocol 50 (ESP): https://github.com/docker/docker.github.io/pull/230/files. I'm not too familiar with AWS's settings, but I hope it helps.
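
For instance, a sketch of opening ESP between instances with the AWS CLI (the group ID is a placeholder; the console equivalent is the "Custom Protocol" rule shown later in this thread):

$ aws ec2 authorize-security-group-ingress --group-id sg-xxxxxx --protocol 50 --source-group sg-xxxxxx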

@coryleeio

Opened up protocol 50 (it's under "Custom Protocol" in AWS).

For each test I stopped the docker service, deleted the docker data directory, restarted the docker service, did docker swarm init and docker swarm join, then created my containers, each with node constraints so they wouldn't move around. Nodes were re-used and were not rebooted between runs.

Here are the commands used:

docker network create --driver overlay --opt encrypted foobar
docker service create --network foobar --name b --constraint node.role==manager nginx
docker service create --network foobar --name a --constraint node.role==worker nginx
docker service create --network foobar --name c --constraint node.role==worker nginx

Then I'd exec into each container and install curl:
apt-get update && apt-get install curl -qy

and do:
curl a
curl b
curl c

Test #1 (protocol 50 enabled)
Worked great, all containers can reach all other containers.

Test #2 (protocol 50 enabled)
Failure, containers can't talk at all.

Test #3 (protocol 50 disabled)
Everything worked, all containers can talk.

After test 3 I turned off protocol 50 traffic to see if I could prove that enabling protocol 50 had helped something.

curl a, b and c worked after disabling protocol 50, so enabling/disabling it didn't seem to have an effect on the traffic. My guess is the traffic isn't encrypted, but I've not verified.

Test #4 (protocol 50 disabled)
With protocol 50 disabled, I redid everything and found that all containers could still communicate.

@coryleeio

@thaJeztah

@coryleeio

coryleeio commented Oct 24, 2016

On my most recent run

Test #5
On A:
Can resolve itself, can't resolve B or C...

On B:
$ curl a
curl: (6) Could not resolve host: a
$ curl b
curl: (6) Could not resolve host: b
$ curl c
curl: (6) Could not resolve host: c

Can't resolve anything, including itself.

On C:
$ curl a
curl: (6) Could not resolve host: a
$ curl b
curl: (6) Could not resolve host: b
$ curl c
the nginx landing page shows.

Can resolve C, cannot resolve anything else.

@coryleeio

What gets me is that it works just fine some of the time. I can't quite pin down whether it's a timing issue or something else.

@aboch
Contributor

aboch commented Oct 24, 2016

@coryleeio

My guess is the traffic isn't encrypted, but i've not verified.

To verify the traffic is encrypted, check, while doing your reachability testing, whether the following command run on the docker hosts captures encrypted packets:

sudo tcpdump esp

@aboch
Contributor

aboch commented Oct 24, 2016

Also, when the failure happens because of address resolution, can you manually check whether a ping from one container to the other containers' IP addresses also fails?

@coryleeio

coryleeio commented Oct 24, 2016

Thanks @aboch, I checked and was able to validate that the encryption is working, apparently even when I disable it in the security group. Not sure why that is, but I'm glad it's encrypted.

I was working through an example with it failing, and after a minute or two it started working. Is there possibly a delay before it comes up?

As for checking the IP addresses, I'll spin my cluster up and down a few times in the morning and see if I can get it to stop working again. In the meantime, I want to make sure I am looking at the correct thing, because I'm seeing something strange.

If I do docker inspect B on the B node, it tells me that the IP address is:

"Networks": {
    "foobar": {
        "IPAMConfig": {
            "IPv4Address": "10.0.0.5"
        },
        "Links": null,
        "Aliases": [

On the D node, after exec'ing /bin/sh in container D, I try to reach B.

curl 10.0.0.5 -v results in:

* Rebuilt URL to: 10.0.0.5/
* Hostname was NOT found in DNS cache
*   Trying 10.0.0.5...
* Connected to 10.0.0.5 (10.0.0.5) port 80 (#0)

curl b -v

* Rebuilt URL to: b/
* Hostname was NOT found in DNS cache
*   Trying 10.0.0.4...
* Connected to b (10.0.0.4) port 80 (#0)

This is strange, as it does not match the IP address that I would expect for B.
However, I can apparently reach B in two different ways...

curl 10.0.0.4 -v

* Rebuilt URL to: 10.0.0.4/
* Hostname was NOT found in DNS cache
*   Trying 10.0.0.4...
* Connected to 10.0.0.4 (10.0.0.4) port 80 (#0)

Am I looking at the right IP address? That one doesn't match the one on the other node, but both seem to work.

I have two services in my cluster:

docker service ls
ID            NAME  REPLICAS  IMAGE  COMMAND
9p686fhc5o9w  d     1/1       nginx
cmvx7kj61ojh  b     1/1       nginx

and the IP address of B shows as 10.0.0.9.

So I'm a bit confused about that, as 3 IP addresses seem to be working but only two containers exist.
Checking the log of D shows 3 hits from B:

10.0.0.9 - - [24/Oct/2016:22:33:30 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.38.0" "-"
10.0.0.9 - - [24/Oct/2016:22:33:37 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.38.0" "-"
10.0.0.9 - - [24/Oct/2016:22:33:40 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.38.0" "-"

I will try to reproduce the failed state in the morning, and will confirm whether I can address the container by its IP address at that time.

@aboch
Contributor

aboch commented Oct 25, 2016

@coryleeio

Am I looking at the right ip address? That one doesn't match the one on the other node, but both seem to work.

When you curl the service name (b in your case), it will resolve to the service's virtual IP.

If your service has more than one replica, the resulting connections will be load balanced across the replicas via IPVS.

docker service inspect b will give you some more info.
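
For example, a sketch of reading the virtual IPs straight from the service spec (the output will vary with your setup):

$ docker service inspect b --format '{{range .Endpoint.VirtualIPs}}{{.Addr}} {{end}}'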

BTW, I replicated your setup on AWS with an Ubuntu 14.04 AMI (kernel 3.13) and so far so good; things work fine over the encrypted network. I'm not sure about your security group configuration, but I can confirm ESP packets need to be allowed in order for containers to communicate over IPsec.

I will check tomorrow, after a key rotation has happened, to see if that is the reason for the issue.

@michaelkrog
Author

What @coryleeio is experiencing is exactly what I am seeing too. Since switching to a non-encrypted network I am seeing no issues on Ubuntu 16.04 / Docker 1.12.2 / DigitalOcean / private networking. It's been smooth for 3 days now.

In my case it could perhaps be the key rotation that caused problems. As mentioned earlier, after upgrading to 1.12.2 it worked flawlessly for 8 hours before issues appeared. Then suddenly it was a complete mess, even after rescaling services.

@aboch
Contributor

aboch commented Oct 25, 2016

@michaelkrog I do not have a DO account to try this out.
In the meantime, if possible, could you create another couple of services on a parallel encrypted overlay network, so that it won't affect your current setup?

That way, if I don't hit the issue in my AWS setup, we can debug on yours.

Please run these commands on each docker host before creating the services:
ip xfrm state flush
ip xfrm policy flush

Once at least one task is deployed on more than one docker host on the encrypted network, you should count 4(N-1) entries in the output of ip xfrm state, where N is the number of docker hosts.
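(For the 3-node setup in this thread, that is 4(3-1) = 8 entries per host.)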

With ip -s xfrm state you can check how many packets got processed by each SA.
The hit count of the rule in iptables -t mangle -nvL OUTPUT is the number of packets generated by the local containers running on the encrypted network, which will ultimately get encrypted before leaving the host.

Please make a copy of the ip xfrm state and ip xfrm policy output for each docker host.
Then you can run a few tests to make sure things work, and leave it there waiting for the key rotation.

You will know when the rotation happens if you monitor the xfrm activity with ip xfrm monitor. You may want to launch it in the background and redirect the output to a file. Rotation happens every ~~24~~ 12 hours (since the swarm was started). Docker also logs the key rotation if the debug log level is set.
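
For example (a sketch; the log path is arbitrary):

$ nohup ip xfrm monitor > /tmp/xfrm-monitor.log 2>&1 &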

After the rotation, take another copy of the ip xfrm state output.
Then you can see whether that is the reason why containers can no longer talk to each other.

Thanks !

EDIT: Rotation happens every 12 hours

@aboch
Contributor

aboch commented Oct 25, 2016

@coryleeio @michaelkrog

My 3 nodes (one manager, 2 workers) went through a key rotation. So far, so good.
I have two encrypted networks. On one I am running 3 single-replica nginx services (@coryleeio's use case), on the other 2 multi-replica nginx services (to mimic @michaelkrog's scenario).

I have scaled the services up and down and verified that each task can connect to the services and can ping the other tasks' IPs.

I will keep monitoring to see if the issue arises after subsequent key rotations.

In the meantime, I would like to make sure the issues you guys are encountering are indeed related to the encryption, and that your infrastructure has the required policies to allow the ESP traffic across all nodes.

This basic check should do it:
Create an encrypted overlay network, run a couple of services with --replicas=<number of nodes> (so that each node has at least one task running), and then verify that IP connectivity works by exec'ing into each task and pinging each of the IPs returned by getent hosts tasks.<svc name> for each service.
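
A minimal sketch of that check (the network name, service name, replica count, and task container name are placeholders; ping and getent must be available in the image):

$ docker network create --driver overlay --opt encrypted enc-test
$ docker service create --name web --network enc-test --replicas 3 nginx
$ docker exec -ti <web task container> sh -c 'for ip in $(getent hosts tasks.web | awk "{print \$1}"); do ping -c 2 $ip; done'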

Then repeat 12 hours later (or whenever the rotation has happened if you can monitor that).

Note: It should not make a difference, but I am not publishing ports when I create the services (as in @coryleeio's case).

@michaelkrog If possible, can you post the complete commands you use to create your two services?
@coryleeio Can you post your security group configuration?

Thanks

@coryleeio

My security group (sg-xxxxxx) inbound rules look like the following; outbound is open:

Type              Protocol  Port Range  Source
Custom TCP Rule   TCP       2377        sg-xxxxxx
Custom TCP Rule   TCP       4789        sg-xxxxxx
Custom UDP Rule   UDP       7946        sg-xxxxxx
Custom TCP Rule   TCP       7946        sg-xxxxxx
Custom UDP Rule   UDP       4789        sg-xxxxxx
Custom Protocol   ESP (50)  All         sg-xxxxxx

I added the protocol 50 rule yesterday. It does seem to be really stable today; perhaps rebuilding everything with ESP open was all I needed.

I'm going to spin up a bunch of services, a router, and a bunch of databases all running on different encrypted networks, point some health checks at them, and leave it overnight just to confirm, but I'm feeling a lot better about it now with ESP enabled. It doesn't quite explain how I was able to get connections before making that change, but since I can validate the encryption I'm not too fussed about it (thanks for that) =]

@thaJeztah
Member

@coryleeio that's good to hear; keep us posted on how it goes.

@michaelkrog
Author

@aboch I am away from my office till tomorrow, but I will definitely look into it then.

@coryleeio

@thaJeztah
Okay, so my test was stable all night: no issues, 100% uptime. Looks great, but it doesn't explain what was happening before. I don't think the key rotation is the issue, as I'm mostly finding that right after a bootstrap the reachability is variable. Once my swarm is up and communicating it works fine, even over long periods. I am spinning my swarm up and down a lot, and after extensive experimentation I think I stumbled on something. This can also be reproduced without encrypted networks, though, so I'm wondering if it's maybe a false positive.

I managed to produce a weird state with the networks that can occur when you spin the cluster up and down a lot, and I'd be curious whether @michaelkrog is perhaps doing something similar...

On manager:
docker swarm init....

On workers:
docker swarm join.....

On manager:
$ docker network create --opt encrypted --driver overlay foobar
$ docker service create --constraint node.hostname==A --name a --network foobar nginx:1.11.5-alpine
$ docker service create --constraint node.hostname==B --name b --network foobar nginx:1.11.5-alpine
$ docker service create --constraint node.hostname==C --name c --network foobar nginx:1.11.5-alpine

On all nodes:
$ docker network ls | grep foobar
=> 7o1bj557ovsr foobar overlay swarm

Testing here with exec, all containers can curl all containers, as expected. The network was created on all the machines as it was needed; everything is peachy.

On all nodes:
docker swarm leave --force

On all nodes:
$ docker network ls | grep foobar
=> 7o1bj557ovsr foobar overlay swarm

Still looks good... though why is the network still there?

On manager:
docker swarm init

On workers:
docker swarm join....

On manager:
$ docker network ls | grep foobar
=> 7o1bj557ovsr foobar overlay local

On workers:
$ docker network ls | grep foobar
=> 7o1bj557ovsr foobar overlay swarm

The IDs still match, but the manager's network got downgraded to local scope.
The swarm network still exists on the worker nodes.

Now I run my example on the newly created swarm:
$ docker network create --opt encrypted --driver overlay foobar

But I get a "network already exists" error, of course.

So I remove the foobar network:
$ docker network rm foobar
= > foobar

On manager:
$ docker network ls | grep foobar
=> nothing
On workers:
$ docker network ls | grep foobar
=> 7o1bj557ovsr foobar overlay swarm

The foobar network still exists on the workers as a swarm-scoped overlay network. The local-scoped version was deleted from the manager, but the deletion did not propagate because it was local scope, of course.

On the manager I run my example again:
$ docker network create --opt encrypted --driver overlay foobar
$ docker service create --constraint node.hostname==A --name a --network foobar nginx:1.11.5-alpine
$ docker service create --constraint node.hostname==B --name b --network foobar nginx:1.11.5-alpine
$ docker service create --constraint node.hostname==C --name c --network foobar nginx:1.11.5-alpine

On manager:
docker network ls | grep foobar
=> 19wdbjr7nj4k foobar overlay swarm
On worker:
docker network ls | grep foobar
=> 7o1bj557ovsr foobar overlay swarm

Note the IDs are different, but our containers launch happily and connect to the different networks named foobar.

(docker ps will show each container running happily on each node, and they won't be able to communicate since they are on different networks that have the same name, scope, and driver, but different IDs.)
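
A quick way to spot this state (a sketch; run on every node and compare):

$ docker network inspect foobar --format '{{.Id}}'

If the printed IDs differ between nodes, the containers are attached to different networks that merely share a name.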

@aboch
Contributor

aboch commented Oct 26, 2016

Thanks @coryleeio for the extra info.

I am suggesting we wait for @michaelkrog to report his findings, so we can see whether we can rule out the encrypted network for his issue as well.

Also, if he has not performed the swarm join/leave sequences you've done, I'd suggest you report your new problem in a separate issue.

That way we keep the focus on the originally reported problem.

@coryleeio

@aboch Yeah, that makes sense.

Regarding my previous posts, in case anyone is following along: changing my security group didn't seem to have an effect, since for whatever reason the protocol 50 traffic was already going through in my AWS configuration.
I'm fairly sure my issue was a different thing.

I created a new ticket:
#27796

tl;dr: if your network IDs don't match across your nodes when you find that they can't communicate, you might check out #27796.

@michaelkrog
Author

michaelkrog commented Oct 27, 2016

So, 1) because my proxy has "http://previsto-site" hardcoded in the nginx config and 2) I already had an encrypted network from earlier, the easiest thing for me was to create a new proxy service on my "old" encrypted network and then put previsto-site in both the encrypted and the unencrypted network. That way both proxy services should be able to connect to the previsto-site service, the only difference being the encryption on the network they reside on.

The first test shows that all requests go through on the unencrypted network, whereas only some go through on the encrypted network. To make sure my networks were not in a weird state, I checked the networks on each node and they are all identical:

eatbh49ffslh        previsto               overlay             swarm               
dzlbcxqv58lf        previsto_unencrypted   overlay             swarm 

After this I removed all 3 services to start over:

$ docker service rm proxy_enc
$ docker service rm proxy-site
$ docker service rm proxy

I then ran the commands you requested:

$ sudo ip xfrm flush
Usage: ip xfrm XFRM-OBJECT { COMMAND | help }
where  XFRM-OBJECT := state | policy | monitor
$ sudo ip xfrm policy flush

I then created all 3 services again:

$ docker service create --network previsto --network previsto_unencrypted --name previsto-site --replicas 3 codezoo/previsto-site
$ docker service create --network previsto --name proxy_enc --replicas 3 -p 443:80 codezoo/previsto-proxy
$ docker service create --network previsto_unencrypted --name proxy --replicas 3 -p 80:80 codezoo/previsto-proxy

$ docker service ls
ID            NAME           REPLICAS  IMAGE                   COMMAND
3l9khi5ip05j  proxy_enc      3/3       codezoo/previsto-proxy  
akneow761ert  previsto-site  3/3       codezoo/previsto-site   
az3w9kfhjehs  proxy          3/3       codezoo/previsto-proxy  

I then made a few requests to each of the proxies. First to the proxy on the unencrypted network (published on port 80):

$ curl -I http://previsto.com/da/
HTTP/1.1 200 OK
Server: nginx/1.9.15
Date: Thu, 27 Oct 2016 13:41:20 GMT
Content-Type: text/html
Content-Length: 17590
Connection: keep-alive
Last-Modified: Thu, 27 Oct 2016 10:40:58 GMT
ETag: "5811d9ba-44b6"
Accept-Ranges: bytes

$ curl -I http://previsto.com/da/
HTTP/1.1 200 OK
Server: nginx/1.9.15
Date: Thu, 27 Oct 2016 13:51:37 GMT
Content-Type: text/html
Content-Length: 17590
Connection: keep-alive
Last-Modified: Thu, 27 Oct 2016 10:40:58 GMT
ETag: "5811d9ba-44b6"
Accept-Ranges: bytes

$ curl -I http://previsto.com/da/
HTTP/1.1 200 OK
Server: nginx/1.9.15
Date: Thu, 27 Oct 2016 13:51:40 GMT
Content-Type: text/html
Content-Length: 17590
Connection: keep-alive
Last-Modified: Thu, 27 Oct 2016 10:40:58 GMT
ETag: "5811d9ba-44b6"
Accept-Ranges: bytes

Every request goes through.

Then I requested the proxy on the encrypted network (published on port 443):

$ curl -I http://previsto.com:443/da/
HTTP/1.1 504 Gateway Time-out
Server: nginx/1.9.15
Date: Thu, 27 Oct 2016 13:43:31 GMT
Content-Type: text/html
Content-Length: 183
Connection: keep-alive

$ curl -I http://previsto.com:443/da/
HTTP/1.1 504 Gateway Time-out
Server: nginx/1.9.15
Date: Thu, 27 Oct 2016 13:44:33 GMT
Content-Type: text/html
Content-Length: 183
Connection: keep-alive

$ curl -I http://previsto.com:443/da/
HTTP/1.1 200 OK
Server: nginx/1.9.15
Date: Thu, 27 Oct 2016 13:44:34 GMT
Content-Type: text/html
Content-Length: 17590
Connection: keep-alive
Last-Modified: Thu, 27 Oct 2016 10:40:58 GMT
ETag: "5811d9ba-44b6"
Accept-Ranges: bytes

The first 2 requests timed out because the proxy service was unable to reach the previsto-site service. The 3rd request came through.

ip xfrm state

engine1

$ sudo ip xfrm state
src 10.129.17.245 dst 10.129.23.9
    proto esp spi 0x929ff22b reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x026f03682bad114b3952dfcd638fd66b929ff22b 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.17.245 dst 10.129.23.9
    proto esp spi 0xc145a827 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x2c49ca07c9d10fa84c6a2775e310027dc145a827 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.23.9 dst 10.129.17.245
    proto esp spi 0x7c363811 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d877c363811 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.17.245 dst 10.129.23.9
    proto esp spi 0xf2de4a31 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d87f2de4a31 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.15.191 dst 10.129.23.9
    proto esp spi 0x95330c4f reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x026f03682bad114b3952dfcd638fd66b95330c4f 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.15.191 dst 10.129.23.9
    proto esp spi 0xc799f1a3 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x2c49ca07c9d10fa84c6a2775e310027dc799f1a3 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.23.9 dst 10.129.15.191
    proto esp spi 0xb25024f9 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d87b25024f9 64
    anti-replay context: seq 0x0, oseq 0x5, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.15.191 dst 10.129.23.9
    proto esp spi 0x0fbdc9a5 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d870fbdc9a5 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 

engine2

$ sudo ip xfrm state
src 10.129.23.9 dst 10.129.15.191
    proto esp spi 0xc37b0533 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x026f03682bad114b3952dfcd638fd66bc37b0533 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.23.9 dst 10.129.15.191
    proto esp spi 0x363d7b7f reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x2c49ca07c9d10fa84c6a2775e310027d363d7b7f 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.15.191 dst 10.129.23.9
    proto esp spi 0x0fbdc9a5 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d870fbdc9a5 64
    anti-replay context: seq 0x0, oseq 0x11, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.23.9 dst 10.129.15.191
    proto esp spi 0xb25024f9 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d87b25024f9 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.17.245 dst 10.129.15.191
    proto esp spi 0xc8dade55 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x026f03682bad114b3952dfcd638fd66bc8dade55 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.17.245 dst 10.129.15.191
    proto esp spi 0xcf099409 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x2c49ca07c9d10fa84c6a2775e310027dcf099409 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.15.191 dst 10.129.17.245
    proto esp spi 0x2fc40c33 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d872fc40c33 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.17.245 dst 10.129.15.191
    proto esp spi 0x64cb75ef reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d8764cb75ef 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 

engine 3

$ sudo ip xfrm state
src 10.129.15.191 dst 10.129.17.245
    proto esp spi 0xdd2d6d79 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x026f03682bad114b3952dfcd638fd66bdd2d6d79 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.15.191 dst 10.129.17.245
    proto esp spi 0x67a9f80d reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x2c49ca07c9d10fa84c6a2775e310027d67a9f80d 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.17.245 dst 10.129.15.191
    proto esp spi 0x64cb75ef reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d8764cb75ef 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.15.191 dst 10.129.17.245
    proto esp spi 0x2fc40c33 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d872fc40c33 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.23.9 dst 10.129.17.245
    proto esp spi 0x899441cb reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x026f03682bad114b3952dfcd638fd66b899441cb 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.23.9 dst 10.129.17.245
    proto esp spi 0xebf3654f reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x2c49ca07c9d10fa84c6a2775e310027debf3654f 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.17.245 dst 10.129.23.9
    proto esp spi 0xf2de4a31 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d87f2de4a31 64
    anti-replay context: seq 0x0, oseq 0xc, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 
src 10.129.23.9 dst 10.129.17.245
    proto esp spi 0x7c363811 reqid 0 mode transport
    replay-window 0 
    aead rfc4106(gcm(aes)) 0x239d4f281ee70e3f23bc658438560d877c363811 64
    anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000
    sel src 0.0.0.0/0 dst 0.0.0.0/0 

ip xfrm policy

engine 1

$ sudo ip xfrm policy
src 10.129.23.9/128 dst 10.129.17.245/128 proto udp dport 4789 
    dir out priority 0 
    mark 0xd0c4e3/0xffffffff
    tmpl src 10.129.23.9 dst 10.129.17.245
        proto esp spi 0x7c363811 reqid 0 mode transport
src 10.129.23.9/128 dst 10.129.15.191/128 proto udp dport 4789 
    dir out priority 0 
    mark 0xd0c4e3/0xffffffff
    tmpl src 10.129.23.9 dst 10.129.15.191
        proto esp spi 0xb25024f9 reqid 0 mode transport

engine 2

$ sudo ip xfrm policy
src 10.129.15.191/128 dst 10.129.23.9/128 proto udp dport 4789 
    dir out priority 0 
    mark 0xd0c4e3/0xffffffff
    tmpl src 10.129.15.191 dst 10.129.23.9
        proto esp spi 0x0fbdc9a5 reqid 0 mode transport
src 10.129.15.191/128 dst 10.129.17.245/128 proto udp dport 4789 
    dir out priority 0 
    mark 0xd0c4e3/0xffffffff
    tmpl src 10.129.15.191 dst 10.129.17.245
        proto esp spi 0x2fc40c33 reqid 0 mode transport

engine 3

$ sudo ip xfrm policy
src 10.129.17.245/128 dst 10.129.23.9/128 proto udp dport 4789 
    dir out priority 0 
    mark 0xd0c4e3/0xffffffff
    tmpl src 10.129.17.245 dst 10.129.23.9
        proto esp spi 0xf2de4a31 reqid 0 mode transport
src 10.129.17.245/128 dst 10.129.15.191/128 proto udp dport 4789 
    dir out priority 0 
    mark 0xd0c4e3/0xffffffff
    tmpl src 10.129.17.245 dst 10.129.15.191
        proto esp spi 0x64cb75ef reqid 0 mode transport

@michaelkrog
Author

I also tried pinging IPs from one of the tasks on the encrypted network that returned HTTP 504.

$ docker exec -ti  proxy_enc.1.542rer140rv4t7gdij4a056nq bash

/# getent hosts tasks.previsto-site
10.0.0.5        tasks.previsto-site
10.0.0.3        tasks.previsto-site
10.0.0.4        tasks.previsto-site

# ping --timeout=2 10.0.0.5
PING 10.0.0.5 (10.0.0.5): 56 data bytes
64 bytes from 10.0.0.5: icmp_seq=0 ttl=64 time=1.041 ms
64 bytes from 10.0.0.5: icmp_seq=1 ttl=64 time=0.216 ms
--- 10.0.0.5 ping statistics ---
3 packets transmitted, 2 packets received, 33% packet loss
round-trip min/avg/max/stddev = 0.216/0.628/1.041/0.413 ms

# ping --timeout=2 10.0.0.3
PING 10.0.0.3 (10.0.0.3): 56 data bytes
--- 10.0.0.3 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

# ping --timeout=2 10.0.0.4
PING 10.0.0.4 (10.0.0.4): 56 data bytes
--- 10.0.0.4 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

@aboch
Contributor

aboch commented Oct 28, 2016

Thank you @michaelkrog for providing the extra information.

The IPsec tunnels are properly installed on all nodes.

From the ip xfrm state output you posted I can only derive the number of outgoing packets which got encrypted:

  • on engine1 5 packets were encrypted and sent to engine2
  • on engine2 17 packets were encrypted and sent to engine1
  • on engine3 12 packets were encrypted and sent to engine1

but clearly those encrypted packets did not make it to their destination.

In order to see how many encrypted packets were received on each host, I need to see the output of ip -s xfrm state (no need to run extra traffic).

But based on what we have now, my guess is that something is blocking ESP packets from being received by the engine1 host.

I don't know much about DigitalOcean, but I think you are in control of defining which traffic can freely be exchanged across your droplets, like the security groups in AWS.

Can you double check that, and make sure that IP protocol 50 packets can be sent/received by all hosts?

As a runtime check, you could run a tcpdump -p esp -v command on each of your enginex hosts, then try the ping between your containers over the encrypted network, and see whether the ESP packets are actually being received by the docker hosts, i.e. whether tcpdump sees them coming in.

@michaelkrog
Author

Oh my! Entering another area of my ignorance: IPsec :)

So when I set up my environment (back in the 1.12 RC days) I followed the Docker Swarm Tutorial, but info about ESP was not included back then.

I had my firewall setup like this:

$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
22                         ALLOW       Anywhere                  
80/tcp                     ALLOW       Anywhere                  
443/tcp                    ALLOW       Anywhere                  
2376/tcp                   ALLOW       Anywhere                  
2377/tcp on eth1           ALLOW       Anywhere                  
7946 on eth1               ALLOW       Anywhere                  
4789 on eth1               ALLOW       Anywhere                  
22 (v6)                    ALLOW       Anywhere (v6)             
80/tcp (v6)                ALLOW       Anywhere (v6)             
443/tcp (v6)               ALLOW       Anywhere (v6)             
2376/tcp (v6)              ALLOW       Anywhere (v6)             
2377/tcp (v6) on eth1      ALLOW       Anywhere (v6)             
7946 (v6) on eth1          ALLOW       Anywhere (v6)             
4789 (v6) on eth1          ALLOW       Anywhere (v6) 

I know nothing about IPsec and how it works, but I guessed that these rules must be blocking the ESP packets you mentioned. So I disabled the firewall on all nodes and, voilà, it works. Every request goes through on both networks.

I did not disable the firewall before because, according to the status I could retrieve from Docker, everything seemed to be in order. For an ignorant developer type (like me) it is hard to see why the encrypted network does not work, as the info available via the Docker CLI does not show any errors.

I redefined my firewall rules to this:

$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
22                         ALLOW       Anywhere                  
80/tcp                     ALLOW       Anywhere                  
443/tcp                    ALLOW       Anywhere                  
2376/tcp                   ALLOW       Anywhere                  
2377/tcp on eth1           ALLOW       Anywhere                  
7946 on eth1               ALLOW       Anywhere                  
4789 on eth1               ALLOW       Anywhere                  
10.129.23.9/esp            ALLOW       Anywhere                  
Anywhere on eth1           ALLOW       500/udp                   
Anywhere/esp on eth1       ALLOW       Anywhere/esp              
22 (v6)                    ALLOW       Anywhere (v6)             
80/tcp (v6)                ALLOW       Anywhere (v6)             
443/tcp (v6)               ALLOW       Anywhere (v6)             
2376/tcp (v6)              ALLOW       Anywhere (v6)             
2377/tcp (v6) on eth1      ALLOW       Anywhere (v6)             
7946 (v6) on eth1          ALLOW       Anywhere (v6)             
4789 (v6) on eth1          ALLOW       Anywhere (v6)             
Anywhere (v6) on eth1      ALLOW       500/udp (v6)              
Anywhere/esp (v6) on eth1  ALLOW       Anywhere/esp (v6)

And then it works with firewall enabled! 👍
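
For anyone else on ufw, a rough sketch of rules equivalent to the ESP/IKE lines above (assuming eth1 is the private interface, as in my setup; adjust to yours):

$ sudo ufw allow in on eth1 proto udp from any port 500 to any
$ sudo ufw allow in on eth1 proto esp from any to any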

@aboch
Contributor

aboch commented Oct 28, 2016

Awesome @michaelkrog, glad we resolved this one.

So when I setup my environment (back in the 1.12 RC days) I followed the Docker Swarm Tutorial. But info about ESP was not included back then.

I know, sorry for that. I only realized that was missing when #27425 was opened. @afrazkhan took care of fixing the documentation in docker/docs#230.

Anywhere/esp on eth1 ALLOW Anywhere/esp

👍
