service dns - no route to host when routing to the container executing the request #23965

Closed
jhorwit2 opened this issue Jun 26, 2016 · 11 comments
Labels
area/networking/dns, area/networking, area/swarm, kind/bug, version/1.12

Comments

@jhorwit2
Contributor

jhorwit2 commented Jun 26, 2016

Output of docker version:

Client:
 Version:      1.12.0-rc2
 API version:  1.24
 Go version:   go1.6.2
 Git commit:   906eacd
 Built:        Fri Jun 17 20:35:33 2016
 OS/Arch:      darwin/amd64
 Experimental: true

Server:
 Version:      1.12.0-rc2
 API version:  1.24
 Go version:   go1.6.2
 Git commit:   a7119de
 Built:        Fri Jun 17 22:09:20 2016
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

Containers: 125
 Running: 2
 Paused: 0
 Stopped: 123
Images: 236
Server Version: 1.12.0-rc2
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 470
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge overlay null host
Swarm: active
 NodeID: 6o19rpi5kx93jt9z6osdxfwfp
 IsManager: Yes
 Managers: 1
 Nodes: 1
 CACertHash: sha256:5711d0745394ebace6e82caa93a943281c0ce201902d4c3b6ba0b12f5d13cb02
Runtimes: default
Default Runtime: default
Security Options: seccomp
Kernel Version: 4.4.13-moby
Operating System: Alpine Linux v3.4
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.954 GiB
Name: moby
ID: 2PNZ:QY46:DCR2:IAU7:WUU4:Q5FZ:MAVC:KK7U:6JKH:LG2L:ANYI:3JP6
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 107
 Goroutines: 281
 System Time: 2016-06-26T17:28:58.594609691Z
 EventsListeners: 32
No Proxy: *.local, 169.254/16
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8

mac beta

Steps to reproduce the issue:

  1. docker service create --name web --replicas=1 -p 8080:80 web
  2. Exec into each container with bash
  3. Modify the nginx index page at /usr/share/nginx/html/index.html to include a value unique to that container
  4. Run curl web twice from inside a container; the request routed to the other container succeeds, but the request routed to the current container fails

Describe the results you received:

When I run curl web it only returns the result for the other task. Whenever it tries to route to the current task it responds with curl: (7) Failed to connect to web port 80: No route to host
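The behaviour described here can be tallied with a small loop. This is only a sketch, not something from the original report: the function name `tally_curl`, the 3-second timeout, and the attempt count are illustrative assumptions. It relies on curl's exit code 7 ("failed to connect to host"), which is the code shown in the error message above.

```shell
#!/bin/sh
# Sketch: call curl against a service name several times and count outcomes.
# tally_curl is a hypothetical helper name; the timeout value is an assumption.
tally_curl() {
  url=$1
  attempts=$2
  ok=0
  connect_failed=0
  other=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=0
    # -s silences progress output, -o /dev/null discards the response body.
    curl -s -o /dev/null --max-time 3 "$url" || status=$?
    case $status in
      0) ok=$((ok + 1)) ;;                          # request succeeded
      7) connect_failed=$((connect_failed + 1)) ;;  # curl: (7) failed to connect
      *) other=$((other + 1)) ;;                    # any other curl error
    esac
    i=$((i + 1))
  done
  echo "ok=$ok connect_failed=$connect_failed other=$other"
}

# Example, run from inside one of the service's containers:
#   tally_curl http://web 20
```

If the failure really is tied to requests being routed back to the calling container, a service with two tasks would be expected to show a mix of `ok` and `connect_failed` counts rather than all of one kind.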

Describe the results you expected:

Every curl should respond with the nginx index page, even when it is routed to the current container.

Additional information you deem important (e.g. issue happens only occasionally):

FROM nginx:latest

RUN apt-get update && apt-get -y install curl vim dnsutils

HEALTHCHECK --interval=5s --timeout=3s CMD curl localhost

^Dockerfile for the test being run.

@jhorwit2
Contributor Author

jhorwit2 commented Jun 26, 2016

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                   PORTS               NAMES
98f3a8a8ecb3        web:latest      "nginx -g 'daemon off"   6 minutes ago       Up 6 minutes (healthy)   80/tcp, 443/tcp     web.2.bzpdiq8nuzzxdvdp0iyd4ynjt
d919e98214fc        web:latest      "nginx -g 'daemon off"   8 minutes ago       Up 8 minutes (healthy)   80/tcp, 443/tcp     web.1.5dka4xamn4do07y1jsor7c9hj

As you can see below, I exec into web 1 and can only get a response from web 2 (note the small marker text at the end of the body tag).

$ docker exec -it 98f3a8a8ecb3 /bin/bash
root@98f3a8a8ecb3:/# curl web
curl: (7) Failed to connect to web port 80: No route to host
root@98f3a8a8ecb3:/# curl web
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
web 2
</body>
</html>
root@98f3a8a8ecb3:/# curl web
curl: (7) Failed to connect to web port 80: No route to host

@jhorwit2 changed the title from "no route to host when routing to the container executing the request" to "service dns - no route to host when routing to the container executing the request" on Jun 27, 2016
@F21
Contributor

F21 commented Jul 4, 2016

I am also hitting this problem with 1.12-rc3.

In my case, I want to run an Elasticsearch cluster using the primitives provided by swarm. This is so that I can easily scale the cluster up and down using docker service scale.

The command I am using is:

docker service create -p=9200:9200 --name=es --replicas=3 elasticsearch:2 elasticsearch -Ddiscovery.zen.ping.unicast.hosts=es

The trick here is to set the name of the service to es, so that all instances are reachable using the es DNS name. I then pass -Ddiscovery.zen.ping.unicast.hosts=es to elasticsearch so that it tries to discover all the other nodes in the cluster through the es DNS name.

Because a container cannot reach itself using the DNS name, starting the service can, if you are unlucky, cause a container to look up its own IP address through the es DNS name. This results in a no route to host error, and the transport layer shuts down, which prevents that node/instance from doing any further discovery:

[2016-07-04 01:56:43,982][WARN ][transport.netty          ] [Deacon Frost] exception caught on transport layer [[id: 0x3fdb819d]], closing connection
java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I hope this can be fixed for 1.12 as there are lots of use-cases where we want to use the DNS name of a service and the gossip protocol to discover other nodes/instances in a clustered application.
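One way to check whether a lookup is handing a container its own address is to compare the resolved address with the container's own addresses. The sketch below is an illustration only: `is_own_address` is a hypothetical helper, and the usage line assumes `getent`, `awk`, and `hostname -i` are available in the image, which the thread does not state. Note also that a swarm service name may resolve to a virtual IP rather than a task address, in which case this check reports `other` even if the connection is later routed back to the same container.

```shell
#!/bin/sh
# Sketch: report whether a candidate IP is one of this container's own addresses.
# is_own_address is a hypothetical helper, not part of Docker or Elasticsearch.
# Usage: is_own_address CANDIDATE OWN_ADDR [OWN_ADDR...]
is_own_address() {
  candidate=$1
  shift
  for own in "$@"; do
    if [ "$candidate" = "$own" ]; then
      echo self
      return 0
    fi
  done
  echo other
}

# Example inside a container (assumes getent, awk and hostname -i exist there):
#   is_own_address "$(getent hosts es | awk '{print $1; exit}')" $(hostname -i)
```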

@thaJeztah added the kind/bug, area/networking, and area/swarm labels on Jul 12, 2016
@thaJeztah added this to the 1.12.0 milestone on Jul 12, 2016
@thaJeztah
Member

ping @mavenugo PTAL

@bluepuma77

Not sure if this is a DNS problem. I am also using Elastic and get a Java "No route to host" exception on a secondary instance that is trying to connect to the first one on the same overlay network. The strange thing is that I can exec into the secondary container and ping the IP of the first one, and that works.

docker service create \
  --name elastic1 \
  --publish 9200 \
  --publish 9300 \
  --network NetworkElastic \
  --constraint "node.hostname == srv1" \
  elasticsearch \
  -Des.node.name="srv1"

docker service create \
  --name elastic2 \
  --publish 9200 \
  --publish 9300 \
  --network NetworkElastic \
  --constraint "node.hostname == srv2" \
  elasticsearch \
  -Des.node.name=srv2 \
  --discovery.zen.ping.unicast.hosts=10.0.0.3 #via lookup of the container of elastic1

Docker version: 1.12.0-rc4
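A successful ping only shows that ICMP gets through; it says nothing about TCP on the Elasticsearch transport port. A minimal sketch of a TCP-level probe follows. It assumes `nc` (netcat) with `-z` and `-w` support is installed in the container, which is not stated in the thread, and `probe_tcp` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check whether a TCP port accepts connections, as opposed to ICMP ping.
# probe_tcp is a hypothetical helper; it assumes an nc that supports -z and -w.
probe_tcp() {
  host=$1
  port=$2
  # -z: connect without sending data, -w 3: give up after 3 seconds.
  if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
    echo open
  else
    echo unreachable
  fi
}

# Example from inside the elastic2 container, using the address from the
# command above and the transport port published there:
#   probe_tcp 10.0.0.3 9300
```

If ping succeeds but this probe reports `unreachable`, the problem is more likely in how TCP traffic is forwarded on the overlay network than in name resolution.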

@F21
Contributor

F21 commented Sep 5, 2016

Any chance this one can make it into 1.12.2?

@garthk

garthk commented Oct 5, 2016

Is this related to #25266? When a lookup returns the wrong IP address, No route to host is the error you see when the connection attempt is made.

@mrjana
Contributor

mrjana commented Oct 11, 2016

@jhorwit2 @bluepuma77 @F21 @garthk This is most likely fixed in 1.12.2-rc. Please give https://github.com/docker/docker/releases/tag/v1.12.2-rc3 a try. Thanks!

@arkadius

arkadius commented Nov 5, 2016

Unfortunately, I have the same issue on 1.12.3. It occurs after running an Elasticsearch cluster under heavy load for about 2 hours.

transport.netty          ] [e1] exception caught on transport layer [[id: 0x2cd03dcb, /10.0.0.35:58815 => /10.0.0.28:9300]], closing connection
java.io.IOException: No route to host
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)

@thaJeztah modified the milestones: 1.13.0, 1.12.0 on Nov 7, 2016
@guenhter

guenhter commented Feb 13, 2017

We have seen the same issue on docker 1.13.1

Server Version: 1.13.1
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: active
 NodeID: 55l834dp5uhu2lwx9zkirf05p
 Is Manager: true
 ClusterID: aenmnrxp6ie76rt70qqth6puz
 Managers: 3
 Nodes: 4
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 138.201.138.161
 Manager Addresses:
  138.201.138.161:2377
  138.201.138.165:2377
  138.201.193.70:2377
Runtimes: runc

docker version

docker version
Client:
 Version:      1.13.1
 API version:  1.26
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 08:47:51 2017
 OS/Arch:      linux/amd64

Server:
 Version:      1.13.1
 API version:  1.26 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   092cba3
 Built:        Wed Feb  8 08:47:51 2017
 OS/Arch:      linux/amd64
 Experimental: false

Exception is

java.net.SocketException: No route to host (Read failed)
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:100)
	at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:143)
	at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:173)

@thaJeztah modified the milestones: 17.04.0, 1.13.0 on Feb 20, 2017
@guenhter

guenhter commented Mar 9, 2017

I noticed that this can happen when the system (e.g. just one node) is very busy.

@thaJeztah
Member

Let me close this ticket for now, as it looks like it went stale.

@thaJeztah closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 17, 2023
@thaJeztah removed this from the 17.04.0 milestone on Sep 17, 2023