DNS fails to resolve for private registry when connected to VPN #924

Closed
spikegrobstein opened this issue Nov 9, 2016 · 46 comments

Comments

@spikegrobstein

Expected behavior

docker pull should successfully pull down images from private registry when connected to VPN

Actual behavior

Errors with:

Error response from daemon: Get https://docker.mycompany.com/v1/_ping: dial tcp: lookup docker.mycompany.com on 192.168.65.1:53: no such host

Information

Diagnose & Feedback:

Docker for Mac: version: 1.12.3-beta29.3 (619507e)
OS X: version 10.12.1 (build: 16B2555)
logs: /tmp/2803062C-B1BF-4F85-83B7-5EABBEDF4364/20161109-134930.tar.gz
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     db
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[OK]     docker-cli
[OK]     menubar
[OK]     disk

Additional notes

We're using an SSL VPN, which is configured through the Network System Preference. When connected, it doesn't update the macOS resolv.conf, so tools like the host command on the macOS side do not resolve the names either. However, running curl or ping against the hostname does work from the macOS side.

Using the scutil --dns command, I get a section that looks like the following (slightly censored):

resolver #2
  search domain[0] : mycompany.com
  search domain[1] : XXX.in-addr.arpa
  search domain[2] : mycompany.com
  search domain[3] : corp.mycompany.com
  nameserver[0] : XXX.XXX.XXX.XXX
  nameserver[1] : XXX.XXX.XXX.XXX
  if_index : 16 (utun3)
  flags    : Scoped, Request A records, Request AAAA records
  reach    : Reachable

If I connect to the Docker VM (screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty), and add the above DNS servers to /etc/resolv.conf or hard-code the IP of our registry into /etc/hosts, operations succeed as expected.
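For anyone trying the first workaround, here is a minimal sketch of turning the nameserver lines from scutil --dns into resolv.conf entries. The function name scutil_to_resolv_conf is made up for illustration and assumes the output format shown above. It does not distinguish between resolvers, so filter the input first if only the VPN's scoped resolver is wanted.

```shell
# Hypothetical helper: read `scutil --dns` output on stdin and print one
# "nameserver <ip>" line per unique server, in order of first appearance.
scutil_to_resolv_conf() {
  awk -F' : ' '
    /^[ \t]*nameserver\[[0-9]+\]/ {
      ip = $2
      gsub(/[ \t]+$/, "", ip)          # drop trailing whitespace
      if (!(ip in seen)) {             # skip servers already printed
        seen[ip] = 1
        print "nameserver " ip
      }
    }
  '
}

# Example (on the Mac):
#   scutil --dns | scutil_to_resolv_conf
```

The printed lines can then be added to /etc/resolv.conf inside the VM by hand, as described above.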

I've been digging in and haven't figured out exactly how the VM is resolving DNS; it's pointing to a 192.168.65.1 DNS server, but I'm not sure where that lives or how it's configured. Ideally if that would use the DNS server from VPN, I believe this would all work.

Steps to reproduce the behavior

  1. connect to corporate VPN
  2. try to pull an image from private registry
  3. error
@ShannonHickey

My guess is that this is a duplicate of #540

@djs55
Contributor

djs55 commented Nov 10, 2016

Thanks for the report. Could you try again and upload a diagnostics report with beta 30 (released ~2 hrs ago)? I can't promise it's fixed, but the DNS code has been improved:

  • better logic to extract DNS configuration from the SC database (as used by scutil) <-- this is where I need some help to make sure the code manages to understand your configuration properly
  • ability to send requests to multiple upstream servers over UDP and TCP in parallel
  • ability to multiplex requests properly which cuts down the number of sockets needed on the host (particularly on the Mac there are low system limits)
  • caching

@spikegrobstein
Author

@ShannonHickey this does appear to be a dupe of #540 -- I went through open issues looking to see if it was already reported, but somehow missed it.

@djs55 w00! beta 30 does indeed fix this problem. I did a factory reset, tried to start up my docker container, it failed (because I wasn't connected to VPN), then connected and tried to start, and it successfully pulled. This is great, thank you!

I'll continue testing/using it throughout today and will report in this thread if I find any issues, but we can consider it preliminarily solved.

@spikegrobstein
Author

After doing some additional testing, I'm finding that DNS resolution in the Moby VM and in a running docker container are not in sync and not working as expected:

in VM

OK - resolves, FAIL - fails to resolve

  • google.com: FAIL
  • archive.ubuntu.com: FAIL
  • docker.mycompany.com: OK
  • apt.mycompany.com: OK
  • git.mycompany.com: OK

in running container

OK - resolves, FAIL - fails to resolve

  • google.com: FAIL
  • archive.ubuntu.com: FAIL
  • docker.mycompany.com: OK
  • apt.mycompany.com: FAIL
  • git.mycompany.com: OK

in macOS

all resolve as expected.

Also, when just trying to pull a container from the public index:

$ docker pull ubuntu
Using default tag: latest
Pulling repository docker.io/library/ubuntu
Error while pulling image: Get https://index.docker.io/v1/repositories/library/ubuntu/images: dial tcp: lookup index.docker.io on 192.168.65.1:53: no such host

Since there are 2 sets of DNS servers (one for the (W)LAN that I'm on and one for the VPN), I checked how DNS resolves directly against those servers. The public domains above (google and ubuntu) both resolve against the LAN DNS servers, but not against the VPN DNS servers. The mycompany.com names all resolve only against the VPN DNS servers (as expected).
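A quick way to repeat this kind of per-server check is sketched here. It assumes dig is installed. check_dns and the DIG override variable are hypothetical names used only for illustration, not part of any Docker tooling.

```shell
# Hypothetical helper: check_dns SERVER NAME...
# Queries SERVER directly for the A record of each NAME and prints
# "NAME: OK" if at least one answer line came back, otherwise "NAME: FAIL".
# Set DIG to substitute another command for dig (an assumption for testing).
check_dns() {
  server=$1
  shift
  for name in "$@"; do
    # +short prints only the answers; lines starting with ";" are dig
    # diagnostics (e.g. timeouts) and are not counted as answers.
    answer=$("${DIG:-dig}" +short +time=2 +tries=1 "@$server" "$name" A 2>/dev/null | grep -v '^;' || true)
    if [ -n "$answer" ]; then
      echo "$name: OK"
    else
      echo "$name: FAIL"
    fi
  done
}

# Example, once per DNS server reported by scutil --dns:
#   check_dns <server-ip> google.com archive.ubuntu.com docker.mycompany.com
```

Running it against the LAN servers and then the VPN servers gives the same OK/FAIL view as the lists above, which makes it easier to see which server the failing names depend on.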

It is weird that the apt domain fails to resolve in the container when it resolves fine in the VM, but the other mycompany.com entries resolve fine in both.

Let me know if there's any other information I can forward.

Diagnostics for Beta30:

Docker for Mac: version: 1.12.3-beta30 (7314181)
OS X: version 10.12.1 (build: 16B2555)
logs: /tmp/6477B407-0210-46B2-9CEA-09AE04962565/20161110-094857.tar.gz
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     db
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[OK]     docker-cli
[OK]     menubar
[OK]     disk

@rogaha

rogaha commented Nov 22, 2016

@spikegrobstein we added lots of improvements on the networking side to docker engine 1.13, can you please try it again with that version? You just need to connect to the VM using screen and run curl -fsSL https://test.docker.com/ | sh. After that you can replace your docker client with https://test.docker.com/builds/Darwin/x86_64/docker-1.13.0-rc1.tgz.

@justincormack
Member

@rogaha no you can't do that, there is no way to update Docker for Mac using test.docker.com. There will be a release with 1.13 soon.

@rogaha

rogaha commented Nov 22, 2016

ok cool. Thanks for the heads up @justincormack. Better to wait for the next beta then. :)

@spikegrobstein
Author

I'll await the next beta, then.

@kaskavalci

Is the change in 1.13.0-rc3 ?

@mklatsky

I'm running 1.13.0-rc4-beta34.1 (14853), and I still have the failing DNS lookups as lookups are using 192.168.65.1.

@spikegrobstein
Author

I'm also still running into this issue in beta34.1

Internal DNS resolves in the container, while external DNS (e.g. google.com and archive.ubuntu.com) fails.

diagnose output:

OS X: version 10.12.2 (build: 16C67)
logs: /tmp/5D1839C1-275E-4EAD-8626-07DAD4D078D4/20170103-104142.tar.gz
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     db
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[OK]     docker-cli
[OK]     menubar
[OK]     disk

@mgilbir

mgilbir commented Jan 13, 2017

It used to work fine for me before the holidays, but now I see a similar problem in:

Version 1.13.0-rc6-beta36 (14969)
Channel: Beta a158c69c78

I have a local DNS server running on my host.

/etc/resolv.conf without the automatically generated OSX comments:

nameserver 127.0.0.1

I can pull from public registries but it won't resolve addresses in a non-public DNS.

$ docker push docker.corp.internal:5000/app
The push refers to a repository [docker.corp.internal:5000/app]
Put http://docker.corp.internal:5000/v1/repositories/app/: dial tcp: lookup docker.corp.internal on 192.168.65.1:53: no such host

But querying for the registry domain locally gives:

dig docker.corp.internal

; <<>> DiG 9.8.3-P1 <<>> docker.corp.internal
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64244
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;docker.corp.internal.		IN	A

;; ANSWER SECTION:
docker.corp.internal.	42	IN	A	10.3.50.75

;; Query time: 27 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Jan 13 15:44:54 2017
;; MSG SIZE  rcvd: 70

@soupmatt

I just started having this issue after upgrading from Docker for Mac 1.12.6 to 1.13.0.

$ docker pull git.vibes.com:4567/docker-ci-images/centos68-ci:latest
Pulling repository git.vibes.com:4567/docker-ci-images/centos68-ci
Error while pulling image: Get http://git.vibes.com:4567/v1/repositories/docker-ci-images/centos68-ci/images: dial tcp: lookup git.vibes.com on 192.168.65.1:53: no such host

Diagnostic Output

Docker for Mac: version: 1.13.0 (0c6d765c5)
macOS: version 10.11.6 (build: 15G1212)
logs: /tmp/30ED4F18-BE27-4B7E-9EFB-D776DB8F1DBF/20170120-163804.tar.gz
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     db
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[OK]     docker-cli
[OK]     menubar
[OK]     disk

@sergeipogrebnyak

Same here

Worked fine on Mac 1.12.latest. Upgraded to 1.13.0 and now cannot resolve any corporate VPN addresses including internal docker registry. The public DNS names work ok.

$ docker run --rm -it alpine ping www.google.com
PING www.google.com (172.217.3.196): 56 data bytes

$  docker run --rm -it alpine ping registry.eur.ad.sag
ping: bad address 'registry.eur.ad.sag'

The only workaround I found so far is to explicitly give corporate DNS server address:

$  docker run --rm -it --dns <dns address> alpine ping registry.<domain>
PING registry.<domain> (<correct address>): 56 data bytes

I cannot use this workaround for docker pull or docker-compose.
Any pointers/help on how to resolve it are appreciated.
Or maybe a way to downgrade to 1.12?


Docker for Mac: version: 1.13.0 (0c6d765c5)
macOS: version 10.12.2 (build: 16C67)
logs: /tmp/2135CC0C-1800-4CA2-B417-00177AA9C0D0/20170123-151909.tar.gz
[OK] vmnetd
[OK] dns
[OK] driver.amd64-linux
[OK] virtualization VT-X
[OK] app
[OK] moby
[OK] system
[OK] moby-syslog
[OK] db
[OK] env
[OK] virtualization kern.hv_support
[OK] slirp
[OK] osxfs
[OK] moby-console
[OK] logs
[OK] docker-cli
[OK] menubar
[OK] disk

@michaelwilde

Found a solution. In Docker for Mac, go to "Preferences -- Uninstall/Reset --- Factory Reset". Do that, moby gets whacked and recreated. Problem solved for me.

@sergeipogrebnyak

Sweet! Worked for me as well.
Thank you much

@sergeipogrebnyak

Actually, spoke too soon.
The addresses now get resolved but always to the same IP 54.200.1.15 which seems to point to some ec2 instance!?

@spikegrobstein
Author

I'm still having the issue in 1.13.0-beta38. I did a full factory reset and it's still reproducible.

@blamh

blamh commented Jan 24, 2017

I tried the Factory Reset, but it did not work for me. Still getting a lookup docker.corp.internal on 192.168.65.1:53: no such host error.

I ended up doing a reinstall of version 1.12.1 (build: 12133)

@soupmatt

I went back to 1.12.6 to get it working again.

@sergeipogrebnyak

Where did you get this version from? I can't find it anywhere anymore. I mean the whole Docker for Mac package.

@michaelwilde

Have you guys tried moving to the stable 1.13? That's what I'm on, and it's working well now.

@sergeipogrebnyak

Upgrading to beta39 resolved my VPN issues! At least right now everything seems to be working again. I'll update if it stays this way.

@dave-tucker
Contributor

@spikegrobstein if you are still having this issue, could you please send us another diagnostic so we can look in to it? If not, please let us know it's fixed so we can close this! Thanks.

@michaelwilde

michaelwilde commented Feb 13, 2017 via email

@spikegrobstein
Author

Yes, still having issues:

Docker for Mac: version: 1.13.1-beta42 (2ffb2b491)
macOS: version 10.12.3 (build: 16D32)
logs: /tmp/1A731E86-2066-4093-A982-B78D83D94E1C/20170213-084847.tar.gz
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     db
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[OK]     docker-cli
[OK]     menubar
[OK]     disk

Diagnostic ID: 1A731E86-2066-4093-A982-B78D83D94E1C

1.13.1-beta42 is unable to resolve any dns in-container while connected to the VPN, while the moby VM is able to resolve only internal DNS (eg: not google.com or archive.ubuntu.com, but does resolve docker.mycompany.com).

@kaskavalci

running on Docker for Mac: version: 1.13.1 (94675c5a7)

The container can resolve public and internal domain names, but connections to any Docker registry fail. The diagnostic upload failed as well.

@llamahunter

This was working on 1.13.0, but I recently updated to 1.13.1 and I'm no longer able to resolve our private docker registry.

@yourbuddyconner

yourbuddyconner commented Feb 17, 2017

Was working great on 1.13.0 and when I updated to 1.13.1 I was unable to connect to my internal private registry via vpn. Like a previous commenter, I am also using the Palo Alto Globalprotect VPN Client.

DNS for the registry resolves fine on the OSX side, but fails to resolve inside the container with the following error:
Error while pulling image: Get http://registry.DOMAIN.net/v1/repositories/operations/jenkins/images: dial tcp: lookup registry.DOMAIN.net on 192.168.65.1:53: no such host

@michaelwilde

Docker.. ya'll need to solve this one quickly. Folks not being able to access registries over their VPNs is a huge Blocker for adoption.. like code red, level

@yourbuddyconner

Fortunately it's not super critical for me because no production machines are running Docker for Mac, however Development machines are affected if they were updated. A downgrade path from 1.13.1 -> 1.13.0 would be useful...

@ijc
Contributor

ijc commented Feb 22, 2017

@spikegrobstein thank you for your latest diagnostics, it has appeared on our servers and so I have escalated this to an internal ticket.

@soupmatt & @sergeipogrebnyak your diagnostics are also present, not clear if you have the same issue as @spikegrobstein or not but I have referenced your diagnostics in the internal ticket.

Everyone else, please use 🐳 menu ➡️ "Diagnose & Feedback" to upload a diagnostic and create a fresh issue describing what does and does not work for you. Without that we will be unable to address your specific failure mode.

@djs55
Contributor

djs55 commented Mar 24, 2017

The latest edge version 17.03.1-ce-rc1-mac3 (15924) can experimentally use the host's DNS resolver, which should be more compatible with VPN software than the current default resolver. To try it, first install the latest edge build and then:

$ cd ~/Library/Containers/com.docker.docker/Data/database/
$ git reset --hard
HEAD is now at 825ed0d last-start-time changed at 1490363273
$ ls com.docker.driver.amd64-linux/slirp/
dns		docker		domain		host		max-connections	mtu
$ mkdir -p com.docker.driver.amd64-linux/slirp-override
$ touch com.docker.driver.amd64-linux/slirp-override/dns
$ git add com.docker.driver.amd64-linux/slirp-override/dns 
$ git commit -s -m 'Use host resolver'
[master f9de428] Use host resolver
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 com.docker.driver.amd64-linux/slirp-override/dns

If anyone gets a chance to try this, let me know how it goes. If it goes well I'd like to make it the default behaviour in a future release.

Thanks!

@DeadLemon

DeadLemon commented Mar 24, 2017

Docker for Mac: version: 17.03.0-ce-mac2 (1d7d97bbb)
macOS: version 10.12.3 (build: 16D32)
logs: /tmp/0ADA7662-1314-4058-9EAC-AB77BAEBCD69/20170325-003211.tar.gz
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     db
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[OK]     docker-cli
[OK]     menubar
[OK]     disk

I ran into the same issue. Is it possible that the Docker servers banned me? In the past I had some trouble with PSN (PlayStation Network) because their provider blacklisted my dynamic IP address.

UPD: just restarted my Wi-Fi router, now it's OK

UPD 2: nope, it worked about 5 minutes

@mgilbir

mgilbir commented Mar 27, 2017

@djs55 It works!

Before the change:

docker push docker.ls.internal:5000/base:3.5
The push refers to a repository [docker.ls.internal:5000/base]
Put http://docker.ls.internal:5000/v1/repositories/base/: dial tcp: lookup docker.ls.internal on 192.168.65.1:53: no such host
make: *** [push] Error 1

After the change:

docker push docker.ls.internal:5000/base:3.5
The push refers to a repository [docker.ls.internal:5000/base]
4ffb86018ccc: Pushed
5d995f35d4d0: Pushed
23b9c7b43573: Pushed
3.5: digest: sha256:dc29d3903531f3bbf323b29531e0667459842d943a89d6a4cbe4a909a2501ca4 size: 946

@mgilbir

mgilbir commented Apr 12, 2017

@djs55 It broke again :(

I just updated to Version 17.04.0-ce-mac7 (16352) Channel: edge b598153e23 and the DNS resolution is gone and the fix doesn't work anymore.

Any idea when we are going to have this working reliably?

@djs55
Contributor

djs55 commented Apr 19, 2017

@mgilbir sorry to hear it stopped working. The method to switch to the native host resolver changed a little in recent edge versions.

In Version 17.05.0-ce-rc1-mac8 (16582) Channel: edge 73d01bb48e it should use the native host resolver by default (unless the configuration parsing code is buggy). To check whether it's using the host resolver (which should be the most VPN-friendly resolution method), run the command syslog -k Sender Docker and check that the output includes something like:

Apr 19 14:19:40 ... Docker[95282] <Notice>: updating resolvers to use host resolver
Apr 19 14:19:40 ... Docker[95282] <Notice>: Remove(3): DNS configuration changed to: use upstream DNS servers nameserver 10.14.32.10#53
	order 0
Apr 19 14:19:40 ... Docker[95282] <Notice>: Add(3): DNS configuration changed to: use host resolver
Apr 19 14:19:40 ... Docker[95282] <Notice>: Will use the host's DNS resolver
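As a rough sketch, the check can be scripted by scanning the log for the most recent "Add(n): DNS configuration changed to:" line. resolver_mode_from_log is a hypothetical helper name. The "host" pattern is taken from the sample output above, while the assumption that an Add line can also carry the "use upstream DNS servers" wording is inferred from the Remove line and may not hold for every build.

```shell
# Hypothetical helper: read Docker log lines on stdin and print the resolver
# mode named by the last "Add(n): DNS configuration changed to:" entry:
#   host      -> "use host resolver"
#   upstream  -> "use upstream DNS servers" (assumed wording for Add lines)
#   unknown   -> no matching line found
resolver_mode_from_log() {
  awk '
    /Add\([0-9]+\): DNS configuration changed to: use host resolver/ { mode = "host" }
    /Add\([0-9]+\): DNS configuration changed to: use upstream DNS servers/ { mode = "upstream" }
    END { print (mode == "" ? "unknown" : mode) }
  '
}

# Example (on the Mac):
#   syslog -k Sender Docker | resolver_mode_from_log
```

If this prints anything other than host on a recent edge build, the raw log lines are still the thing to attach to a diagnostic report.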

The new runes to turn it on or off are:

cd ~/Library/Containers/com.docker.docker/Data/database/
git checkout master
mkdir -p com.docker.driver.amd64-linux/slirp
# to use the host resolver, which should be the most VPN-friendly:
echo -n host > com.docker.driver.amd64-linux/slirp/resolver
# uncomment to use the old resolver:
# echo -n old > com.docker.driver.amd64-linux/slirp/resolver
git add com.docker.driver.amd64-linux/slirp/resolver 
git commit -s -m 'use the host resolver'

Could you let me know if it works or not with the most recent edge version? If it still doesn't work, could you upload a fresh diagnostics?

Sorry for the inconvenience.

@BrendonW

BrendonW commented May 2, 2017

Using:

Version 17.05.0-ce-rc1-mac8 (16582)
Channel: edge
73d01bb48e

I can get nothing to work. The problem seems to be related to Alpine 3.5 name resolution, but if I understand correctly that is what Docker is using itself.

If you look below, it seems that even if it resolves to an IPv4 address, it fails anyway. Looking at the DNS order, it favors the IPv6 DNS address and that returns the correct data if I use dig on the host.

May  2 06:43:53 Firefly Docker[6996] <Warning>: DNS lookup localhost.local A: Timeout
--- last message repeated 1 time ---
May  2 06:43:53 Firefly Docker[6996] <Warning>: DNS lookup localhost.local AAAA: Timeout
May  2 06:43:53 Firefly Docker[6996] <Notice>: DNS: localhost is ::1 in in /etc/hosts
May  2 06:43:53 Firefly Docker[6996] <Notice>: DNS: localhost is 127.0.0.1 in in /etc/hosts
May  2 06:43:53 Firefly Docker[6996] <Notice>: DNS: localhost is ::1 in in /etc/hosts
May  2 06:43:53 Firefly Docker[6996] <Notice>: DNS: localhost is 127.0.0.1 in in /etc/hosts
May  2 06:43:53 Firefly Docker[6996] <Warning>: DNS lookup dl-cdn.alpinelinux.org AAAA: NoSuchRecord
--- last message repeated 1 time ---
May  2 06:43:53 Firefly Docker[6996] <Notice>: DNS lookup dl-cdn.alpinelinux.org A: dl-cdn.alpinelinux.org <IN|2251> [CNAME (global.prod.fastly.net)], global.prod.fastly.net <IN|12> [A (151.101.40.249)]
--- last message repeated 2 times ---
May  2 06:43:53 Firefly Docker[6996] <Warning>: DNS lookup dl-cdn.alpinelinux.org AAAA: NoSuchRecord

I really need my development environment running but don't know how to go back to a working version or how to patch this to work!

@djs55
Contributor

djs55 commented May 2, 2017

@BrendonW could you upload a diagnostic report after the problem manifests? The report will contain more detailed diagnostics including a DNS packet trace.

There are some unreleased DNS fixes -- if you'd like to try testing them, follow the instructions in this comment: #1569 (comment)

Failing that, try this to revert to a previous setting:

$ cd ~/Library/Containers/com.docker.docker/Data/database
$ git reset --hard
HEAD is now at be213ac Updating state branch
$ ls
branch-created			com.docker.driver.amd64-linux
$ cat com.docker.driver.amd64-linux/slirp/resolver 
host
$ echo -n builtin > com.docker.driver.amd64-linux/slirp/resolver
$ git add com.docker.driver.amd64-linux/slirp/resolver 
$ git commit -s -m 'revert resolver mode'

Let me know if that helps (or not). Please upload another diagnostic if not and I'll take a look.

@jasonbivins

This issue has been inactive for more than 14 days while marked as status/0-more-info-needed. It is being closed due to abandonment. Please feel free to re-open with more information about the problem.

MORE_INFO_EXPIRY_TIMEOUT

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Jun 21, 2020