SRV responses with absolute domain names are treated as relative domain names in v3.5.3 #13948

Closed
Tracked by #13960
liggitt opened this issue Apr 15, 2022 · 24 comments

@liggitt
Contributor

liggitt commented Apr 15, 2022

What happened?

#13712 incorrectly trimmed trailing . from SRV responses when constructing client addresses. #13714 then backported the bug to the 3.5 stream where it was released in v3.5.3.

This turns absolute domain names into relative domain names and means local search paths are appended when resolving the addresses

What did you expect to happen?

Absolute hostname SRV records do not get local DNS search paths appended again

How can we reproduce it (as minimally and precisely as possible)?

Start a local DNS server that knows how to return srv records for etcd.example.com:

Dockerfile:

FROM alpine:latest
RUN apk --no-cache add dnsmasq
EXPOSE 53/tcp
EXPOSE 53/udp
ENTRYPOINT ["dnsmasq"]

Build and run the image:

docker build -t dnsmasq .
docker run -it --rm --name dnsmasq dnsmasq \
  --user=root \
  --keep-in-foreground \
  --bind-dynamic \
  --conf-file=/dev/null \
  --log-queries \
  --log-facility=- \
  --srv-host=_etcd-client-ssl._tcp.etcd.example.com,etcd1.example.com. \
  --srv-host=_etcd-client-ssl._tcp.etcd.example.com,etcd2.example.com. \
  --srv-host=_etcd-client-ssl._tcp.etcd.example.com,etcd3.example.com.

Capture the IP address of the DNS server and verify it is responding correctly:

dnsip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' dnsmasq)
dig @${dnsip} srv _etcd-client-ssl._tcp.etcd.example.com

See the DNS server observe the query:

dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.example.com from 192.168.9.1
dnsmasq[1]: config _etcd-client-ssl._tcp.etcd.example.com is <SRV>
dnsmasq[1]: config _etcd-client-ssl._tcp.etcd.example.com is <SRV>
dnsmasq[1]: config _etcd-client-ssl._tcp.etcd.example.com is <SRV>

And the response:

; <<>> DiG 9.18.0-2-Debian <<>> @192.168.9.2 srv _etcd-client-ssl._tcp.etcd.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24691
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_etcd-client-ssl._tcp.etcd.example.com.	IN SRV

;; ANSWER SECTION:
_etcd-client-ssl._tcp.etcd.example.com.	0 IN SRV 0 0 1 etcd2.example.com.
_etcd-client-ssl._tcp.etcd.example.com.	0 IN SRV 0 0 1 etcd1.example.com.
_etcd-client-ssl._tcp.etcd.example.com.	0 IN SRV 0 0 1 etcd3.example.com.

;; Query time: 0 msec
;; SERVER: 192.168.9.2#53(192.168.9.2) (UDP)
;; WHEN: Fri Apr 15 19:02:42 UTC 2022
;; MSG SIZE  rcvd: 178

Run a v3.5.2 client pointing at the custom DNS server with custom DNS search paths, trying to query SRV records:

docker run --dns="${dnsip}" --dns-option=ndots:5 --dns-search=example.org --dns-search=corp.example.org \
quay.io/coreos/etcd:v3.5.2 etcdctl -d etcd.example.com get /

And observe the resulting DNS queries are as expected:

# SRV queries for https with example.org and corp.example.org search paths
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.example.com.corp.example.org from 192.168.9.3
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.example.com from 192.168.9.3

# SRV queries for http with example.org and corp.example.org search paths
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.example.com.corp.example.org from 192.168.9.3
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.example.com from 192.168.9.3

# AAAA and A queries for the targets returned from the SRV query
dnsmasq[1]: query[AAAA] etcd3.example.com from 192.168.9.3
dnsmasq[1]: query[AAAA] etcd1.example.com from 192.168.9.3
dnsmasq[1]: query[AAAA] etcd2.example.com from 192.168.9.3
dnsmasq[1]: query[A] etcd3.example.com from 192.168.9.3
dnsmasq[1]: query[A] etcd1.example.com from 192.168.9.3
dnsmasq[1]: query[A] etcd2.example.com from 192.168.9.3
...

Now run the same command with a v3.5.3 client (which includes a backport of #13712)

docker run --dns="${dnsip}" --dns-option=ndots:5 --dns-search=example.org --dns-search=corp.example.org \
quay.io/coreos/etcd:v3.5.3 etcdctl -d etcd.example.com get /

The resulting DNS queries now include incorrect lookups:

# SRV queries for https with example.org and corp.example.org search paths
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.example.com.corp.example.org from 192.168.9.3
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.example.com from 192.168.9.3

# SRV queries for http with example.org and corp.example.org search paths
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.example.com.corp.example.org from 192.168.9.3
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.example.com from 192.168.9.3

# incorrect AAAA and A queries reappending DNS search paths
dnsmasq[1]: query[AAAA] etcd3.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[AAAA] etcd1.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[A] etcd3.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[A] etcd1.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[AAAA] etcd2.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[A] etcd2.example.com.example.org from 192.168.9.3
dnsmasq[1]: query[AAAA] etcd3.example.com.corp.example.org from 192.168.9.3
dnsmasq[1]: query[AAAA] etcd1.example.com.corp.example.org from 192.168.9.3
dnsmasq[1]: query[A] etcd1.example.com.corp.example.org from 192.168.9.3
dnsmasq[1]: query[A] etcd3.example.com.corp.example.org from 192.168.9.3
dnsmasq[1]: query[A] etcd2.example.com.corp.example.org from 192.168.9.3
dnsmasq[1]: query[AAAA] etcd2.example.com.corp.example.org from 192.168.9.3

# correct AAAA and A queries for the targets returned from the SRV query
dnsmasq[1]: query[AAAA] etcd3.example.com from 192.168.9.3
dnsmasq[1]: query[AAAA] etcd2.example.com from 192.168.9.3
dnsmasq[1]: query[AAAA] etcd1.example.com from 192.168.9.3
dnsmasq[1]: query[A] etcd3.example.com from 192.168.9.3
dnsmasq[1]: query[A] etcd2.example.com from 192.168.9.3
dnsmasq[1]: query[A] etcd1.example.com from 192.168.9.3
...

Anything else we need to know?

No response

Etcd version (please run commands below)

3.5.3

Etcd configuration (command line flags or environment variables)

No response

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

No response

Relevant log output

No response

@liggitt
Contributor Author

liggitt commented Apr 15, 2022

This bug exists in master and was backported to 3.5.3. It apparently copies a similar bug that already existed on the server side:

shortHost := strings.TrimSuffix(srv.Target, ".")

This was referenced Apr 15, 2022
@liggitt
Contributor Author

liggitt commented Apr 15, 2022

/cc @ahrtr @spzala @serathius

@ahrtr
Member

ahrtr commented Apr 16, 2022

Thanks @liggitt for raising this issue, but it doesn't look correct to me.

Firstly, the addrs isn't the cname, instead it's a slice of SRV records.

The cname in this case is one of the following two values,

_etcd-client-ssl._tcp.etcd.example.com.
_etcd-client._tcp.etcd.example.com.

Secondly, no matter whether there is a dot at the end of Target in --srv-host, the SRV.Target returned by dig always has a trailing dot.

Let's work with your example. removing the trailing dot at the end of each target, and adding the port 2379,

docker run -it --rm --name dnsmasq dnsmasq \
  --user=root \
  --keep-in-foreground \
  --bind-dynamic \
  --conf-file=/dev/null \
  --log-queries \
  --log-facility=- \
  --srv-host=_etcd-client-ssl._tcp.etcd.example.com,etcd1.example.com,2379 \
  --srv-host=_etcd-client-ssl._tcp.etcd.example.com,etcd2.example.com,2379 \
  --srv-host=_etcd-client-ssl._tcp.etcd.example.com,etcd3.example.com,2379

Afterwards, run command below,

dnsip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' dnsmasq)
dig @${dnsip} srv _etcd-client-ssl._tcp.etcd.example.com

The response is,

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @172.17.0.2 srv _etcd-client-ssl._tcp.etcd.example.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39587
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_etcd-client-ssl._tcp.etcd.example.com.	IN SRV

;; ANSWER SECTION:
_etcd-client-ssl._tcp.etcd.example.com.	0 IN SRV 0 0 2379 etcd2.example.com.
_etcd-client-ssl._tcp.etcd.example.com.	0 IN SRV 0 0 2379 etcd1.example.com.
_etcd-client-ssl._tcp.etcd.example.com.	0 IN SRV 0 0 2379 etcd3.example.com.

;; Query time: 0 msec
;; SERVER: 172.17.0.2#53(172.17.0.2)
;; WHEN: Fri Apr 15 23:30:00 PDT 2022
;; MSG SIZE  rcvd: 178

Thirdly, removing the trailing dot is correct, because users might also need the second DNS lookup.

Fourthly, are you using "https://www.google.com." or "https://www.google.com" to access google? Feel free to ignore this comment :)

ahrtr removed the type/bug label on Apr 16, 2022
@liggitt
Contributor Author

liggitt commented Apr 16, 2022

Sorry to disagree, but this is a bug.

Using local dns search paths to resolve an absolute domain name returned in a SRV record is not correct.

All the queries in the # incorrect AAAA and A queries reappending DNS search paths group are wrong. At best, they return NXDOMAIN and simply slow down resolution. At worst, they resolve to incorrect addresses.

@liggitt
Contributor Author

liggitt commented Apr 16, 2022

no matter whether there is a dot at the end of Target in --srv-host, the SRV.Target returned by dig always has a trailing dot.

that's because SRV responses return canonical absolute target hostnames

@dims
Contributor

dims commented Apr 16, 2022

@ahrtr please re-add the area/bug

@ptabor
Contributor

ptabor commented Apr 16, 2022

I'm not a DNS expert, but in my opinion:

  1. As this change has semantic consequences (potentially even security ones) and is not a 'no-brainer' bug fix, we shouldn't cherry-pick it to 3.5.x. We should consider it for 3.6 if it solves a practical problem.
  2. From quick scanning of https://datatracker.ietf.org/doc/html/rfc2782:
 Target
        The domain name of the target host.  There MUST be one or more
        address records for this name, the name MUST NOT be an alias (in
        the sense of RFC 1034 or RFC 2181).  Implementors are urged, but
        not required, to return the address record(s) in the Additional
        Data section.  Unless and until permitted by future standards
        action, name compression is not to be used for this field.

If CNAMEs are explicitly excluded, I doubt the intention of the standard's author was 'allowing second DNS lookup' for local resolution.

@ahrtr
Member

ahrtr commented Apr 16, 2022

that's because SRV responses return canonical target hostnames

I agree with you if this statement is always true.

@liggitt
Contributor Author

liggitt commented Apr 16, 2022

I doubt the intention of the RFC author was 'allowing second DNS lookup' for local resolution.

I'm not sure what you mean by that, but if a hostname is returned in the SRV record instead of an IP (which is valid to do), a DNS lookup is required to resolve that hostname to a usable IP.

The point of disagreement is whether local DNS search paths should be used in that DNS lookup. When resolving an absolute hostname returned in the SRV record, local search paths should definitely not be used.

@liggitt
Contributor Author

liggitt commented Apr 16, 2022

Ah, I think I confused the issue by calling the .-suffixed result "canonical", leading to confusion with CNAME records.

What I should have called them were "absolute domain names", meaning they explicitly should not have search suffixes appended.

from rfc1035:

Domain names that end in a dot are called absolute, and are taken as complete.

We should not be stripping a trailing dot from the domain name, turning an absolute domain name into a relative domain name.
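
The difference between absolute and relative names can be sketched as a simplified model of resolv.conf-style search handling. The function candidateNames is hypothetical and is not the code of Go's resolver or of any libc; it only assumes the usual rules that an absolute name is tried as-is, and that a relative name with fewer than ndots dots is tried with each search suffix before being tried as-is.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames returns the fully qualified names a stub resolver would
// try, in order, for the given name. Simplified model for illustration;
// not taken from any particular resolver implementation.
func candidateNames(name string, search []string, ndots int) []string {
	// An absolute name is taken as complete: no search suffix is applied.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}

	var names []string
	asIs := name + "."
	enoughDots := strings.Count(name, ".") >= ndots

	// With at least ndots dots, the name is tried as-is first.
	if enoughDots {
		names = append(names, asIs)
	}
	for _, suffix := range search {
		names = append(names, name+"."+suffix+".")
	}
	// Otherwise the as-is attempt comes last.
	if !enoughDots {
		names = append(names, asIs)
	}
	return names
}

func main() {
	search := []string{"example.org", "corp.example.org"}

	// Relative name (trailing dot trimmed), with ndots:5 as in the reproduction.
	fmt.Println(candidateNames("etcd1.example.com", search, 5))
	// Absolute name, exactly as returned in the SRV record.
	fmt.Println(candidateNames("etcd1.example.com.", search, 5))
}
```

Under this model the trimmed name expands to three candidates, in the same order as the v3.5.3 query log in the description (example.org, then corp.example.org, then the bare name), while the absolute name yields a single candidate.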

I edited the title/description/comments to clarify the issue is in the treatment of absolute domain names returned in SRV records.

liggitt changed the title from "SRV responses are treated as non-canonical in v3.5.3" to "SRV responses with absolute domain names are treated as relative domain names in v3.5.3" on Apr 16, 2022
@liggitt
Contributor Author

liggitt commented Apr 16, 2022

that's because SRV responses return canonical absolute target hostnames

I agree with you if this statement is always true.

Whether it is always true or not, if a particular SRV response includes a dot suffix, that indicates that hostname is absolute, so we have to keep the dot suffix to do the DNS resolution of that hostname correctly.

@ahrtr
Member

ahrtr commented Apr 16, 2022

that's because SRV responses return canonical absolute target hostnames

I agree with you if this statement is always true.

Whether it is always true or not, if a particular SRV response includes a dot suffix, that indicates that hostname is absolute, so we have to keep the dot suffix to do the DNS resolution of that hostname correctly.

Technically speaking, I agree with you. But it's a little counterintuitive. As I mentioned previously, we usually access a service using a URL like https://etcd1.example.com:2379 rather than https://etcd1.example.com.:2379. After etcdctl gets the endpoints via the DNS lookup, it just passes them to the gRPC library. So it accesses the endpoints/service much as we would when running curl https://etcd1.example.com:2379. (Let's ignore the protocol difference between gRPC and REST.)

Usually we define an entry like the one below in /etc/hosts, so from a technical perspective it doesn't matter in this case whether we trim the trailing dot or not. But from the users' perspective, doesn't a URL without the trailing dot make more sense? I guess this is the reason why the trailing dot was trimmed previously?

192.168.1.10  etcd1.example.com
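
For reference, both URL forms mentioned above parse cleanly with Go's standard library, and the trailing dot survives as part of the host. This is a minimal sketch; hostAndPort is a hypothetical helper, not something etcd or gRPC exposes.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// hostAndPort extracts the host and port from an endpoint URL.
// Hypothetical helper for illustration only.
func hostAndPort(endpoint string) (host, port string, err error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", "", err
	}
	return net.SplitHostPort(u.Host)
}

func main() {
	for _, ep := range []string{
		"https://etcd1.example.com:2379",
		"https://etcd1.example.com.:2379",
	} {
		host, port, err := hostAndPort(ep)
		fmt.Println(host, port, err)
	}
}
```

The sketch only covers parsing; how a given client treats the dotted host during name resolution is a separate question, which is what the rest of this thread is about.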

Thanks for raising this interesting discussion, which I am totally open to.

@ptabor
Contributor

ptabor commented Apr 17, 2022

I doubt the intention of the RFC author was 'allowing second DNS lookup' for local resolution.

I'm not sure what you mean by that, but if a hostname is returned in the SRV record instead of an IP (which is valid to do), a DNS lookup is required to resolve that hostname to a usable IP.

I agree with you. By "local resolution" I mean taking in consideration local search suffixes.

I approved the PR for rollback in 3.5.3, and I think we should release 3.5.4.

@liggitt: Has this problem manifested in practice (e.g. broken k8s tests), or did you catch it by reading the changelog?

@liggitt
Contributor Author

liggitt commented Apr 17, 2022

I just caught it reviewing the code changes.

Kubernetes doesn't inherently use the SRV lookup approach, but particular installations certainly could, and the fact that kubernetes sets ndots:5 inside containers could make anyone using SRV lookup inside a kubernetes container more susceptible to this issue.

@ahrtr
Member

ahrtr commented Apr 18, 2022

that's because SRV responses return canonical absolute target hostnames

I agree with you if this statement is always true.

Whether it is always true or not, if a particular SRV response includes a dot suffix, that indicates that hostname is absolute, so we have to keep the dot suffix to do the DNS resolution of that hostname correctly.

It took me a couple of hours to read through the source code of net.LookupSRV. It seems that the returned srv.Target always has a trailing dot, whether or not it is an absolute name; please see message.go;l=2021 and message.go;l=2046.

Based on this point and my previous comment issuecomment-1100756920, should we still trim the trailing dot?

Or does it mean that the nameserver (to which the golang lib sends the request, see dnsclient_unix.go#L257) will always return the absolute domain name?

@ahrtr
Member

ahrtr commented Apr 18, 2022

I am not a DNS expert either.

If there is a trailing dot, then golang will not append any search suffix. Instead it only tries to resolve the name directly. See dnsclient_unix.go#L462. @liggitt has already verified this.

But from another perspective, the SRV.Target that the golang client receives always has a trailing dot, as I pointed out in my previous comment.

In summary, there are two DNS Lookups.

The first time is to translate one of the following two targets

_etcd-client-ssl._tcp.etcd.example.com.
_etcd-client._tcp.etcd.example.com.

into a slice of etcd endpoints something like below,

etcd1.example.com.:2379 
etcd2.example.com.:2379
etcd3.example.com.:2379

The second time is to translate etcd[1-3].example.com. to IP addresses. The key question is should we trim the trailing dot from the domain name etcd[1-3].example.com.?

If the returned SRV.Target returned by DNS Server always has a trailing dot, even for the domain which isn't an absolute value, then I think we should trim the trailing dot. Otherwise, golang will not try to append any search suffix.

@liggitt
Contributor Author

liggitt commented Apr 18, 2022

The key question is should we trim the trailing dot from the domain name etcd[1-3].example.com.?

We should not.

If the returned SRV.Target returned by DNS Server always has a trailing dot, even for the domain which isn't an absolute value

My understanding is that targets returned in SRV records should always be absolute. It doesn't make sense that a DNS server would return relative names that would be resolved using search paths local to the client which could be inconsistent.

@ahrtr
Member

ahrtr commented Apr 18, 2022

My understanding is that targets returned in SRV records should always be absolute. It doesn't make sense that a DNS server would return relative names that would be resolved using search paths local to the client which could be inconsistent.

It seems that the DNS server just returns what it's configured/told, so we might really need to trim the trailing dot. Please see my experiment (based on yours) below,

Step 1: start dnsmasq. Note that the target for _etcd-client._tcp.etcd.vmware.com is etcd1, and I added "etcd1.vmware.com:10.170.96.40" to /etc/hosts.


docker run --add-host etcd1.vmware.com:10.170.96.40 -it --rm --name dnsmasq dnsmasq \
  --user=root \
  --keep-in-foreground \
  --bind-dynamic \
  --conf-file=/dev/null \
  --log-queries \
  --log-facility=- \
  --srv-host=_etcd-client._tcp.etcd.vmware.com,etcd1,2379 


Step 2: run etcdctl in etcd 3.5.2.


dnsip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' dnsmasq)
docker run --dns="${dnsip}" --dns-option=ndots:5 --dns-search=vmware.com --dns-search=eng.vmware.com quay.io/coreos/etcd:v3.5.2 etcdctl --command-timeout 20s -d etcd.vmware.com get k1

The output is,

{"level":"warn","ts":"2022-04-18T22:41:13.703Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00031c700/etcd1.:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd1. on 172.17.0.2:53: no such host\""}
Error: context deadline exceeded

The response of dnsmasq is as below. Obviously it only tried to resolve etcd1, and did not append any search suffix.

dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.vmware.com.vmware.com from 172.17.0.3
dnsmasq[1]: forwarded _etcd-client-ssl._tcp.etcd.vmware.com.vmware.com to 172.17.0.1
dnsmasq[1]: forwarded _etcd-client-ssl._tcp.etcd.vmware.com.vmware.com to 10.162.204.1
dnsmasq[1]: forwarded _etcd-client-ssl._tcp.etcd.vmware.com.vmware.com to 10.166.1.1
dnsmasq[1]: reply _etcd-client-ssl._tcp.etcd.vmware.com.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.vmware.com.eng.vmware.com from 172.17.0.3
dnsmasq[1]: forwarded _etcd-client-ssl._tcp.etcd.vmware.com.eng.vmware.com to 172.17.0.1
dnsmasq[1]: reply _etcd-client-ssl._tcp.etcd.vmware.com.eng.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.vmware.com from 172.17.0.3
dnsmasq[1]: forwarded _etcd-client-ssl._tcp.etcd.vmware.com to 172.17.0.1
dnsmasq[1]: reply _etcd-client-ssl._tcp.etcd.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.vmware.com.vmware.com from 172.17.0.3
dnsmasq[1]: forwarded _etcd-client._tcp.etcd.vmware.com.vmware.com to 172.17.0.1
dnsmasq[1]: reply _etcd-client._tcp.etcd.vmware.com.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.vmware.com.eng.vmware.com from 172.17.0.3
dnsmasq[1]: forwarded _etcd-client._tcp.etcd.vmware.com.eng.vmware.com to 172.17.0.1
dnsmasq[1]: reply _etcd-client._tcp.etcd.vmware.com.eng.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.vmware.com from 172.17.0.3
dnsmasq[1]: config _etcd-client._tcp.etcd.vmware.com is <SRV>
dnsmasq[1]: query[AAAA] etcd1 from 172.17.0.3
dnsmasq[1]: forwarded etcd1 to 172.17.0.1
dnsmasq[1]: query[A] etcd1 from 172.17.0.3
dnsmasq[1]: forwarded etcd1 to 172.17.0.1
dnsmasq[1]: reply etcd1 is NXDOMAIN
dnsmasq[1]: reply etcd1 is NXDOMAIN
dnsmasq[1]: query[AAAA] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN
dnsmasq[1]: query[A] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN
dnsmasq[1]: query[AAAA] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN
dnsmasq[1]: query[A] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN
dnsmasq[1]: query[AAAA] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN
dnsmasq[1]: query[A] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN
dnsmasq[1]: query[AAAA] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN
dnsmasq[1]: query[A] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN
dnsmasq[1]: query[AAAA] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN
dnsmasq[1]: query[A] etcd1 from 172.17.0.3
dnsmasq[1]: cached etcd1 is NXDOMAIN


Step 3: run etcdctl in etcd 3.5.3.


dnsip=$(docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' dnsmasq)
docker run --dns="${dnsip}" --dns-option=ndots:5 --dns-search=vmware.com --dns-search=eng.vmware.com quay.io/coreos/etcd:v3.5.3 etcdctl --command-timeout 20s -d etcd.vmware.com get k1

The output is, (Note: I added a "k1/v1" into the etcd server beforehand).

k1
v1

The response of dnsmasq is as below. Since etcd 3.5.3 trims the trailing dot, it tried appending the search suffix ".vmware.com" and was therefore able to resolve the address successfully.


dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.vmware.com.vmware.com from 172.17.0.3
dnsmasq[1]: cached _etcd-client-ssl._tcp.etcd.vmware.com.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.vmware.com.eng.vmware.com from 172.17.0.3
dnsmasq[1]: cached _etcd-client-ssl._tcp.etcd.vmware.com.eng.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client-ssl._tcp.etcd.vmware.com from 172.17.0.3
dnsmasq[1]: cached _etcd-client-ssl._tcp.etcd.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.vmware.com.vmware.com from 172.17.0.3
dnsmasq[1]: cached _etcd-client._tcp.etcd.vmware.com.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.vmware.com.eng.vmware.com from 172.17.0.3
dnsmasq[1]: cached _etcd-client._tcp.etcd.vmware.com.eng.vmware.com is NXDOMAIN
dnsmasq[1]: query[SRV] _etcd-client._tcp.etcd.vmware.com from 172.17.0.3
dnsmasq[1]: config _etcd-client._tcp.etcd.vmware.com is <SRV>
dnsmasq[1]: query[AAAA] etcd1.vmware.com from 172.17.0.3
dnsmasq[1]: forwarded etcd1.vmware.com to 172.17.0.1
dnsmasq[1]: forwarded etcd1.vmware.com to 10.162.204.1
dnsmasq[1]: forwarded etcd1.vmware.com to 10.166.1.1
dnsmasq[1]: query[A] etcd1.vmware.com from 172.17.0.3
dnsmasq[1]: /etc/hosts etcd1.vmware.com is 10.170.96.40

@liggitt
Contributor Author

liggitt commented Apr 18, 2022

I'm not a DNS expert, but I would not expect DNS to reply with relative hostnames that depend on local search paths to resolve. But whatever we do with relative hostnames coming back from DNS (if those are even possible), I would never expect to re-relativize an absolute hostname that came back from DNS by trimming the trailing ..

/cc @thockin @bowei
who might be able to weigh in with more knowledge of the DNS spec

@thockin

thockin commented Apr 18, 2022

I am AFK, so I can't cite RFC, but it is unfathomable to me that an SRV lookup would intentionally respond with anything other than an absolute name. The fact that Go's implementation agrees gives me even more confidence.

@thockin

thockin commented Apr 19, 2022

Maybe also worth pointing out that "search paths" are not actually part of the DNS protocol but are part of the resolver libraries. No DNS server implementation would sanely depend on resolver behavior for correct response handling.

The fact that you got a DNS server to return crap (a bare label cannot be a DNS subdomain, I am 99% sure) doesn't mean that's what was intended. Beyond that case, anything could theoretically be a TLD.

The trailing period is actually part of the name, we just forgot about it because it is ugly.

@bowei

bowei commented Apr 19, 2022

Looking through the RFCs, as far as I can tell, this should be a properly fully-qualified name, not one that should be subject to name aliasing -- which is a concept outside of the DNS protocol itself.

@ahrtr
Member

ahrtr commented Apr 19, 2022

OK, thanks for all the feedback. The wiki also says that SRV.Target is a canonical hostname.

target: the canonical hostname of the machine providing the service, ending in a dot.

So I regard my experiment/example above as a bad/improper configuration. I will approve PRs 13949 and 13950.

@serathius
Member

So I regard my experiment/example above as a bad/improper configuration. I will approve PRs 13949 and 13950.

Both PRs have now been merged.
