lookup_srv resolvers configuration is still using systems resolv.conf #3801

Closed
danlsgiga opened this issue Oct 16, 2020 · 15 comments

Comments

@danlsgiga
Contributor

danlsgiga commented Oct 16, 2020

Caddy version: v2.2.1

The feature introduced in #3479 is not using the specified addresses for upstream DNS resolution.

Here is an error log that occurs every minute in my setup:

{"level":"error","ts":1602812552.1035602,"logger":"http.log.error.log0","msg":"making dial info: lookup app.service.consul on 169.254.169.253:53: dial udp 169.254.169.253:53: operation was canceled","request":{"remote_addr":"24.88.156.180:4608","proto":"HTTP/2.0","method":"OPTIONS","host":"my.app.site","uri":"/registry/user/ribbon","headers":{"Accept-Language":["en-US,en;q=0.5"],"Accept-Encoding":["gzip, deflate, br"],"User-Agent":["Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0"],"Accept":["*/*"],"Access-Control-Request-Method":["GET"],"Access-Control-Request-Headers":["authorization,x-client-app"],"Referer":["https://app2.site/computers/29a8240d-6fd2-4464-a187-3227276b398f"],"Origin":["https://app2.site"],"Dnt":["1"],"Te":["trailers"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.app.site"}},"duration":0.000193545,"status":502,"err_id":"3jv1hi6bp","err_trace":"reverseproxy.(*Handler).ServeHTTP (reverseproxy.go:388)"}

The 169.254.169.253 IP is only present in my /etc/resolv.conf and should never be used after I have specified the resolvers in the transport config.

My config:

{
  "admin": {
    "disabled": true
  },
  "apps": {
    "http": {
      "http_port": 80,
      "servers": {
        "srv0": {
          "automatic_https": {
            "disable": true,
            "disable_redirects": true
          },
          "listen": [
            ":80"
          ],
          "logs": {
            "default_logger_name": "log0"
          },
          "routes": [
            {
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "group": "app",
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "transport": {
                            "protocol": "http",
                            "resolver": {
                              "addresses": [
                                "tcp/127.0.0.1:53"
                              ]
                            }
                          },
                          "upstreams": [
                            {
                              "lookup_srv": "app.service.consul"
                            }
                          ]
                        }
                      ],
                      "match": [
                        {
                          "path": [
                            "/registry/investigation*"
                          ]
                        }
                      ]
                    },
                    {
                      "group": "app",
                      "handle": [
                        {
                          "handler": "reverse_proxy",
                          "transport": {
                            "protocol": "http",
                            "resolver": {
                              "addresses": [
                                "tcp/127.0.0.1:53"
                              ]
                            }
                          },
                          "upstreams": [
                            {
                              "lookup_srv": "app2.service.consul"
                            }
                          ]
                        }
                      ],
                      "match": [
                        {
                          "path": [
                            "/registry*"
                          ]
                        }
                      ]
                    }
                  ]
                }
              ],
              "match": [
                {
                  "host": [
                    "my.app.site"
                  ]
                }
              ],
              "terminal": true
            }
          ]
        }
      }
    }
  }
}

The dial call to 169.254.169.253 fails because that DNS server cannot resolve any .service.consul query.

Note: these error logs leak sensitive information sent via HTTP headers, because Caddy logs the full context of the request when the DNS query fails, and there's no way for me to disable or filter them.

@francislavoie
Member

That's because the resolver is specifically for the http transport dialer; the SRV dialer currently always uses net.DefaultResolver.

The transports and upstreams are separate from each other: the chosen upstreams are passed through the transport. There would probably need to be a separate resolver config for upstreams, but that seems annoying to configure.
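
For illustration, here is a minimal sketch in plain Go (not Caddy's actual code) of the two paths being described: a custom resolver built from an address like "tcp/127.0.0.1:53", next to an SRV lookup that still goes through net.DefaultResolver and therefore /etc/resolv.conf.

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	// Roughly what a resolver honoring an address like "tcp/127.0.0.1:53"
	// looks like: every DNS query is dialed to that one server.
	custom := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, _, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 2 * time.Second}
			return d.DialContext(ctx, "tcp", "127.0.0.1:53")
		},
	}
	_ = custom // the http transport only uses its configured resolver when dialing upstream addresses

	// ...whereas the SRV upstream lookup is hard-wired to the default
	// resolver, which is driven by /etc/resolv.conf:
	_, records, err := net.DefaultResolver.LookupSRV(context.Background(), "", "", "app.service.consul")
	fmt.Println(records, err)
}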

/cc @mohammed90

@danlsgiga
Contributor Author

But is there a way for me to not log those to stderr?

@francislavoie
Member

Not really. You'll need to filter the logs out yourself with some other tool. Pipe the Caddy logs through jq or something, excluding any lines that match some query, and output the resulting logs somewhere else.

@danlsgiga
Contributor Author

Ok, I think these errors are related to context-canceled requests: the cancellation bubbles up the stack and cancels the DNS lookups as well, but that's just an assumption on my part.

Regarding the logs, there's no way we can pipe this out to jq with our current stack without a major restructuring.

@francislavoie francislavoie added the feature ⚙️ New feature or request label Oct 16, 2020
@francislavoie
Member

francislavoie commented Oct 16, 2020

@mohammed90 I think what we should do is move the Resolver config up to the reverse_proxy handler instead, then push it down to the transport on provision (try a type assertion for HTTPTransport and, if it matches, set Resolver; or we could define a NeedsResolver type with a SetResolver function that HTTPTransport can implement, to generalize), and also pass it to fillDialInfo so it can be used for SRV. We should also deprecate the transport resolver config and warn that it should be moved to the handler.
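
To make that concrete, here is a rough, untested sketch of the interface idea; the NeedsResolver name, SetResolver signature, and pushResolver helper are illustrative and not existing Caddy API. It assumes it lives in the reverseproxy package, where HTTPTransport and the UpstreamResolver type behind the transport's resolver config already exist.

// NeedsResolver could be implemented by any transport (e.g. HTTPTransport)
// that is able to accept a resolver configured on the reverse_proxy handler.
type NeedsResolver interface {
	SetResolver(r *UpstreamResolver)
}

// During (*Handler).Provision, the handler-level resolver would be pushed
// down to the transport when supported, and separately passed to
// fillDialInfo for SRV lookups.
func pushResolver(transport interface{}, r *UpstreamResolver) {
	if nr, ok := transport.(NeedsResolver); ok {
		nr.SetResolver(r)
	}
}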

@danlsgiga
Contributor Author

A bit more insight on this issue... I logged into all 3 of my nodes and manually edited /etc/resolv.conf to point to the DNS server I wanted. The issue persists... so I believe this issue occurs when requests are canceled on the client side and the server cancels the request context, which ends up cancelling the DNS lookup, since it runs as _, records, err := net.DefaultResolver.LookupSRV(r.Context(), "", "", srvName). The DNS server IP in the error logs always seems to be the last entry in the resolv.conf file.

I have no reports of normal requests failing at all and I can't see any issues in my unbound setup.
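
For what it's worth, that symptom can be reproduced outside Caddy with a small standalone Go program (a repro sketch, not Caddy code) that cancels the context while a LookupSRV through the default resolver is in flight; the resulting error looks much like the one in the log above.

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())

	// Simulate the client going away mid-request, the same way Caddy's
	// r.Context() gets canceled when the connection drops.
	go func() {
		time.Sleep(10 * time.Millisecond)
		cancel()
	}()

	_, _, err := net.DefaultResolver.LookupSRV(ctx, "", "", "app.service.consul")
	// Typically prints something like:
	//   lookup app.service.consul on <last resolv.conf entry>:53: ... operation was canceled
	fmt.Println(err)
}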

@danlsgiga danlsgiga changed the title resolvers configuration is still using systems resolv.conf lookup_srv resolvers configuration is still using systems resolv.conf Oct 16, 2020
@danlsgiga
Contributor Author

I think we can use this issue to get the resolver applied to both dial and lookup_srv... meanwhile I'll track the error reported above in the appropriate issue.

@mholt
Member

mholt commented Nov 24, 2020

Is the lookup_srv resolver still a problem? Do we need to keep this issue open, and if so, what's the current status? Sorry, it looks like this issue diverged into two separate things. Does #3816 address it?

@danlsgiga
Contributor Author

Hey @mholt... so, I think this issue is still valid since @francislavoie already confirmed that the custom resolver is only applied to the http transport dialer, not to the LookupSRV call. IMO, both dial and lookup_srv should use the specified resolver (if any).

@mholt mholt added the help wanted 🆘 Extra attention is needed label Nov 24, 2020
@francislavoie
Member

Another user has encountered this issue: https://caddy.community/t/dns-server-to-use/14054

@mohammed90
Member

@mohammed90 I think what we should do is move the Resolver config up to the reverse_proxy handler instead, then push it down to the transport on provision (try a type assertion for HTTPTransport and, if it matches, set Resolver; or we could define a NeedsResolver type with a SetResolver function that HTTPTransport can implement, to generalize), and also pass it to fillDialInfo so it can be used for SRV. We should also deprecate the transport resolver config and warn that it should be moved to the handler.

At a glance, yes, that's one way. Another way I see is to inject the resolver from the Transport into the Upstreams. See, the Transport is provisioned starting at line 206:

if h.Transport == nil {
	t := &HTTPTransport{
		KeepAlive: &KeepAlive{
			ProbeInterval:       caddy.Duration(30 * time.Second),
			IdleConnTimeout:     caddy.Duration(2 * time.Minute),
			MaxIdleConnsPerHost: 32, // seems about optimal, see #2805
		},
		DialTimeout: caddy.Duration(10 * time.Second),
	}
	err := t.Provision(ctx)
	if err != nil {
		return fmt.Errorf("provisioning default transport: %v", err)
	}
	h.Transport = t
}

while Upstreams are provisioned starting at line 246:

for _, upstream := range h.Upstreams {
	// create or get the host representation for this upstream
	var host Host = new(upstreamHost)
	existingHost, loaded := hosts.LoadOrStore(upstream.String(), host)
	if loaded {
		host = existingHost.(Host)
	}
	upstream.Host = host
	// give it the circuit breaker, if any
	upstream.cb = h.CB
	// if the passive health checker has a non-zero UnhealthyRequestCount
	// but the upstream has no MaxRequests set (they are the same thing,
	// but the passive health checker is a default value for upstreams
	// without MaxRequests), copy the value into this upstream, since the
	// value in the upstream (MaxRequests) is what is used during
	// availability checks
	if h.HealthChecks != nil && h.HealthChecks.Passive != nil {
		h.HealthChecks.Passive.logger = h.logger.Named("health_checker.passive")
		if h.HealthChecks.Passive.UnhealthyRequestCount > 0 &&
			upstream.MaxRequests == 0 {
			upstream.MaxRequests = h.HealthChecks.Passive.UnhealthyRequestCount
		}
	}
	// upstreams need independent access to the passive
	// health check policy because passive health checks
	// run without access to h.
	if h.HealthChecks != nil {
		upstream.healthCheckPolicy = h.HealthChecks.Passive
	}
}

So we can grab the resolver from the transport and inject it into the Upstream instance. It's been more than a year since I touched this code, so take my judgement with a grain of salt.
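
To sketch that idea in code (untested, and the upstream resolver field below is hypothetical, it does not exist in the current code), the end of the provisioning loop quoted above could hand the transport's resolver to each upstream:

// Hypothetical addition to (*Handler).Provision: if the transport is an
// *HTTPTransport with a resolver configured, pass that resolver to each
// upstream so fillDialInfo / the SRV lookup can use it instead of
// net.DefaultResolver.
if ht, ok := h.Transport.(*HTTPTransport); ok && ht.Resolver != nil {
	for _, upstream := range h.Upstreams {
		upstream.resolver = ht.Resolver // hypothetical field, not in today's Upstream struct
	}
}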

@mholt
Member

mholt commented Dec 8, 2021

@mohammed90 Maybe we can look into integrating this into #4470.

@mholt
Member

mholt commented Jan 25, 2022

I think @francislavoie just added this to #4470, if you want to try it out!

@mholt
Member

mholt commented Feb 10, 2022

@danlsgiga Do you still need this? Any chance you could try out that branch? 😀

@mholt
Member

mholt commented Mar 7, 2022

Merged in #4470. Closing this due to completion + inactivity.

@mholt mholt closed this as completed Mar 7, 2022
@mholt mholt removed the help wanted 🆘 Extra attention is needed label Mar 7, 2022
@mholt mholt added this to the v2.5.0 milestone Mar 7, 2022