proxy: Load Balance using multiple A records per single hostname #1545

Closed
furkanmustafa opened this issue Mar 29, 2017 · 28 comments · Fixed by #4470
Labels
feature ⚙️ New feature or request
Milestone

Comments

@furkanmustafa

1. What version of Caddy are you running (caddy -version)?

Caddy 0.9.5

2. What are you trying to do?

Trying to proxy and load-balance to an upstream service, but without listing all of its IPs/hostnames, because the system I use (Rancher) provides multiple A records in response to service DNS requests.

Example DNS reply using dig:

# dig hello-world.common-services.rancher.internal @169.254.169.250

; <<>> DiG 9.10.3-P4-Ubuntu <<>> hello-world.common-services.rancher.internal @169.254.169.250
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64531
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;hello-world.common-services.rancher.internal. IN A

;; ANSWER SECTION:
hello-world.common-services.rancher.internal. 600 IN A 10.42.49.70
hello-world.common-services.rancher.internal. 600 IN A 10.42.131.117
hello-world.common-services.rancher.internal. 600 IN A 10.42.183.144

;; Query time: 0 msec
;; SERVER: 169.254.169.250#53(169.254.169.250)
;; WHEN: Wed Mar 29 08:37:12 UTC 2017
;; MSG SIZE  rcvd: 103

Caddy resolves the host OK, but sends all requests to the IP address in the first A record.

3. What is your entire Caddyfile?

https://api.hello-world-app.example.com {
        log / stdout "{combined}"
        tls some-email@hello-world-app.example.com

        basicauth /signup hello-world-app mello-world-app

        timeouts none

        proxy / http://hello-world.common-services.rancher.internal:80 {
                keepalive 0
                transparent
                websocket
        }
}

:80 {
        redir / https://{host}{uri} 301
}

4. How did you run Caddy (give the full command and describe the execution environment)?

Using Docker/Rancher, with the dockerfile here.

6. What did you expect to see?

  • Caddy does a DNS request for the configured proxy upstream hostname
  • Considers each returned IP address as a separate upstream host entry
  • Does not cache this value for too long and updates it frequently;
    • another way is having an option to limit this caching down to a few (~5?) seconds.
    • or another way is allowing a signal (e.g. HUP) to be sent to the Caddy process to re-resolve proxy upstream hosts, so we can automate reloading the upstream list without downtime.

7. What did you see instead (give full error messages and/or log)?

  • All requests are forwarded to the first A record in the DNS response.

8. How can someone who is starting from scratch reproduce the bug as minimally as possible?

  • Create a (sub)domain with any DNS server/provider.
  • Add two or more A records for the same (sub)domain but with different IP addresses.
  • Set up Caddy to proxy requests to that (sub)domain.
  • All requests will be proxied to the IP address in the first A record.
@furkanmustafa furkanmustafa changed the title Load Balance using multiple A records per single hostname proxy: Load Balance using multiple A records per single hostname Mar 29, 2017
@mholt
Member

mholt commented Mar 29, 2017

Hi @furkanmustafa , thank you for following the issue template.

I do not think this is a bug, though. This is how the Go resolver works:

https://golang.org/pkg/net/#Dial

If the host is resolved to multiple addresses, Dial will try each address in order until one succeeds.

If the first IP address succeeds, it will be used.
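A minimal sketch of what that means in practice (using the hostname from the report above): even though it resolves to three A records, a plain Dial yields a single connection to whichever address answers first, so no balancing happens across the records.

package main

import (
	"log"
	"net"
)

func main() {
	// Dial tries the resolved addresses in order and keeps the first one
	// that succeeds; the remaining A records are never used.
	conn, err := net.Dial("tcp", "hello-world.common-services.rancher.internal:80")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	log.Printf("connected to %s", conn.RemoteAddr())
}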

Maybe this would be a better fit upstream as a Go bug?

@mholt mholt closed this as completed Mar 29, 2017
@furkanmustafa
Author

I think this would be a feature request rather than a bug report.

As for Dial's behavior, I think it works as it is supposed to. But for a load balancer, I would expect all of those IPs to be counted as upstream servers.

If you think this suggested behavior would be appropriate for Caddy, I can try to make a PR.

If not, then I think the next best solution would be to use a service proxy between Caddy and the containers, so that it looks like a single target to Caddy. But it also seems a bit redundant to do the load-balancing work again at an additional layer.

@mholt
Member

mholt commented Mar 29, 2017

(Sorry, I confused it for a bug report because it followed the bug template.)

I have heard that using DNS for load balancing is not a good idea (TTLs and such); why not just use Caddy's built-in load balancing features that do it properly?
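For context, a hypothetical version of the proxy block from the Caddyfile above that uses the built-in balancing instead, listing the three addresses returned by dig explicitly (the policy subdirective is my recollection of the 0.9.x proxy directive, so check the docs for your version):

proxy / 10.42.49.70:80 10.42.131.117:80 10.42.183.144:80 {
        policy round_robin
        keepalive 0
        transparent
        websocket
}

The drawback, as noted in the next comment, is that these addresses change whenever Rancher reschedules containers, so the list would have to be regenerated and the server reloaded on every change.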

@furkanmustafa
Author

(Sorry, I confused it for a bug report because it followed the bug template.)

That's probably a mistake on my part :)

In this use case, the upstream target changes rapidly: it scales up and down, and containers are quickly replaced with new ones (with new IPs). Service discovery is available via the internal DNS server.

Using DNS this way, we would avoid generating web server configuration from a template file and restarting/reloading it on each change. Of course, the DNS cache TTL on the load balancer side would have to be extremely low (<10s) for this to work smoothly. Everything would be dead simple to work with in this setup, though maybe not the most elegant way to do it.

For comparison with nginx's behavior: nginx does load balance across multiple A records as desired, but it fully respects the DNS TTL value, so rapid changes in the upstream service are not reflected in a reasonable amount of time. That could also be considered a design flaw in the system's (Rancher's) internal DNS server, because it reports TTL values of 600 seconds, which doesn't fit its purpose. If Rancher reported a TTL of zero, or at least something down to a few seconds, this would work perfectly. So a proper DNS cache implementation on the load balancer side would do the job.


Another, cleaner approach would be for Caddy to provide an API to add/remove upstreams for a virtual host on the fly. That would require some integration, as opposed to the DNS solution, but it would allow complete automation and would be elegant enough.

This is probably out of this issue's scope; it can be moved to a separate issue if it's considered the better solution.

Just a rough idea of how I'd want the API to work:

Config:

https://api.hello-world-app.example.com {
        proxy / @my-app-upstream {
                keepalive 0
                transparent
                websocket
        }
}

api :8000
# or
# api unix:///var/run/caddy.sock

It'd return 502 until a host is added to the upstream via the API.

API Interface:

# to add a host on the fly
echo "10.42.49.70" | curl -X POST --data-binary @- http://localhost:8000/upstreams/my-app-upstream/hosts
# and to remove it again
curl -X DELETE http://localhost:8000/upstreams/my-app-upstream/hosts/10.42.49.70

@mholt mholt reopened this Mar 29, 2017
@mholt mholt added the feature ⚙️ New feature or request label Mar 29, 2017
@markvincze

This is something that would be useful for my use case too. The same feature is also available, for example, in Envoy, which calls it strict DNS service discovery mode: it considers every A record a separate upstream and periodically, asynchronously refreshes the list of IPs.

@mholt mholt added this to the 2.0 milestone May 9, 2019
@JCMais

JCMais commented Jun 10, 2019

Just to add another common usage scenario for this: if you have headless services on Kubernetes, their A records return multiple IPs, one for each pod managed by that service.

This feature would be useful for load balancing those pods directly with Caddy.

See:
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#a-records

This scenario already (kinda) works using SRV records, but only if you have named ports on the Service. The problem with SRV records is that basically the DNS resolver is the one doing the load balancing. (#2100 (comment))

@mholt
Member

mholt commented Jun 10, 2019

@JCMais Good news on that front, we have already built an ingress controller that works with Caddy 2 which should do just what you need.

@ghostsquad

@mholt is Caddy really production-ready? What advantages does it have over the nginx ingress? I used Caddy before because the configuration was simple (perfect for a dev environment).

@mholt
Member

mholt commented Jun 11, 2019

Yes it is. Our ingress controller automatically provisions certificates for all your hosts and is extensible with modules in more ways than nginx.

@ghostsquad

@mholt is this the official ingress repo? https://github.com/wehco/caddy-ingress-controller

@mholt
Member

mholt commented Jun 12, 2019

@ghostsquad No, that's a third-party one. We haven't released ours yet, but stay tuned. It's really awesome 😎

@ghostsquad

@mholt is there an issue or PR I can subscribe to?

@mholt
Member

mholt commented Jun 12, 2019

If you happen to be at Velocity Conference tomorrow, we're demoing it at our booth. :)

Otherwise, just follow caddyserver on Twitter for now. 👍

@ghostsquad

I, unfortunately, won't be at the conference, but will definitely follow on Twitter. Thanks!

@thenewguy

Was this ever implemented? Using a single DNS record that points to multiple backend server IP addresses is a nice and easy way to reduce duplication and configuration.

@francislavoie
Member

@thenewguy see #1545 (comment), Caddy v2 will have this. The code for the ingress controller is not yet public but should be soon I think.

@thenewguy

@francislavoie awesome!

@francislavoie
Member

The Caddy ingress controller for Kubernetes is here: https://github.com/caddyserver/ingress

Just to gauge interest on this issue: it seems like everyone looking for this feature wants it for use with Kubernetes. If so, I think we can close this issue.

@furkanmustafa
Author

Kubernetes is not the only reason/system where people use DNS-based service discovery.

@francislavoie
Member

Understood, but I'm specifically looking to gauge how many people actually need this that aren't already covered by the k8s ingress controller. We don't want to waste our time implementing a feature nobody will use.

@mholt mholt added the deferred ⏰ We'll come back to this later label Sep 17, 2020
@mholt
Member

mholt commented Sep 17, 2020

As Caddy 2 now has several dynamic ways of configuring proxy backends, I'll close this until we find a use case that specifically requires load balancing using DNS A records. (Feel free to continue the discussion if you have a legit use case.)

@mholt mholt closed this as completed Sep 17, 2020
@jstewmon

👋 Hi, I stumbled upon this issue while researching whether Caddy could be used as a reverse proxy to an AWS ALB... AWS load balancers can have an instance per AZ/subnet (typically 2-6) whose IPv4 addresses are all returned when resolving the A record for the load balancer's DNS name.

So, to effectively reverse proxy an AWS ALB, all of the A records need to be resolved and load balanced by the reverse proxy.

Is there an alternative strategy for accomplishing this in Caddy v2?

@mholt
Member

mholt commented Sep 22, 2020

This should be pretty easy to implement using LookupIP(): https://golang.org/pkg/net/#LookupIP
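A minimal sketch (my own illustration, not Caddy's implementation) of the idea: resolve every A/AAAA record for the upstream hostname and emit each address as its own load-balancing target.

package main

import (
	"fmt"
	"net"
)

func main() {
	// LookupIP returns all A/AAAA records for the name, not just the first.
	ips, err := net.LookupIP("hello-world.common-services.rancher.internal")
	if err != nil {
		panic(err)
	}
	for _, ip := range ips {
		// Each resolved address becomes a separate upstream entry,
		// e.g. 10.42.49.70:80, 10.42.131.117:80, 10.42.183.144:80.
		fmt.Println(net.JoinHostPort(ip.String(), "80"))
	}
}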

@mholt mholt reopened this Sep 25, 2020
@mholt mholt added the help wanted 🆘 Extra attention is needed label Sep 25, 2020
@francislavoie francislavoie removed the deferred ⏰ We'll come back to this later label Sep 25, 2020
@teutat3s

teutat3s commented Apr 20, 2021

I came across this while researching service discovery using DNS A records with Caddy.

Our use case would be this:
It would be really nice to be able to use Caddy in combination with Triton CNS (Container Name Service) in on-premises Triton clouds, where DNS A records are created automatically for each new container and are also quickly removed when a container stops or is destroyed.

I'm still in my early stages of programming, but I'll take a look into this.

@mholt
Member

mholt commented Dec 7, 2021

Currently working on this as part of the refactoring of Caddy's reverse proxy, which will allow dynamic upstreams via new "upstream source" modules (or something like that; naming is hard). I'm actually doing this for SRV lookups, but A/AAAA load balancing works pretty similarly.

@mholt
Member

mholt commented Dec 13, 2021

Implementation can be seen in #4470

@mholt
Member

mholt commented Jan 18, 2022

Would anyone like to give my PR, #4470, a try? It only supports JSON config for now, but you can just use caddy adapt and then tweak your config to use the new dynamic upstreams feature.
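For anyone trying it out, a rough sketch of what the relevant reverse_proxy handler portion might look like with the A-record source; the field names are my reading of the PR, so verify them against #4470:

{
	"handler": "reverse_proxy",
	"dynamic_upstreams": {
		"source": "a",
		"name": "hello-world.common-services.rancher.internal",
		"port": "80",
		"refresh": "10s"
	}
}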

@mholt
Member

mholt commented Mar 7, 2022

This is now merged, see #4470.

@mholt mholt modified the milestones: 2.x, v2.5.0 Mar 7, 2022
@mholt mholt removed the in progress 🏃‍♂️ Being actively worked on label Mar 7, 2022