Delay in Caddy using changes to reverse_proxy upstream #6195

Closed

paularlott opened this issue Mar 27, 2024 · 9 comments · Fixed by #6202 · May be fixed by quic-go/quic-go#4407 or #6213
Labels
upstream ⬆️ Relates to some dependency of this project

Comments

@paularlott

I've written a plugin for Caddy which watches a Nomad/Consul cluster and builds ingress routes to our cluster based on that data; essentially it generates a collection of reverse_proxy statements pointing at the running containers.

However, I've noticed that after reloading the Caddyfile, existing clients continue to use the previous configuration for 2-3 minutes, or until the browser cache is cleared.

Initially I thought this was something I'd done wrong in my code, so I extracted a minimal Caddyfile and applied it manually against a stock version of Caddy (2.7.6 running in Docker), with the same results.

Initially I might have a reverse_proxy block such as:

reverse_proxy {
  dynamic srv {
    name php-pool-1.service.consul
    refresh 10s
    dial_timeout 5s
  }
  import reverseProxyConfig
  transport http {
    keepalive off
  }
}

After a deployment, the php-pool-1 name changes, e.g.:

reverse_proxy {
  dynamic srv {
    name php-pool-2.service.consul
    refresh 10s
    dial_timeout 5s
  }
  import reverseProxyConfig
  transport http {
    keepalive off
  }
}

If I start browsing the site I can see Caddy logging that it's querying DNS for php-pool-1.service.consul, which is correct.

If I now reload with a new Caddyfile that changes the upstream to php-pool-2.service.consul, I can see Caddy logging that it's still querying DNS for php-pool-1.service.consul rather than the new php-pool-2.service.consul. It keeps using the old upstream until I either clear the browser cache or wait 2-3 minutes, after which it moves to php-pool-2.service.consul.

If I use a different browser to connect directly after the reload, that connection uses php-pool-2.service.consul, which is what I expect.
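
For reference, testing with curl instead of a browser opens a fresh connection for each request, so it should always reach the upstream from the currently loaded config (hostname as used in the Caddyfile below; curl assumed to be available on the host):

# each invocation opens a new connection, so connection reuse can't mask the reload
curl -skv --http1.1 https://ingresstest.fortix.systems/ -o /dev/null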

The minimal Caddyfile I have is:

{
  admin localhost:2019

  log {
    output stdout
    level DEBUG
    format console
  }

  grace_period 3s
  debug
}

(reverseProxyConfig) {
  header_up +X_FORWARDED_PORT 443
  lb_policy least_conn
  lb_try_duration 5s
  lb_try_interval 250ms
  fail_duration 2s
  unhealthy_status 5xx
}

(logsConfig) {
  log {
    output stdout
    level DEBUG
    format console
  }
}

# Health Check
:8080 {
  respond /health-check OK 200
}

*.fortix.systems {
  import logsConfig
  encode zstd gzip

  @wildcard_86 host ingresstest.fortix.systems
  handle @wildcard_86 {
    reverse_proxy {
      dynamic srv {
        name php-pool-46db85dd489abc89543ff3ff288c7975.service.consul
        refresh 10s
        dial_timeout 5s
      }
      import reverseProxyConfig
      transport http {
        keepalive off
      }
    }
  }

  handle {
    abort
  }
}

From the documentation (https://caddyserver.com/docs/caddyfile/options#grace-period), including grace_period should force connections to close after the configured period (3s in my case), but that doesn't seem to be happening, so I'm guessing I'm missing something obvious here.
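
For completeness, the active config can be queried via the admin API after a reload to confirm that the new SRV name really has been loaded (admin endpoint as configured above; grep is only used here to pick out the relevant lines):

curl -s localhost:2019/config/ | grep php-pool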

Thank you for any guidance

@mholt
Member

mholt commented Mar 27, 2024

Hmmm, thanks for the report. I'll try to look into this soon.

@mholt
Member

mholt commented Mar 28, 2024

What is in your logs before, during, and after a config change?

I would expect the server socket to shut down after the grace period and sever existing connections to clients.

As for the dynamic upstreams, they are global and persist through config reloads, but they are keyed by the SRV address, so if those change, the old ones should no longer be used, I think.
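
If it helps narrow things down, it may also be worth checking the SRV lookup outside of Caddy, for example against Consul's DNS interface (assuming a local agent listening on the default DNS port 8600):

dig @127.0.0.1 -p 8600 +short SRV php-pool-2.service.consul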

@paularlott
Author

I've attached a dump of the logs (log.gz). I started Caddy with:

docker run --rm --network host --privileged --cap-add NET_ADMIN -v /root/config/Caddyfile:/etc/caddy/Caddyfile -v /root/config/data:/data caddy:2.7.6-alpine > log 2> error_log

I accessed a page on the site, then entered the container, modified the Caddyfile, and ran caddy reload --config /etc/caddy/Caddyfile

The initial service record was php-pool-dda10003a747409ee137207837a8eae7.service.consul and I changed it to xxx-php-pool-dda10003a747409ee137207837a8eae7.service.consul

Reload happened at 2024/03/28 03:58:58.357

The old record was used until 2024/03/28 04:00:04.674, at which point requests failed because in this case the new upstream doesn't exist; the behaviour is the same whether or not the upstream exists. Sometimes it holds on longer, but usually 2-3 minutes.

Caddyfile used:

{
  admin localhost:2019

  log {
    output stdout
    level DEBUG
    format console
  }

  shutdown_delay 1s
  grace_period 3s
  debug
}

# Snippets

(reverseProxyConfig) {
  header_up +X_FORWARDED_PORT 443
  lb_policy least_conn
  lb_try_duration 5s
  lb_try_interval 250ms
  fail_duration 2s
  unhealthy_status 5xx

  health_uri /
  health_interval 10s
  health_timeout 5s
  health_status 5xx
}

(logsConfig) {
  log {
    output stdout
    level DEBUG
    format console
  }
}

# Health Check
:8080 {
  respond /health-check OK 200
}

ingresstest.fortix.systems {
  import logsConfig
  encode zstd gzip

  reverse_proxy {
    dynamic srv {
      name php-pool-dda10003a747409ee137207837a8eae7.service.consul
      refresh 10s
      dial_timeout 5s
    }
    import reverseProxyConfig
    transport http {
      versions 2
      keepalive off
      keepalive_interval 5s
      keepalive_idle_conns 2
      read_buffer 16KiB
      write_buffer 16KiB
    }
  }

}

Thanks

@WeidiDeng
Member

From the log I found that all the requests are HTTP/3. It's due to old h3 connections remaining active after a configuration reload. I'll see what I can do with quic-go, which Caddy depends on for HTTP/3.
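
In the meantime, a possible workaround is to disable HTTP/3 so that browsers fall back to h1/h2 connections, which do get severed on reload; something like this in the global options block (using the standard servers/protocols options):

{
  servers {
    protocols h1 h2
  }
}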

WeidiDeng added the upstream ⬆️ label Mar 29, 2024
@WeidiDeng
Member

@paularlott Can you try xcaddy build close-h3-connections to see if it's fixed?
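
(For anyone following along: that builds Caddy from the branch of that name, assuming xcaddy is installed, and the resulting binary can be run in place of the stock one.)

xcaddy build close-h3-connections
./caddy version
./caddy run --config /etc/caddy/Caddyfile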

@paularlott
Author

@WeidiDeng I've only had a chance for a quick test so far, but it looks like that resolves the issue, thank you.

@mholt
Member

mholt commented Mar 29, 2024

@WeidiDeng You are a WIZARD 🎉 🧙‍♂️ can't wait to review this 😃

@WeidiDeng
Member

@paularlott Do you mind giving this patch a go? This properly closes HTTP/3 servers without interrupting ongoing requests unless the grace period is reached.

@paularlott
Author

@WeidiDeng I've just built it and deployed it as the ingress controller for some test sites and it seems to be following the changes to the upstream SRV records as it should. I'll leave it running so we can give it a good test.

Thank you
