
Add a hook for "before shutdown" #3327

Open
jmuia opened this issue Feb 5, 2024 · 6 comments

jmuia commented Feb 5, 2024

Is your feature request related to a problem? Please describe.
In our environment, when a server is shutting down (e.g. in response to a SIGTERM) it's a common pattern to fail healthcheck requests for N seconds prior to draining existing connections and closing the listener. This gives clients (and load balancers) a chance to start sending traffic elsewhere.

At the moment, Puma doesn't appear to support this pattern because the existing shutdown hooks (e.g. the :on_stopped event) run after the listener has been closed (link)

Describe the solution you'd like
I'd like Puma to support the above pattern. I think this would be possible by providing a new event or hook that fires before listeners are closed and workers are stopped. Alternatively, the existing :on_stopped event could be moved to before the listeners are closed.

I have a branch here that shows the rough shape of adding a new event.
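To make the shape concrete, a hypothetical config usage might look like this (the hook name on_before_shutdown and the flag are made up for illustration, not existing Puma API):

```ruby
# config/puma.rb -- hypothetical; `on_before_shutdown` is the proposed hook,
# not something Puma currently provides.
on_before_shutdown do
  # Tell the healthcheck endpoint to start returning 503s.
  $shutting_down = true

  # Give clients and load balancers time to notice before listeners close.
  sleep Integer(ENV.fetch("DRAIN_SECONDS", "15"))
end
```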

Describe alternatives you've considered

  • Custom signal handlers. At the moment, Puma overrides existing signal handlers. It may instead be possible to capture Puma's signal handler, run our logic, and then invoke Puma's handler (a rough sketch of this chaining follows the list).
  • Wrapping the Ruby application with another process that can handle the signal as desired before forwarding it to the Ruby app
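A rough sketch of the first alternative, assuming it runs after Puma has installed its own SIGTERM handler (the flag and the 15-second delay are placeholders):

```ruby
# Signal.trap returns the previously installed handler, so we can chain to it.
puma_handler = Signal.trap("TERM") do
  $shutting_down = true   # healthcheck endpoint starts failing
  sleep 15                # let clients and load balancers notice
  # Hand off to Puma's original handler so its normal shutdown proceeds.
  puma_handler.call if puma_handler.respond_to?(:call)
end
```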

MSP-Greg commented Feb 5, 2024

@jmuia

Haven't had time to think about this, but one question:

it's a common pattern to fail healthcheck requests for N seconds prior to draining existing connections and closing the listener.

So, you want Puma to 'wait N seconds, then drain existing connections'? Is that delay needed? Also, 'drain' is often used loosely, but there are three kinds of connections that might be drained. See also the docs for Puma::DSL#drain_on_shutdown; a minimal config example follows the list.

  1. Backlog connections that haven't been accepted
  2. Connections that have been accepted, but not completed the request
  3. Connections that are 'keep-alive', and may have additional requests
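For reference, enabling it is just a config option, something like:

```ruby
# config/puma.rb
# Serve connections already queued on the socket before fully stopping.
drain_on_shutdown true
```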


jmuia commented Feb 5, 2024

Yep, exactly – the pattern is effectively:

  1. Receive SIGTERM
  2. Continue processing new connections and requests. Only fail (e.g. 503) healthcheck requests.
  3. After N seconds, close the listener and finish processing any pending requests (i.e. connections that have already been accepted).

The delay is needed to allow clients and load balancers to detect that this server shouldn't receive any more traffic.

DSL#drain_on_shutdown paired with checking Server#shutting_down? in the healthcheck handler (allowing us to fail healthcheck requests) does seem promising; a sketch of such a handler follows the list. I see two downsides:

  1. For a low-traffic server, the accept queue could be empty, causing us to exit the accept loop early. There could be a race condition with an incoming request.
  2. A server will always receive healthcheck requests, which could mean that with drain_on_shutdown we never exit the accept loop (though that seems unlikely).
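For context, the kind of healthcheck handler I mean is roughly the following Rack middleware (the /healthz path is just an example, and this assumes Puma::Server.current is reachable from the request thread):

```ruby
# Fails healthchecks once Puma reports it is shutting down.
class Healthcheck
  def initialize(app)
    @app = app
  end

  def call(env)
    return @app.call(env) unless env["PATH_INFO"] == "/healthz"

    server = Puma::Server.current
    if server && server.shutting_down?
      [503, { "content-type" => "text/plain" }, ["shutting down"]]
    else
      [200, { "content-type" => "text/plain" }, ["ok"]]
    end
  end
end
```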


dentarg commented Feb 5, 2024

Hmm, isn't it better to instruct your load balancers to take the server out of rotation when you want to restart it? (Not sure what you're using, but HAProxy seems to have a drain mode for this)


jmuia commented Feb 5, 2024

Hmm, isn't it better to instruct your load balancers to take the server out of rotation when you want to restart it?

This would be a good solution! Unfortunately, in this environment it's not an option. We're using client-side load balancers, and real-time health information comes from client-side healthcheck requests.


dentarg commented Feb 5, 2024

Sounds cool… feel free to elaborate :)


jmuia commented Feb 6, 2024

After some thinking, I could see another viable alternative: add a min/max drain time to Puma::DSL#drain_on_shutdown (in other words, continue the accept() loop until we've waited the minimum drain time and either there are no more connections or we've hit the max drain time). A rough sketch is below.
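Something like this, very roughly (not Puma internals verbatim; min_drain, max_drain, listeners, and accept_and_queue are placeholders):

```ruby
# Keep accepting for at least `min_drain` seconds, at most `max_drain` seconds.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

loop do
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  break if elapsed >= max_drain

  readable, = IO.select(listeners, nil, nil, 0.1)
  if readable
    readable.each { |sock| accept_and_queue(sock) }  # hypothetical helper
  elsif elapsed >= min_drain
    break # past the minimum window and nothing pending: stop accepting
  end
end
```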

Sounds cool… feel free to elaborate :)

Of course! We're using Envoy as a sidecar mesh proxy in this environment – it proxies requests on both the client and server hosts. In our current configuration, health checks are an important part of the "draining" process. Please let me know if there are other specifics you're interested in.
