
Add a hook for "before shutdown" #3327

Open
jmuia opened this issue Feb 5, 2024 · 6 comments

jmuia commented Feb 5, 2024

Is your feature request related to a problem? Please describe.
In our environment, when a server is shutting down (e.g. in response to a SIGTERM) it's a common pattern to fail healthcheck requests for N seconds prior to draining existing connections and closing the listener. This gives clients (and load balancers) a chance to start sending traffic elsewhere.

At the moment, Puma doesn't appear to support this pattern because the existing shutdown hooks (e.g. the :on_stopped event) run after the listener has been closed (link)

Describe the solution you'd like
I'd like Puma to support the above pattern. I think this would be possible by providing a new event or hook that fires before listeners are closed and workers are stopped. Alternatively, the existing :on_stopped event could be moved to before the listeners are closed.

I have a branch here that shows the rough shape of adding a new event.
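To make the shape concrete, a hypothetical config usage might look like this (the hook name on_before_shutdown and the flag are made up for illustration, not existing Puma API):

```ruby
# config/puma.rb -- hypothetical; `on_before_shutdown` is the proposed hook,
# not something Puma currently provides.
on_before_shutdown do
  # Tell the healthcheck endpoint to start returning 503s.
  $shutting_down = true

  # Give clients and load balancers time to notice before listeners close.
  sleep Integer(ENV.fetch("DRAIN_SECONDS", "15"))
end
```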

Describe alternatives you've considered

  • Custom signal handlers. At the moment, Puma overrides existing signal handlers. It may instead be possible to capture Puma's signal handler, run our logic, and then invoke Puma's handler (a rough sketch of this chaining follows the list).
  • Wrapping the Ruby application with another process that can handle the signal as desired before forwarding it to the Ruby app
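A rough sketch of the first alternative, assuming it runs after Puma has installed its own SIGTERM handler (the flag and the 15-second delay are placeholders):

```ruby
# Signal.trap returns the previously installed handler, so we can chain to it.
puma_handler = Signal.trap("TERM") do
  $shutting_down = true   # healthcheck endpoint starts failing
  sleep 15                # let clients and load balancers notice
  # Hand off to Puma's original handler so its normal shutdown proceeds.
  puma_handler.call if puma_handler.respond_to?(:call)
end
```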

MSP-Greg commented Feb 5, 2024

@jmuia

Haven't had time to think about this, but one question:

it's a common pattern to fail healthcheck requests for N seconds prior to draining existing connections and closing the listener.

So, you want Puma to 'wait N seconds, then drain existing connections'? Is that delay needed? Also, 'drain' is often used loosely, but there are three kinds of connections that might be drained. See also the docs for Puma::DSL#drain_on_shutdown; a minimal config example follows the list.

  1. Backlog connections that haven't been accepted
  2. Connections that have been accepted, but not completed the request
  3. Connections that are 'keep-alive', and may have additional requests
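For reference, enabling it is just a config option, something like:

```ruby
# config/puma.rb
# Serve connections already queued on the socket before fully stopping.
drain_on_shutdown true
```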


jmuia commented Feb 5, 2024

Yep, exactly – the pattern is effectively:

  1. Receive SIGTERM
  2. Continue processing new connections and requests. Only fail (e.g. 503) healthcheck requests.
  3. After N seconds, close the listener and finish processing any pending requests (i.e. connections that have already been accepted).

The delay is needed to allow clients and load balancers to detect that this server shouldn't receive any more traffic.

DSL#drain_on_shutdown paired with checking Server#shutting_down? in the healthcheck handler (allowing us to fail healthcheck requests) does seem promising; a sketch of such a handler follows the list. I see two downsides:

  1. For a low-traffic server, the accept queue could be empty, causing us to exit the accept loop early. There could be a race condition with an incoming request.
  2. A server will always receive healthcheck requests, which could mean that with drain_on_shutdown we never exit the accept loop (though that seems unlikely).
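For context, the kind of healthcheck handler I mean is roughly the following Rack middleware (the /healthz path is just an example, and this assumes Puma::Server.current is reachable from the request thread):

```ruby
# Fails healthchecks once Puma reports it is shutting down.
class Healthcheck
  def initialize(app)
    @app = app
  end

  def call(env)
    return @app.call(env) unless env["PATH_INFO"] == "/healthz"

    server = Puma::Server.current
    if server && server.shutting_down?
      [503, { "content-type" => "text/plain" }, ["shutting down"]]
    else
      [200, { "content-type" => "text/plain" }, ["ok"]]
    end
  end
end
```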


dentarg commented Feb 5, 2024

Hmm, isn't it better to instruct your load balancers to take the server out of rotation when you want to restart it? (Not sure what you're using, but HAProxy seems to have a drain mode for this)


jmuia commented Feb 5, 2024

Hmm, isn't it better to instruct your load balancers to take the server out of rotation when you want to restart it?

This would be a good solution! Unfortunately, in this environment it's not an option. We're using client-side load balancers, and real-time health information comes from client-side healthcheck requests.


dentarg commented Feb 5, 2024

Sounds cool… feel free to elaborate :)


jmuia commented Feb 6, 2024

After some thinking, I could see another viable alternative: add a min/max drain time to Puma::DSL#drain_on_shutdown (in other words, continue the accept() loop until we've waited the minimum drain time and either there are no more connections or we've hit the max drain time). A rough sketch is below.
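Something like this, very roughly (not Puma internals verbatim; min_drain, max_drain, listeners, and accept_and_queue are placeholders):

```ruby
# Keep accepting for at least `min_drain` seconds, at most `max_drain` seconds.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

loop do
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  break if elapsed >= max_drain

  readable, = IO.select(listeners, nil, nil, 0.1)
  if readable
    readable.each { |sock| accept_and_queue(sock) }  # hypothetical helper
  elsif elapsed >= min_drain
    break # past the minimum window and nothing pending: stop accepting
  end
end
```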

Sounds cool… feel free to elaborate :)

Of course! We're using Envoy as a sidecar mesh proxy in this environment – it proxies requests on both the client and server hosts. In our current configuration, health checks are an important part of the "draining" process. Please let me know if there are other specifics you're interested in.
