Improve documentation what hot restart means #2366

Closed
h0jeZvgoxFepBQ2C opened this issue Sep 14, 2020 · 7 comments · Fixed by #2444

Comments

@h0jeZvgoxFepBQ2C

I tried to switch from rolling restarts to hot restarts, because some of our servers have many CPUs and a rolling restart on them takes much longer...

But after setting it up, I realized that my puma workers get stopped first and only AFTERWARDS does the application get preloaded, in the following order:

  • workers shut down
  • application gets preloaded
  • new workers take over

but I really thought that preloading meant the following order:

  • new workers get booted & application gets preloaded
  • workers shut down
  • new workers take over

It would be great if you could a) confirm this behaviour (maybe I just made a mistake?) and b) improve the documentation for this kind of restart.

Thank you!

@h0jeZvgoxFepBQ2C
Author

Could someone tell me whether the behaviour described in the first list is the correct behaviour?

@cjlarose
Member

cjlarose commented Oct 6, 2020

@h0jeZvgoxFepBQ2C Your (first) description of the hot restart behavior is correct for a deployment of puma in cluster mode.

The puma cluster start-up process is essentially something like this:

  1. Puma master process starts
  2. If preload_app is enabled, the application is preloaded in the puma master process.
  3. Puma worker processes are forked from the puma master process
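To make the preload-then-fork order concrete, here is a minimal Ruby sketch. It is not Puma's code: load_app is a hypothetical stand-in for booting your application, and the sketch needs a platform that supports fork (Linux, macOS). The app is loaded once in the master, and every forked worker reports the master's pid as the process that loaded it.

```ruby
# Sketch of steps 2-3 above: preload once in the master, then fork.
# HYPOTHETICAL: load_app stands in for booting a Rack app; it is not a
# Puma API. Needs a platform with fork (Linux, macOS).

def load_app
  sleep 0.1 # pretend the application boot is expensive
  { loaded_in: Process.pid } # remember which process did the loading
end

# Returns one line per worker describing where its app copy came from.
def start_cluster(worker_count)
  app = load_app # step 2: preload in the master process
  reader, writer = IO.pipe
  writer.sync = true

  pids = Array.new(worker_count) do |index|
    fork do # step 3: the worker inherits the master's loaded app
      reader.close
      writer.puts("worker #{index} uses app loaded in pid #{app[:loaded_in]}")
      exit!(0) # skip the parent's at_exit handlers
    end
  end

  writer.close # EOF arrives once every worker has exited
  lines = reader.readlines(chomp: true)
  pids.each { |pid| Process.wait(pid) }
  reader.close
  lines
end

lines = start_cluster(3)
puts lines
```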

If you request the cluster to perform a hot restart (such as by sending SIGUSR2 directly, or by using pumactl restart), the following happens:

  1. Puma master process gracefully kills its worker processes (worker processes finish the requests they're working on)
  2. The puma master process performs an exec operation to essentially reload itself, starting again at step 1 of the start-up sequence.
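For reference, a minimal Ruby sketch of the "send SIGUSR2 directly" route. It assumes the master's pid has already been written to a pidfile (Puma's pidfile config option can do that); the helper names read_pid and request_hot_restart are hypothetical.

```ruby
# Sketch: ask a Puma cluster for a hot restart by signalling its master.
# ASSUMPTION: the master's pid is stored in a pidfile (Puma's `pidfile`
# option can write one). Helper names here are hypothetical.

def read_pid(pidfile)
  Integer(File.read(pidfile).strip) # raises if the file holds no pid
end

# Sends SIGUSR2, which the Puma master treats as "hot restart".
# Returns the pid that was signalled.
def request_hot_restart(pidfile)
  pid = read_pid(pidfile)
  Process.kill("USR2", pid)
  pid
end
```

Usage would look like request_hot_restart("tmp/pids/puma.pid"), where the path is only an example and should match whatever your pidfile setting uses. The effect is the same as running pumactl restart: all workers are stopped before the master re-execs and boots again.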

As you may have realized in your previous configuration, preload_app is incompatible with phased restarts. Since phased restarts kill workers and spin up new workers one-by-one, it can take a long time to complete a phased restart with a puma cluster that has many workers and an expensive startup sequence.

A new, experimental puma configuration option called fork_worker might help in your case. If fork_worker is enabled, the startup sequence is like this:

  1. Puma master process starts
  2. The first worker process, worker 0, is started
  3. worker 0 preloads the application (regardless of whether or not preload_app is enabled) and begins to take requests
  4. All remaining workers are forked from worker 0 (creating new workers is cheap since the new workers do not need to load the application again)
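The same kind of sketch for the fork_worker start-up order above. Again this models the ordering only and is not Puma's implementation: load_app is hypothetical, and here worker 0 forks the remaining workers itself. The master never loads the app, and every worker reports worker 0's pid as the process that loaded it.

```ruby
# Sketch of the fork_worker start-up order (steps 2-4 above).
# HYPOTHETICAL: load_app stands in for booting a Rack app; this models
# the ordering only and is not Puma's implementation. Needs fork.

def load_app
  sleep 0.1 # pretend the application boot is expensive
  { loaded_in: Process.pid } # remember which process did the loading
end

# Returns [pid of worker 0, one line per worker].
def start_cluster_with_fork_worker(worker_count)
  reader, writer = IO.pipe
  writer.sync = true

  # The master does not load the app itself; it only starts worker 0.
  worker0 = fork do
    reader.close
    app = load_app # step 3: worker 0 loads the application
    writer.puts("worker 0 uses app loaded in pid #{app[:loaded_in]}")

    # Step 4: the remaining workers are forked from worker 0, so they
    # inherit the loaded app instead of booting it again.
    siblings = (1...worker_count).map do |index|
      fork do
        writer.puts("worker #{index} uses app loaded in pid #{app[:loaded_in]}")
        exit!(0)
      end
    end
    siblings.each { |pid| Process.wait(pid) }
    exit!(0)
  end

  writer.close # EOF arrives once worker 0 and its forks have exited
  lines = reader.readlines(chomp: true)
  Process.wait(worker0)
  reader.close
  [worker0, lines]
end

worker0, lines = start_cluster_with_fork_worker(4)
puts lines
```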

During a phased restart:

  1. All workers finish their current requests and then shut down
  2. The master process creates a new worker 0, which preloads the new version of the application and begins to take requests
  3. All remaining workers are forked from the new worker 0.

Documentation for the fork_worker option is available separately: https://github.com/puma/puma/blob/b08976840ec70448b26b66b4a9134369ca3c3a61/docs/fork_worker.md

@nateberkopec
Member

If anyone wants to adapt ^^^ into new Docs, go ahead and open the PR!

@h0jeZvgoxFepBQ2C
Author

Thanks @wjordan for this PR! #2099

Looks pretty cool! Right now we can choose between hard downtimes with hot reloading, or really slow deployments on big machines (32 CPUs take f.e. 15 minutes deployment for 30 puma processes).. With this PR this should improve drastically! Great that you worked / are working on this! ❤️

@cjlarose
Member

cjlarose commented Oct 6, 2020

If anyone wants to adapt ^^^ into new Docs, go ahead and open the PR!

I can do that. I've spent enough time tracing the code related to hot restarts and phased restarts that I could do it. The existing restart documentation is a little tough to follow. I might just rewrite it from scratch if you think that'd be okay.

@cjlarose
Member

cjlarose commented Oct 6, 2020

Right now we can choose between hard downtimes with hot reloading, or really slow deployments on big machines (32 CPUs take f.e. 15 minutes deployment for 30 puma processes).. With this PR this should improve drastically!

If you do try this feature out, we'd love the feedback. With the way fork_worker is currently implemented, there will still be a window during which the cluster is unable to handle requests (between the cluster killing all workers and the new worker 0 finishing its boot), but I'd predict that a phased restart with fork_worker on will be much faster than a phased restart with fork_worker off.
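A back-of-the-envelope model of that prediction, using the numbers from earlier in the thread (30 puma processes, about 15 minutes). It assumes the 15 minutes is almost all boot time, that a plain phased restart replaces workers strictly one at a time, and a made-up fork cost of 0.5 s per worker; none of these figures are Puma measurements, and the function names are hypothetical.

```ruby
# Back-of-the-envelope restart timing. ASSUMPTIONS, not measurements:
# a plain phased restart replaces workers one at a time and each new
# worker pays the full boot cost; with fork_worker, worker 0 boots once
# and each remaining worker costs fork_seconds (hypothetical) to fork.

def phased_restart_seconds(workers:, boot_seconds:)
  workers * boot_seconds
end

def fork_worker_phased_restart_seconds(workers:, boot_seconds:, fork_seconds:)
  # No worker serves requests until worker 0 has finished booting.
  boot_seconds + (workers - 1) * fork_seconds
end

workers = 30
boot = 15 * 60 / workers.to_f # ~15 min for 30 workers => 30.0 s per boot
puts phased_restart_seconds(workers: workers, boot_seconds: boot) # prints 900.0
puts fork_worker_phased_restart_seconds(workers: workers, boot_seconds: boot, fork_seconds: 0.5) # prints 44.5
```

Under these assumptions the total drops from about 900 s to about 44.5 s, but roughly the first 30 s of the fork_worker restart (worker 0 booting) is the window with no worker serving requests.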

I think that fork_worker might need some changes to be the perfect solution for you, since you still have a short loss of availability during a phased restart. That's a known limitation of fork_worker right now, and I'd be surprised if we didn't fix it at some point.

@nateberkopec
Member

I might just rewrite it from scratch if you think that'd be okay.

More than OK bro!

3 participants