Improve documentation what hot restart means #2366

h0jeZvgoxFepBQ2C · 2020-09-14T17:22:56Z

I tried to switch from rolling restarts to hot restarts, since we sometimes use servers with many CPUs, where a rolling restart takes a very long time...

But after setting it up, I realized that my puma workers are stopped first and only AFTERWARDS is the application preloaded, in the following order:

workers shut down
application gets preloaded
new workers take over

But I had assumed that preloading meant the following order:

new workers get booted & application gets preloaded
workers shut down
new workers take over

It would be great if you could a) confirm this behaviour (maybe I just made a mistake?) and b) improve the documentation for this kind of restart.

Thank you!

h0jeZvgoxFepBQ2C · 2020-09-22T12:14:15Z

Could someone tell me whether the behaviour I described (the first list) is correct?

cjlarose · 2020-10-06T03:37:02Z

@h0jeZvgoxFepBQ2C Your (first) description of the hot restart behavior is correct for a deployment of puma in cluster mode.

The puma cluster start-up process is essentially something like this:

Puma master process starts
If preload_app is enabled, the application is preloaded in the puma master process.
Puma worker processes are forked from the puma master process
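For reference, the cluster-mode set-up described above corresponds to a Puma config file along these lines. This is a minimal sketch. The worker and thread counts are illustrative assumptions and do not come from this thread.

```ruby
# config/puma.rb: minimal cluster-mode sketch (counts are illustrative)
workers 4      # cluster mode: the master process forks 4 workers
threads 1, 5   # min and max threads per worker
preload_app!   # load the application in the master before forking workers
```

With `preload_app!`, the application is loaded once in the master and the forked workers inherit the already-loaded copy. This is also why a hot restart has to load the application in the master again before any new worker can start.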

If you ask the cluster to perform a hot restart (for example by sending SIGUSR2 directly, or by using pumactl restart), the following happens:

Puma master process gracefully kills its worker processes (worker processes finish the requests they're working on)
The puma master process then performs an exec to replace itself with a fresh copy, and the start-up sequence above runs again from the first step (the master process starts).
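The following small Ruby model makes the ordering concrete. It illustrates the steps listed above and is not Puma's implementation. The `hot_restart_events` helper and its event strings are hypothetical. In reality the old workers stop concurrently, so the list only captures ordering: every old worker is gone before the application is preloaded again.

```ruby
# Illustrative model (not Puma's code) of the order of events in a
# cluster-mode hot restart.
def hot_restart_events(workers:, preload_app:)
  events = []
  # 1. Old workers finish their in-flight requests, then stop.
  workers.times { |i| events << "old worker #{i} stops" }
  # 2. The master re-execs itself and runs the normal start-up sequence.
  events << "master re-execs"
  events << "master preloads app" if preload_app
  # 3. New workers are forked; requests are handled again only from here on.
  workers.times { |i| events << "new worker #{i} boots" }
  events
end

puts hot_restart_events(workers: 2, preload_app: true)
```

With `workers: 2, preload_app: true`, this prints both old workers stopping, then the re-exec and the preload, then both new workers booting. That matches the first order described at the top of this issue.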

As you may have realized in your previous configuration, preload_app is incompatible with phased restarts. Since phased restarts kill workers and spin up new workers one-by-one, it can take a long time to complete a phased restart with a puma cluster that has many workers and an expensive startup sequence.
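A rough calculation shows why phased restarts get slow on large machines. Workers are replaced one at a time, so the total duration grows roughly linearly with the number of workers. The sketch below ignores shutdown time and assumes every worker takes the same time to boot. `phased_restart_seconds` is a hypothetical helper, and the 30-second boot time is an assumed figure chosen for illustration.

```ruby
# Back-of-the-envelope model (an assumption, not a measurement): a phased
# restart boots replacement workers one after another, so its duration is
# roughly the worker count times the per-worker boot time.
def phased_restart_seconds(workers:, boot_seconds:)
  workers * boot_seconds
end

# Illustrative numbers: 30 workers that each take ~30 s to boot.
total = phased_restart_seconds(workers: 30, boot_seconds: 30)
puts "#{total} s (#{total / 60} min)" # prints "900 s (15 min)"
```

Under these assumptions, a ~30 s boot per worker would be enough to account for a 15-minute deployment with 30 puma processes, like the one reported later in this thread.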

A new, experimental puma configuration option called fork_worker might help in your case. If fork_worker is enabled, the start-up sequence is:

Puma master process starts
The first worker process, worker 0, is started
worker 0 preloads the application (regardless of whether or not preload_app is enabled) and begins to take requests
All remaining workers are forked from worker 0 (creating new workers is cheap, since they do not need to load the application again)
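Enabling the option is a one-line change in the Puma config. The following is a minimal sketch. It assumes a Puma release that includes the experimental `fork_worker` option, which first shipped in Puma 5.0. The worker count is illustrative.

```ruby
# config/puma.rb: fork_worker sketch (worker count is illustrative)
workers 30     # cluster mode with 30 workers
fork_worker    # experimental: fork workers 1..29 from worker 0 instead of the master
```

For the option's exact behaviour and limitations, see the fork_worker documentation linked below.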

During a phased restart:

All workers finish their current requests and then exit
The master process creates a new worker 0, which preloads the new version of the application. It begins to take requests
All remaining workers are forked from the new worker 0.
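Both kinds of restart are triggered by signalling the puma master process. SIGUSR2 requests a hot restart, as mentioned above, and SIGUSR1 requests a phased restart. The helper below is a sketch. `restart_puma` and the pidfile path are hypothetical. The sketch assumes Puma has been configured to write its master PID to that file, for example with the `pidfile` option.

```ruby
# Sketch: ask a running puma master to restart by sending it a signal.
# The default pidfile path is hypothetical; point it at wherever your
# Puma config's `pidfile` option writes the master PID.
RESTART_SIGNALS = {
  hot: "USR2",    # hot restart: stop all workers, then re-exec the master
  phased: "USR1"  # phased restart: replace workers without re-execing the master
}.freeze

def restart_puma(kind, pidfile: "tmp/pids/puma.pid")
  signal = RESTART_SIGNALS.fetch(kind) { raise ArgumentError, "unknown restart kind: #{kind}" }
  pid = Integer(File.read(pidfile).strip)
  Process.kill(signal, pid)
  signal # return the signal name that was sent
end
```

For example, `restart_puma(:phased)` would send SIGUSR1 to the PID stored in `tmp/pids/puma.pid`.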

Documentation for the fork_worker option is available separately: https://github.com/puma/puma/blob/b08976840ec70448b26b66b4a9134369ca3c3a61/docs/fork_worker.md

nateberkopec · 2020-10-06T13:13:29Z

If anyone wants to adapt ^^^ into new Docs, go ahead and open the PR!

h0jeZvgoxFepBQ2C · 2020-10-06T14:39:44Z

Thanks @wjordan for this PR! #2099

Looks pretty cool! Right now we have to choose between hard downtime with hot restarts and really slow deployments on big machines (e.g. on a 32-CPU server, a deployment with 30 puma processes takes 15 minutes). This PR should improve that drastically! Great that you worked / are working on this! ❤️

cjlarose · 2020-10-06T18:51:00Z

If anyone wants to adapt ^^^ into new Docs, go ahead and open the PR!

I can do that. I've spent enough time tracing the code related to hot restarts and phased restarts that I could do it. The existing restart documentation is a little tough to follow. I might just rewrite it from scratch if you think that'd be okay.

cjlarose · 2020-10-06T19:07:52Z

Right now we can choose between hard downtimes with hot reloading, or really slow deployments on big machines (32 CPUs take f.e. 15 minutes deployment for 30 puma processes).. With this PR this should improve drastically!

If you do try this feature out, we'd love your feedback. With fork_worker as currently implemented, there will still be a window during which the cluster cannot handle requests (after the cluster kills all workers and before the new worker 0 is up), but I'd predict that a phased restart with fork_worker on will be much faster than one with fork_worker off.

I think that fork_worker might need some changes to be the perfect solution for you since you still have a short loss of availability during a phased restart. That's a known limitation of fork_worker right now, and I'd be surprised if we didn't fix it at some point.

nateberkopec · 2020-10-06T21:14:41Z

I might just rewrite it from scratch if you think that'd be okay.

More than OK bro!

nateberkopec added the docs label Sep 14, 2020

nateberkopec added the contrib-wanted label Sep 27, 2020

cjlarose mentioned this issue Oct 22, 2020

Update restart documentation [changelog skip] [ci skip] #2444

Merged


nateberkopec closed this as completed in #2444 Oct 22, 2020
