In Ruby on Rails v5.2 and Puma v4, after restarting puma at the time of logrotate, unnecessary workers continue to remain #1891

dounokouno · 2019-08-07T04:01:48Z

We are uses Ruby on Rails v5.2 and Puma v4, rotating the Rails application log with logrotate, and restarting puma when it is rotated.

We uses puma restart with pumactl phased-restart, but we used to send SIGUSR1 signal to restart it.

Specifically, it is the following code.

test -s $pid && kill -USR1 "$(cat /path/to/app/tmp/pids/puma.pid)"

Refs: https://github.com/puma/puma/blob/master/docs/restart.md

Although puma restarts without any problem for a while, after several restarts, unnecessary workers will be left as shown below. In the following, there are two worker 7s. Perhaps pid = 15241 seems to be an unnecessary worker.

$ ps aux | grep [p]uma
username     10096 22.8  3.0 3357088 1012940 ?     Sl   13:01   1:12 puma: cluster worker 0: 14749 [app]
username     10135 26.6  3.0 3562476 1016716 ?     Sl   13:01   1:22 puma: cluster worker 1: 14749 [app]
username     10196 17.9  3.4 3688000 1145328 ?     Sl   13:01   0:54 puma: cluster worker 2: 14749 [app]
username     10243 15.6  3.9 3753452 1287424 ?     Sl   13:01   0:46 puma: cluster worker 3: 14749 [app]
username     10325 16.8  3.6 3758280 1189004 ?     Sl   13:01   0:48 puma: cluster worker 4: 14749 [app]
username     10381 15.9  3.1 3423564 1034904 ?     Sl   13:01   0:45 puma: cluster worker 5: 14749 [app]
username     10420 16.6  3.2 3689844 1070112 ?     Sl   13:01   0:46 puma: cluster worker 6: 14749 [app]
username     10464 21.9  2.6 3155296 888148 ?      Sl   13:01   0:59 puma: cluster worker 7: 14749 [app]
username     14749  0.0  0.0 216884 20340 ?        Sl    7月19   0:56 puma 4.0.0 (tcp://0.0.0.0:9292) [app]
username     15241 60.7  9.2 5273704 3049272 ?     Sl   11:02  75:20 puma: cluster worker 7: 14749 [app]

My question is, I would like to know how to restart puma without leaving unnecessary workers.

Or I would like to know if there is a way to find out what each worker is doing to find out why this is the case.

Or, please tell me if there is a way to put in logs etc and check the situation after the fact.

The main usage version is as follows.

Ruby: 2.4.2
Ruby on Rails: 5.2.2.1
Puma: 4.0.0

As mentioned above, puma restarts are performed by pumactl phased-restart, but the situation was the same by sending SIGUSR1 signal.

The text was updated successfully, but these errors were encountered:

nateberkopec · 2019-08-07T14:17:30Z

Not a lot for us to go on or reproduce here. All I can say based on what you've provided is that we're working very actively on restart behavior lately, and to try 4.1 when it releases (very soon).

We need some sort of script or simpler scenario that reproduces the issue you're seeing.

Is the duplicated worker(s) always the last one (i.e., is it always worker #7, or sometimes #1 or #0?)

dounokouno · 2019-08-08T05:51:59Z

@nateberkopec

Thank you for your reply. I'm very much looking forward to the release of v4.1.

Is the duplicated worker(s) always the last one (i.e., is it always worker #7, or sometimes #1 or #0?)

No, not always the last one worker. Sometimes it may be #0 or #1.

nateberkopec · 2019-08-08T16:02:09Z

Tagging @MSP-Greg here as well because he's been looking so much at restarts lately, wondering if this triggers any thoughts

MSP-Greg · 2019-08-08T16:53:11Z

he's been looking so much at restarts lately

Too much.

'phased-restart' relies on the logic in Cluster#check_workers to call wait on all workers. restart uses the logic in Cluster#stop_workers With the current code in Cluster#check_workers, if a worker is SIGTERM'd, I beleive it will be respawned, but wait (or some of the logic so force it to exit) may not be called. Hence, what you're seeing may occur.

dounokouno · 2019-08-09T04:59:25Z

I have confirmed that v4.1 has been released 🎉 I would like to use it with our service.

https://github.com/puma/puma/releases/tag/v4.1.0

dounokouno · 2019-08-09T05:05:49Z

@MSP-Greg

Thank you for your explanation.

Is there a way to properly stop unnecessary puma worker(s) generated using phased-restart? Or is there any way to do a phased restart other than phased-restart? By the way, the unnecessary worker(s) generated by this issue remained for hours unless it was deleted with kill -9.

MSP-Greg · 2019-08-09T14:11:41Z

Is there a way to properly stop unnecessary puma worker(s) generated using phased-restart?

That's certainly how Puma should work.

remained for hours

Obviously, no one wants that happening.

Speaking only for myself, am I 100% certain that this issue has been resolved in 4.1.0? No. Something similar to #1892 may be needed.

In some respects, phased-restart and externally stopping workers are similar, in that they both involve 'cleaning up' and starting a new worker/workers.

But, the code paths are not identical, and there may be cases where 'misbehaving' workers are not handled correctly. There's also the issue of when should a new worker be created. Should it be created when another worker is 'told to stop', or when the other worker has actually exited?

Also, should the timers involved for 'external stop' versus 'phased-restart' be the same, or different?

Sorry for the brief rant, but it is messy, and testing for it is also messy. If there are still issues remaining, they will get quickly fixed...

dounokouno · 2019-08-19T01:41:58Z

@MSP-Greg

Thank you for many replies.

First, I report that the symptoms did not improve after updating to v4.1.

Does "externally stopping" mean "send SIGUSR1 signal"?

Both "phased-restart" and "send SIGUSR1 signal" are executed with logrotate, and the rotation interval is 1 hour.

Is it possible to output a log of what each worker does when phased-restart is executed? I would like to be able to investigate the timing at which the worker continues without stopping by outputting logs such as stop command received, stop started, etc.

nateberkopec · 2019-09-02T15:06:42Z

Closed by #1908.

dounokouno · 2019-10-16T03:41:10Z

I will report.

After puma v4.1.1 was released, I immediately updated to v4.1.1. After that, there was no situation where unnecessary workers remained for about a month.

Thank you for your quick response and daily maintenance work. We will continue to use puma in the future.

nateberkopec added needs-repro restart labels Aug 7, 2019

nateberkopec closed this as completed Sep 2, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

In Ruby on Rails v5.2 and Puma v4, after restarting puma at the time of logrotate, unnecessary workers continue to remain #1891

In Ruby on Rails v5.2 and Puma v4, after restarting puma at the time of logrotate, unnecessary workers continue to remain #1891

dounokouno commented Aug 7, 2019

nateberkopec commented Aug 7, 2019

dounokouno commented Aug 8, 2019

nateberkopec commented Aug 8, 2019

MSP-Greg commented Aug 8, 2019

dounokouno commented Aug 9, 2019

dounokouno commented Aug 9, 2019

MSP-Greg commented Aug 9, 2019

dounokouno commented Aug 19, 2019

nateberkopec commented Sep 2, 2019

dounokouno commented Oct 16, 2019

In Ruby on Rails v5.2 and Puma v4, after restarting puma at the time of logrotate, unnecessary workers continue to remain #1891

In Ruby on Rails v5.2 and Puma v4, after restarting puma at the time of logrotate, unnecessary workers continue to remain #1891

Comments

dounokouno commented Aug 7, 2019

nateberkopec commented Aug 7, 2019

dounokouno commented Aug 8, 2019

nateberkopec commented Aug 8, 2019

MSP-Greg commented Aug 8, 2019

dounokouno commented Aug 9, 2019

dounokouno commented Aug 9, 2019

MSP-Greg commented Aug 9, 2019

dounokouno commented Aug 19, 2019

nateberkopec commented Sep 2, 2019

dounokouno commented Oct 16, 2019