SIGTERM leaving zombie worker processes #1855

Closed
jgelo opened this issue Jul 18, 2019 · 10 comments
jgelo commented Jul 18, 2019

I'm running Puma in clustered mode and use monit to watch worker memory usage. If a worker exceeds a set memory threshold, monit sends it a SIGTERM so that it exits gracefully and is respawned by the master.

As of Puma v4, each worker that receives a SIGTERM leaves behind a zombie process that persists until the master is killed or restarted. With Puma v3, a worker that receives a SIGTERM also leaves a zombie process, but it is removed once the worker is fully respawned.

Is there a new preferred way to kill/restart workers since v4 or is this a possible issue?

Puma config -

directory             RAILS_ROOT
environment           RAILS_ENV
pidfile               "#{RAILS_ROOT}/tmp/pids/puma.pid"
state_path            "#{RAILS_ROOT}/tmp/pids/puma.state"
stdout_redirect       "#{RAILS_ROOT}/log/puma.stdout.log", "#{RAILS_ROOT}/log/puma.stderr.log", true
bind                  'tcp://127.0.0.1:3000'
activate_control_app  "unix:///#{RAILS_ROOT}/tmp/pids/pumactl.sock", { no_token: true }
daemonize             true
workers               2
threads               1,10
prune_bundler

Steps to reproduce

  1. Start puma, check processes -
UID        PID  PPID  C STIME TTY          TIME CMD
jgelo    13152  1293  0 18:59 ?        00:00:00 puma 4.0.1 (tcp://127.0.0.1:3000) [test]
jgelo    13157 13152 69 18:59 ?        00:00:02 puma: cluster worker 0: 13152 [test]
jgelo    13159 13152 69 18:59 ?        00:00:02 puma: cluster worker 1: 13152 [test]
  2. Send a worker a SIGTERM (kill -s TERM 13157) -
UID        PID  PPID  C STIME TTY          TIME CMD
jgelo    13152  1293  0 18:59 ?        00:00:00 puma 4.0.1 (tcp://127.0.0.1:3000) [test]
jgelo    13157 13152  5 18:59 ?        00:00:02 [ruby] <defunct>
jgelo    13159 13152  5 18:59 ?        00:00:02 puma: cluster worker 1: 13152 [test]
jgelo    13225 13152 64 18:59 ?        00:00:02 puma: cluster worker 0: 13152 [test]

Expected behavior

No zombie/defunct worker process should exist.
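
One way to check this expectation from a script is to scan `ps -ef` output for defunct entries whose parent is the Puma master. The sketch below is only an illustration: `defunct_children` is a hypothetical helper, not part of Puma or monit, and it assumes the eight-column `ps -ef` layout shown in the steps above.

```ruby
# Hypothetical helper: return the PIDs of defunct (zombie) processes
# whose parent is `master_pid`, given text in `ps -ef` format.
# Assumed columns: UID PID PPID C STIME TTY TIME CMD
def defunct_children(ps_output, master_pid)
  ps_output.each_line.map do |line|
    fields = line.split(nil, 8) # CMD may contain spaces, so cap at 8 fields
    next if fields.length < 8

    pid, ppid, cmd = fields[1], fields[2], fields[7]
    next unless pid.match?(/\A\d+\z/)      # skips the header row
    next unless ppid.to_i == master_pid

    pid.to_i if cmd.include?("<defunct>")
  end.compact
end

# Sample taken from the reproduction steps in this report.
SAMPLE_PS = <<~PS
  UID        PID  PPID  C STIME TTY          TIME CMD
  jgelo    13152  1293  0 18:59 ?        00:00:00 puma 4.0.1 (tcp://127.0.0.1:3000) [test]
  jgelo    13157 13152  5 18:59 ?        00:00:02 [ruby] <defunct>
  jgelo    13159 13152  5 18:59 ?        00:00:02 puma: cluster worker 1: 13152 [test]
  jgelo    13225 13152 64 18:59 ?        00:00:02 puma: cluster worker 0: 13152 [test]
PS

p defunct_children(SAMPLE_PS, 13152) # => [13157]
```

On a live host the first argument could be the output of `ps -ef`, with the master PID read from the pidfile configured above.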

System configuration

Ruby version: ruby 2.6.3p62 (2019-04-16 revision 67580) [x86_64-linux]
Rails version: 5.2.3
Puma version: 4.0.0 & 4.0.1

@nateberkopec
Member

Sounds like we can do something to clean this up, but there's no process left around, just the entry in the process table, so this isn't a serious issue.
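
For context, a defunct entry is what remains of a child between the moment it exits and the moment its parent collects the exit status. The sketch below shows that lifecycle in plain Ruby. It is not Puma's code and says nothing about how Puma handles its workers; `term_and_reap` is a hypothetical name used only for illustration.

```ruby
# Fork a child, send it SIGTERM, then reap it.
# Returns the child's Process::Status.
def term_and_reap
  reader, writer = IO.pipe

  pid = fork do
    reader.close
    Signal.trap("TERM") { exit!(0) } # exit cleanly on SIGTERM
    writer.close                     # tells the parent the trap is installed
    sleep                            # wait for the signal
  end

  writer.close
  reader.read  # returns at EOF, i.e. once the child has closed its write end
  reader.close

  Process.kill("TERM", pid)

  # From the child's exit until the call below returns, the child shows up
  # as <defunct> in ps. Collecting its status removes the entry.
  _reaped_pid, status = Process.wait2(pid)
  status
end

status = term_and_reap
puts "exited=#{status.exited?} code=#{status.exitstatus}"
```

If the parent never makes the `Process.wait2` call (or an equivalent), the entry stays in the process table for as long as the parent lives, which matches the behavior reported above.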

andresbarcenas commented Jul 22, 2019

I am experiencing the same issue running Puma 4.0.1:

deployer  29148  0.4  2.9 2459256 424428 ?      Sl   Jul21   5:49 puma: cluster worker 10: 73450 [20190717090909]
deployer  73450  0.0  0.1 453552 27072 ?        Sl   Jul17   2:49 puma 4.0.1 (unix:///home/deployer/apps/fore/current/tmp/sockets/puma.sock) [20190717090909]
deployer  73528  1.2  2.6 2324728 384720 ?      Sl   11:09   1:38 puma: cluster worker 5: 73450 [20190717090909]
deployer  86933  0.6  4.1 2567896 591692 ?      Sl   Jul19  30:12 puma: cluster worker 1: 73450 [20190717090909]
deployer  97801  0.5  3.3 2390784 475336 ?      Sl   Jul21  11:00 puma: cluster worker 7: 73450 [20190717090909]
deployer  98017  0.5  3.3 2392060 482064 ?      Sl   Jul21  10:40 puma: cluster worker 4: 73450 [20190717090909]
deployer  98493  0.5  3.1 2389828 445012 ?      Sl   Jul21  10:52 puma: cluster worker 2: 73450 [20190717090909]
deployer  98991  1.0  3.0 2389824 441476 ?      Sl   Jul21  19:14 puma: cluster worker 6: 73450 [20190717090909]
deployer  99216  0.6  3.4 2459536 490920 ?      Sl   Jul21  11:49 puma: cluster worker 9: 73450 [20190717090909]
deployer  99435  0.6  3.7 2339408 534264 ?      Sl   Jul21  11:41 puma: cluster worker 11: 73450 [20190717090909]
deployer  99789  0.5  3.2 2465088 472212 ?      Sl   Jul21  10:11 puma: cluster worker 3: 73450 [20190717090909]
deployer 100059  0.5  3.0 2391428 432680 ?      Sl   Jul21  10:40 puma: cluster worker 8: 73450 [20190717090909]


deployer  45700  0.1  0.0      0     0 ?        Z    Jul18   6:35 [ruby] <defunct>
deployer  50509  0.3  0.0      0     0 ?        Z    Jul17  27:58 [ruby] <defunct>
deployer  55182  0.1  0.0      0     0 ?        Z    Jul17  10:30 [ruby] <defunct>
deployer  73454  0.2  0.0      0     0 ?        Z    Jul17  20:58 [ruby] <defunct>
deployer  73456  0.2  0.0      0     0 ?        Z    Jul17  19:59 [ruby] <defunct>
deployer  73460  0.2  0.0      0     0 ?        Z    Jul17  21:04 [ruby] <defunct>
deployer  73461  0.2  0.0      0     0 ?        Z    Jul17  21:33 [ruby] <defunct>
deployer  73462  0.2  0.0      0     0 ?        Z    Jul17  20:42 [ruby] <defunct>
deployer  73464  0.2  0.0      0     0 ?        Z    Jul17  21:36 [ruby] <defunct>
deployer  73466  0.2  0.0      0     0 ?        Z    Jul17  20:41 [ruby] <defunct>
deployer  73467  0.2  0.0      0     0 ?        Z    Jul17  20:47 [ruby] <defunct>
deployer  73469  0.2  0.0      0     0 ?        Z    Jul17  22:53 [ruby] <defunct>
deployer  84149  0.4  0.0      0     0 ?        Z    Jul19  20:01 [ruby] <defunct>
deployer  84384  0.4  0.0      0     0 ?        Z    Jul19  19:54 [ruby] <defunct>
deployer  84929  0.4  0.0      0     0 ?        Z    Jul19  19:54 [ruby] <defunct>
deployer  85152  0.4  0.0      0     0 ?        Z    Jul19  20:22 [ruby] <defunct>
deployer  85401  0.4  0.0      0     0 ?        Z    Jul19  20:29 [ruby] <defunct>
deployer  85683  0.4  0.0      0     0 ?        Z    Jul19  19:32 [ruby] <defunct>
deployer  85901  0.4  0.0      0     0 ?        Z    Jul19  20:20 [ruby] <defunct>
deployer  86218  0.3  0.0      0     0 ?        Z    Jul19  18:27 [ruby] <defunct>
deployer  86454  0.4  0.0      0     0 ?        Z    Jul19  20:05 [ruby] <defunct>
deployer  86695  0.4  0.0      0     0 ?        Z    Jul19  21:00 [ruby] <defunct>

wolfemm commented Jul 30, 2019

Apparently Monit (v5.16 at least) includes these defunct processes when it counts child processes. Looking forward to the fix being merged and released.

@nateberkopec
Member

🤦‍♂ Why would monit do that? You might want to open an issue over there, that seems unreasonable to me but I'm not a sysadmin so idk.

qnighy commented Jul 31, 2019

We recently hit the same issue after upgrading Puma to 4.x, which forced us to revert to 3.x. It consumed pids so quickly that it reached the default maximum pid value of 49,152 within a few days, leading to a large number of ThreadErrors. Worse, it affected other services running on the same container host. We use Puma together with puma_worker_killer, which we intentionally configure to kill workers at relatively short intervals, so this may explain how quickly the pids were consumed.

Although there may be workarounds, such as increasing pid_max or adjusting the puma_worker_killer configuration, we would prefer a more direct fix if one is possible. Is there any chance this issue could be given higher priority in light of cases like ours?

nateberkopec commented Jul 31, 2019

I wonder why people experience this issue only with 4.x?

qnighy commented Jul 31, 2019

In our case, we (@south37) experimented in a staging environment and identified #1748 as the point where the issue started to occur. So there might be fallout from @workers in some conditions...

@nateberkopec
Member

#1748 (comment)

Always the famous last words, right? 😆

@nateberkopec
Member

Closed in #1887
