Skip to content

Commit

Permalink
Fix child processes not being reaped with PID 1
Browse files Browse the repository at this point in the history
Starting with Puma v6.4.1, we observed that killed Puma cluster
workers were never being restarted when the parent was run as PID
1. For example, I issued a `kill 44` and PID 44 remained in the
`defunct` state:

```
git@gitlab-webservice-default-78664bb757-2nxvh:/var/log/gitlab$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
git            1       0  0 Jan09 ?        00:01:39 puma 6.4.1 (tcp://0.0.0.0:8080) [gitlab-puma-worker]
git           23       1  0 Jan09 ?        00:05:46 /usr/local/bin/gitlab-logger /var/log/gitlab
git           41       1  0 Jan09 ?        00:01:55 ruby /srv/gitlab/bin/metrics-server
git           44       1  0 Jan09 ?        00:02:41 [ruby] <defunct>
git           46       1  0 Jan09 ?        00:02:38 puma: cluster worker 1: 1 [gitlab-puma-worker]
git           48       1  0 Jan09 ?        00:02:42 puma: cluster worker 2: 1 [gitlab-puma-worker]
git           49       1  0 Jan09 ?        00:02:41 puma: cluster worker 3: 1 [gitlab-puma-worker]
git         5205       0  0 21:57 pts/0    00:00:00 bash
git         5331    5205  0 22:00 pts/0    00:00:00 ps -ef
```

Further investigation showed that the introduction of
`Process.wait2(-1, Process::WNOHANG)` in puma#3255 never appears to return
anything inside Google Kubernetes Engine running as PID 1.

Previously `Process.wait(w.pid, Process::WNOHANG)` was called on each
known worker PID. puma#3255 changed this behavior to do this only if the
`fork_worker` config parameter were enabled, but it seems that we
should always do this.

Closes puma#3313
  • Loading branch information
stanhu committed Jan 10, 2024
1 parent 704d251 commit 83f53bd
Showing 1 changed file with 5 additions and 3 deletions.
8 changes: 5 additions & 3 deletions lib/puma/cluster.rb
Expand Up @@ -560,9 +560,11 @@ def wait_workers
@workers.reject! do |w|
next false if w.pid.nil?
begin
# When `fork_worker` is enabled, some worker may not be direct children, but grand children.
# Because of this they won't be reaped by `Process.wait2(-1)`, so we need to check them individually)
if reaped_children.delete(w.pid) || (@options[:fork_worker] && Process.wait(w.pid, Process::WNOHANG))
# We may need to check the PID individually because:
# 1. On some environments running as PID 1, `Process.wait2(-1)` might not return anything.
# 2. When `fork_worker` is enabled, some worker may not be direct children, but grand children.
# Because of this they won't be reaped by `Process.wait2(-1)`.
if reaped_children.delete(w.pid) || Process.wait(w.pid, Process::WNOHANG)
true
else
w.term if w.term?
Expand Down

0 comments on commit 83f53bd

Please sign in to comment.