Skip to content

Commit

Permalink
Fix child processes not being reaped with PID 1
Browse files Browse the repository at this point in the history
Starting with Puma v6.4.1, we observed that killed Puma cluster
workers were never being restarted when the parent was run as PID
1. For example, I issued a `kill 44` and PID 44 remained in the
`defunct` state:

```
git@gitlab-webservice-default-78664bb757-2nxvh:/var/log/gitlab$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
git            1       0  0 Jan09 ?        00:01:39 puma 6.4.1 (tcp://0.0.0.0:8080) [gitlab-puma-worker]
git           23       1  0 Jan09 ?        00:05:46 /usr/local/bin/gitlab-logger /var/log/gitlab
git           41       1  0 Jan09 ?        00:01:55 ruby /srv/gitlab/bin/metrics-server
git           44       1  0 Jan09 ?        00:02:41 [ruby] <defunct>
git           46       1  0 Jan09 ?        00:02:38 puma: cluster worker 1: 1 [gitlab-puma-worker]
git           48       1  0 Jan09 ?        00:02:42 puma: cluster worker 2: 1 [gitlab-puma-worker]
git           49       1  0 Jan09 ?        00:02:41 puma: cluster worker 3: 1 [gitlab-puma-worker]
git         5205       0  0 21:57 pts/0    00:00:00 bash
git         5331    5205  0 22:00 pts/0    00:00:00 ps -ef
```

Further investigation showed that the introduction of
`Process.wait2(-1, Process::WNOHANG)` in puma#3255 never appears to return
anything when:

1. The parent PID is 1.
2. `Process.detach(some PID != 1)` is run after a `Process.spawn`.
This bug appears to be present in Ruby 3.1 and 3.2, but it seems to
have been fixed in Ruby 3.3.

Previously `Process.wait(w.pid, Process::WNOHANG)` was called on each
known worker PID. puma#3255 changed this behavior to do this only if the
`fork_worker` config parameter were enabled, but it seems that we
should always do this.

Closes puma#3313
  • Loading branch information
stanhu committed Jan 11, 2024
1 parent 704d251 commit bd08e8a
Showing 1 changed file with 6 additions and 3 deletions.
9 changes: 6 additions & 3 deletions lib/puma/cluster.rb
Expand Up @@ -560,9 +560,12 @@ def wait_workers
@workers.reject! do |w|
next false if w.pid.nil?
begin
# When `fork_worker` is enabled, some worker may not be direct children, but grand children.
# Because of this they won't be reaped by `Process.wait2(-1)`, so we need to check them individually)
if reaped_children.delete(w.pid) || (@options[:fork_worker] && Process.wait(w.pid, Process::WNOHANG))
# We may need to check the PID individually because:
# 1. In versions prior to Ruby 3.3, `Process.detach` appears to prevent
# `Process.wait2(-1)` from working when the parent PID is 1.
# 2. When `fork_worker` is enabled, some worker may not be direct children,
# but grand children. Because of this they won't be reaped by `Process.wait2(-1)`.
if reaped_children.delete(w.pid) || Process.wait(w.pid, Process::WNOHANG)
true
else
w.term if w.term?
Expand Down

0 comments on commit bd08e8a

Please sign in to comment.