New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cluster.rb - no zombies #1887
cluster.rb - no zombies #1887
Conversation
Forgot to mention - test added here failed on current master... EDIT: The build on Travis was https://travis-ci.org/MSP-Greg/puma/builds/568063498. It failed in Ruby 2.2 thru 2.6 on Xenial, but I cancelled before 2.6 completed. |
Re CI, there are two Ruby 2.2.10 jobs, one failed, which seems to be an intermittent failure. See https://bugs.ruby-lang.org/issues/13239. There was a recent patch for it, but it was only backported to Ruby 2.3.x. The failure may involve As to the failure in 2.3.8, it passed several times in my fork. Not sure what to think of it... |
So a better test, rather than relying on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, as per my comment here, returning to the waitpid(-1)
behavior should reap all the zombies sooner or later.
while pid = Process.waitpid(-1, Process::WNOHANG) do | ||
pids << pid | ||
end | ||
@workers.reject! { |w| w.dead? || pids.include?(w.pid) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could move these lines into a reap_workers
method to match cull_workers
and spawn_workers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, I'll leave coding style up to others. It's four lines, probably only called by the method it's contained in, and any test would involve that method...
Co-authored-by: Matt Duszynski <mattduszynski@gmail.com>
jruby-head to bionic xcode10.2 to xcode11 all ruby 2.2 to 2.2.10
Updated. Thanks. |
Sometime ago Lines 38 to 45 in 3ea2471
If the first worker in the array is the one needing the longest time to shut down, the code will not check the others until it's waited for the first to complete. Maybe I've got something wrong, but this seems incorrect. So, should the code simply use a revised version of the code currently used for Ruby 2.6+ (removing the code for Ruby < 2.6)? If so, a commit here, or another PR? |
@@ -37,32 +37,36 @@ rvm: | |||
matrix: | |||
fast_finish: true | |||
include: | |||
- rvm: 2.2 | |||
dist: trusty | |||
- rvm: 2.2.10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future @MSP-Greg, I'd prefer you make changes to travis.yml in separate PRs. That way, they won't get blocked by the corresponding feature/bugfix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same.
While I was testing/debugging, Travis seemed to be acting up, and I wasn't sure if it was affecting the CI results I was seeing.. IOW, didn't just decide to throw it in, I was having failures...
How about I remove it right now & add another PR?
@MSP-Greg definitely a separate PR for anything regarding #stop_workers |
Does the paragraph after the code snippet make sense? JFYI, I'm looking at some tests where multiple requests that take a varying amount of time to complete are used, and maybe better testing as far as graceful shutdown/phased restart can be done. Also, externally killing workers and checking respawn. Not today... |
IDK, ask me later on that PR after I release 4.1 😆 |
Commits:
'cluster.rb - fixup for Cluster#check_workers, removes zombies?' - reverts problematic code in
Cluster#check_workers
to a simplified version of what was used for several years, thru 3.12.1. Current code has been revised from previous due to issues with newer Ruby versions, etc. Simplified version appears to be working correctly.'test_integration.rb - add zombie test' - uses test from PR Detach dead workers to prevent zombie processes #1864 (not merged, closed)
'travis.yml - misc updates' - updates a few items to make use of current/newer releases. Hoping to fix some issues with intermittent failures.