Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track and wait on dead workers to prevent zombies #1884

Closed
wants to merge 1 commit into from

Conversation

evanphx
Copy link
Member

@evanphx evanphx commented Aug 5, 2019

This fixes a bug introduced in #1748 and supersedes the work in PR #1864.

The bug was the the old logic used -1 to Process.waitpid which would catch any pids, including ones that we accidentally already removed from @workers before waiting on them.

This logic tracks the dead workers separately so we're sure to wait on them.

@dentarg
Copy link
Member

dentarg commented Aug 5, 2019

Should the test from #1864 be added too?

@MSP-Greg
Copy link
Member

MSP-Greg commented Aug 6, 2019

I looked at this in both 3.12.1 and master with the test from #1864 added. Some interesting things, along with the issues that have appeared with Ruby 2.6+.

Anyway, if the code in check_workers that is causing these issues is replaced with:

pids = []
while pid = Process.waitpid(-1, Process::WNOHANG) do
  pids << pid
end
@workers.reject! { |w| w.dead? || pids.include?(w.pid) }

Everything passes (with a few other minor changes), including the mentioned test, but Ruby 2.2 fails intermittently on the new zombie test. It's essentially a simplification of the code that existed in 3.12.1. but it does use '-1', which you've mentioned you'd prefer not to use...

Top two commits at https://github.com/MSP-Greg/puma/commits/zombies. I deleted one that may make 2.2 failures less common.

Travis test is: https://travis-ci.org/MSP-Greg/puma/builds/568144883

@evanphx
Copy link
Member Author

evanphx commented Aug 6, 2019

@MSP-Greg Oh great! Please open a PR against that branch and we'll use it.

@MSP-Greg
Copy link
Member

MSP-Greg commented Aug 6, 2019

@evanphx - see PR #1887. Testing right now. Sorry for jumping the gun, but I've been looking at this stuff too much... Now watch, Appveyor will fail...

@MSP-Greg
Copy link
Member

MSP-Greg commented Aug 6, 2019

@evanphx

When you have a minute, a look at #1836 would be appreciated.

The only real feature added is no_tlsv1_1, which was mostly about duplicating the code from #no_tlsv1. A minimum_protocol_version or something similar would be better, but I don't think we'll need to add a no_tlsv1_2 parameter/option anytime soon...

Otherwise, it's a few things that make SSL testing better, as there are some false positives, etc happening. Obviously, false positives are not good. Also, I've got a patch for the 'http & OpenSSL 1.1.1 (TLSv1.3)' problem, but I really would rather do it on top of #1836... Thanks, Greg

EDIT: and, I tested 1836 locally, as everything in it affects all platforms...

@evanphx
Copy link
Member Author

evanphx commented Aug 6, 2019

@MSP-Greg No problem, merged #1836.

@MSP-Greg
Copy link
Member

MSP-Greg commented Aug 6, 2019

@evanphx

Thanks. I'm fixing Travis in #1887 'cluster.rb - no zombies' right now

@nateberkopec
Copy link
Member

just pasting this comment here for my own review later: #1864 (comment)

@nateberkopec
Copy link
Member

Closing in favor of #1887

@nateberkopec nateberkopec deleted the b-prevent-zombies branch March 14, 2020 21:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants