Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Worker external TERM signalling #1952

Merged
merged 1 commit into from Sep 19, 2019

Conversation

MSP-Greg
Copy link
Member

@MSP-Greg MSP-Greg commented Sep 6, 2019

External Worker TERM signals 'enter' on the forked side of the worker, but need to be communicated to other side, especially when an externally TERM'd worker is 'stuck'.

Without doing so, a stuck worker will never be KILL'ed by Puma, regardless of whether the timers run out...

External Worker TERM signals 'enter' on the forked side of the worker, but need to be communicated to other side
@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 6, 2019

Travis failed on XCode11 / Ruby 2.5.6 during 'bundle install', didn't even get to the compile step...

Actions Tests (all passing): https://github.com/MSP-Greg/puma/runs/214899119

@nateberkopec
Copy link
Member

Test?

Sent with GitHawk

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 7, 2019

Test?

  1. This should never happen: https://travis-ci.org/puma/puma/jobs/581195972#L468, as there should be several times that Worker#term checks the @options[:worker_shutdown_timeout] setting, and hence, KILL's the worker. So, until the test are updated, the current test is TestIntegration#test_worker_spawn_external_term.

  2. As mentioned above, I did the work with a revised test_integration.rb, and the test failures were more consistent.

  3. If this isn't a bug, how does the current code allow an externally TERM'd worker to check the @options[:worker_shutdown_timeout] setting? The only place in the code that an externally sent TERM signal is visible is here in Cluster#worker:

    puma/lib/puma/cluster.rb

    Lines 268 to 270 in d5c394e

    Signal.trap "SIGTERM" do
    server.stop
    end

Lastly, and a separate issue, I think test 'retry' should be removed, but more work is needed in the test files to make them more consistent. Same for parallel, which should be used...

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 7, 2019

Since I've occasionally not provided clear descriptions...

While working on #1908, the main concern was that @options[:worker_shutdown_timeout] was being accounted for. Both code review and CI seemed to indicate it was not. I assumed that an externally TERM'd worker was something that Puma was aware of. Two tests were added, one using 'phased-restart', the other using external TERM.

Then, while working on getting test_integration.rb cleaned up and also changing both of the above tests, the 'phased-restart' test was consistently passing, but the external TERM was not. After adding some debugging, it was clear that an externally TERM'd worker was not being handled correctly. As mentioned above, the code review seems to also indicate that.

Finally, this code only affects frozen/stuck workers that are externally TERM'd but don't respond, and hence, need a KILL signal. A worker that properly shuts down from the TERM signal is currently working fine.

@nateberkopec
Copy link
Member

clear descriptions

I disagree Greg! You provide very complete descriptions of your thoughts, and it is very useful.

Sent with GitHawk

@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 8, 2019

Nate,

Thanks. Really.

  1. I have good and bad days.

  2. I'm very capable of going in the wrong direction.

  3. After wrestling with the code paths, typos, the testing, etc, I may not be to excited about writing clear and complete descriptions about what I've done & why.

Puma is supposed to keep smoothly running regardless of the countless ways that code it's connected to can crash and burn. That can make it difficult for contributors/maintainers, and makes it very important that tests are not giving 'false positive' results...

JFYI, when running tests parallel, Minitest's time details may be incorrect or even missing. Retry may be affecting that. Ideally, we can get the tests to the point where they don't need retry...

MSP-Greg added a commit to MSP-Greg/puma that referenced this pull request Sep 9, 2019
1. binder.rb   Add PR puma#1886

2. cluster.rb  Add PR puma#1952, additional minor changes

3. launcher.rb #close_binder_listeners close io's as fast as possible
MSP-Greg added a commit to MSP-Greg/puma that referenced this pull request Sep 10, 2019
1. binder.rb   Add PR puma#1886

2. cluster.rb  Add PR puma#1952, additional minor changes

3. launcher.rb #close_binder_listeners close io's as fast as possible
@nateberkopec
Copy link
Member

Closing to look at #1956 in its entirety instead

@MSP-Greg MSP-Greg deleted the fix-external-worker-term branch September 18, 2019 15:53
@MSP-Greg MSP-Greg restored the fix-external-worker-term branch September 18, 2019 16:35
@MSP-Greg
Copy link
Member Author

MSP-Greg commented Sep 18, 2019

@nateberkopec

Forgot. I think this needs to be merged, as #1956 has been replaced by #1965, which is only test files...

MSP-Greg added a commit to MSP-Greg/puma that referenced this pull request Sep 18, 2019
MSP-Greg added a commit to MSP-Greg/puma that referenced this pull request Sep 18, 2019
MSP-Greg added a commit to MSP-Greg/puma that referenced this pull request Sep 18, 2019
@nateberkopec nateberkopec reopened this Sep 19, 2019
@nateberkopec nateberkopec merged commit 8eac8d4 into puma:master Sep 19, 2019
@MSP-Greg MSP-Greg deleted the fix-external-worker-term branch September 20, 2020 14:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants