
Return proper exit code for TERM signal #1337

Merged — 1 commit merged into puma:master from the s/exit-handler branch on Aug 16, 2017

Conversation

@shayonj (Contributor) commented Jun 19, 2017

Attempt at returning the proper exit code (128+15) when a TERM signal
is sent to the server, for both single and clustered mode.

The changes are achieved by restoring the default signal handler from within
the trap and then killing the process with TERM.

As an added plus, single mode now stops gracefully as well.

Potential fix for #1183.
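
For illustration, a minimal standalone sketch of the restore-and-re-kill idiom described above (hypothetical; not the actual Puma code):

# Hypothetical sketch: do the graceful-shutdown work, put the default TERM
# handler back, then re-send TERM so the process dies with 128 + 15 = 143.
Signal.trap("TERM") do
  # ... graceful shutdown work would go here ...
  Signal.trap("TERM", "DEFAULT")
  Process.kill("TERM", Process.pid)
end

sleep # wait for the signal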

@shayonj force-pushed the s/exit-handler branch 3 times, most recently from eddad58 to 3fa684d (June 19, 2017 03:45)
@shayonj force-pushed the s/exit-handler branch 3 times, most recently from 08f16c9 to fd38f5c (July 10, 2017 04:35)
@dekellum (Contributor)

Note, if this is merged (or #1183 is fixed as intended), that systemd service configs, as in /docs/systemd, should be updated to include:

SuccessExitStatus=143

...to avoid a normal shutdown/restart being reported with a warning and a failed state. I'm assuming this isn't a systemd default because traditional forked Unix daemons only return an exit status during startup (before forking away).
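
For context, that directive goes in the unit's [Service] section; a minimal hypothetical fragment (the ExecStart path is an assumption, not the actual /docs/systemd content):

[Service]
# Hypothetical example; adjust the path to your own Puma binstub and config.
ExecStart=/usr/local/bin/puma -C /srv/app/config/puma.rb
Restart=on-failure
# Treat "killed by SIGTERM" (128 + 15 = 143) as a clean stop.
SuccessExitStatus=143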

@shayonj (Contributor Author) commented Jul 15, 2017

@dekellum Thanks for the review!

I believe both SIGINT and SIGTERM are already part of the existing success exit handling for SuccessExitStatus:

Takes a list of exit status definitions that, when returned by the main service process, will be considered successful termination, in addition to the normal successful exit code 0 and the signals SIGHUP, SIGINT, SIGTERM, and SIGPIPE.

Source: https://www.freedesktop.org/software/systemd/man/systemd.service.html

So, for instance, if we were to run a systemd service with Restart=on-success, a TERM signal will be interpreted as a success and a proper restart will be conducted by systemd.

Existing setups still won't be affected, because the exit status returned before this PR was 0, which is also considered a successful exit in this case.

Let me know if I missed something :) and/or if you think it might be worth adding a note about success exit handling in https://github.com/puma/puma/blob/master/docs/systemd.md?

@dekellum (Contributor) commented Jul 17, 2017

I agree that your interpretation of the systemd docs is plausible and that the docs are not particularly clear. I've assumed that SuccessExitStatus=143 is necessary because I had to do the same with a Java-based project (the JVM exits with 143 on SIGTERM); see this serverfault answer. Without SuccessExitStatus=143 on that project, I would get the following on a systemctl stop:

Jul 17 12:05:24 klein systemd[877]: Stopping Guppy (Fishwife) HTTP Server...
Jul 17 12:05:24 klein systemd[877]: guppy.service: Main process exited, code=exited, status=143/n/a
Jul 17 12:05:24 klein systemd[877]: Stopped Guppy (Fishwife) HTTP Server.
Jul 17 12:05:24 klein systemd[877]: guppy.service: Unit entered failed state.
Jul 17 12:05:24 klein systemd[877]: guppy.service: Failed with result 'exit-code'.

But perhaps the JVM is doing something wrong, while your PR does it correctly? Next, I've gone ahead and set up a direct test of your changes, using MRI Ruby. Here is the complete tree for this test application (branch 'puma'):

https://github.com/dekellum/lionfish

With this setup, if I use your exact PR branch, it fails on startup here:

Jul 17 11:33:46 klein systemd[877]: Started Lionfish (Puma) HTTP Server.
Jul 17 11:33:46 klein puma[2025]: % Registered tcp:0.0.0.0:9238 for activation from LISTEN_FDS
Jul 17 11:33:46 klein puma[2025]: /home/david/.gem/ruby/2.2.0/bundler/gems/puma-fd38f5c8ce97/lib/puma/launcher.rb:171:in `run': undefined method `include?' for nil:NilClass (
Jul 17 11:33:46 klein puma[2025]:         from /home/david/.gem/ruby/2.2.0/bundler/gems/puma-fd38f5c8ce97/lib/puma/cli.rb:77:in `run'
Jul 17 11:33:46 klein puma[2025]:         from /home/david/.gem/ruby/2.2.0/bundler/gems/puma-fd38f5c8ce97/bin/puma:10:in `<top (required)>'
Jul 17 11:33:46 klein puma[2025]:         from /home/david/src/lionfish/sbin/puma:17:in `load'
Jul 17 11:33:46 klein puma[2025]:         from /home/david/src/lionfish/sbin/puma:17:in `<main>'
Jul 17 11:33:46 klein systemd[877]: lionfish.service: Main process exited, code=exited, status=1/FAILURE
Jul 17 11:33:46 klein systemd[877]: lionfish.service: Unit entered failed state.
Jul 17 11:33:46 klein systemd[877]: lionfish.service: Failed with result 'exit-code'.
Jul 17 11:33:47 klein systemd[877]: lionfish.service: Service hold-off time over, scheduling restart.
Jul 17 11:33:47 klein systemd[877]: Stopped Lionfish (Puma) HTTP Server.
Jul 17 11:33:47 klein systemd[877]: lionfish.service: Start request repeated too quickly.
Jul 17 11:33:47 klein systemd[877]: Failed to start Lionfish (Puma) HTTP Server.
Jul 17 11:33:47 klein systemd[877]: lionfish.service: Unit entered failed state.
Jul 17 11:33:47 klein systemd[877]: lionfish.service: Failed with result 'exit-code'.

If I instead cherry-pick fd38f5c onto a new testing branch rooted at the v3.9.1 tag, it starts correctly, but I get this error on stop:

Jul 17 11:47:20 klein puma[2487]: - Gracefully stopping, waiting for requests to finish
Jul 17 11:47:20 klein systemd[877]: Stopping Lionfish (Puma) HTTP Server...
Jul 17 11:47:20 klein puma[2487]: /home/david/src/puma/lib/puma/server.rb:392:in `close': can't modify frozen IOError (RuntimeError)
Jul 17 11:47:20 klein puma[2487]:         from /home/david/src/puma/lib/puma/server.rb:392:in `ensure in handle_servers'
Jul 17 11:47:20 klein puma[2487]:         from /home/david/src/puma/lib/puma/server.rb:396:in `handle_servers'
Jul 17 11:47:20 klein puma[2487]:         from /home/david/src/puma/lib/puma/server.rb:325:in `block in run'
Jul 17 11:47:20 klein systemd[877]: lionfish.service: Main process exited, code=exited, status=1/FAILURE
Jul 17 11:47:20 klein systemd[877]: Stopped Lionfish (Puma) HTTP Server.
Jul 17 11:47:20 klein systemd[877]: lionfish.service: Unit entered failed state.
Jul 17 11:47:20 klein systemd[877]: lionfish.service: Failed with result 'exit-code'.

So, in summary, I'm finding other problems on the master branch and/or with your change that occur before I can even determine whether the SuccessExitStatus=143 setting is necessary. Could you try to reproduce this yourself? One more note in case it matters: my local version of MRI Ruby is:

ruby 2.2.7p470 (2017-03-28 revision 58194) [x86_64-linux]

dekellum added a commit to dekellum/guppy that referenced this pull request Jul 17, 2017
@dekellum (Contributor) commented Jul 17, 2017

The puma/launcher.rb:171:in `run': undefined method `include?' issue is apparently #1344. I found a workaround for this (853c01d), and now I'm able to confirm that your initial branch works as you expected with systemd, no problems and without SuccessExitStatus=143 being required. I tested it both with and without clustered mode.

Again with the workaround, if I merge this PR to master it also works.

My cherry-picked variant based on v3.9.1 has that other error, but I don't think it's relevant to this PR and current master.

So I guess the JVM has it wrong and you have it right, at least for MRI. Thanks @shayonj. Next I want to check this with JRuby (on the JVM).

@dekellum (Contributor)

OK, this PR also seems to work fine under JRuby with systemd (without socket activation), and SuccessExitStatus=143 is not required for any of these combinations. I assume the difference is that the JVM's signal handler calls exit(128+15) instead of kill("TERM") on itself after removing the exit handler? Even on JRuby, installing the DEFAULT signal handler (presumably via JNI) works as intended.

@shayonj (Contributor Author) commented Jul 18, 2017

Thanks so much for such a thorough review @dekellum 🎉 .

Yes, I have Puma running under a systemd config on my VPS; now that I think of it, I should perhaps have shared the config here :). And yes, MRI Ruby.

Another quick way to test the SuccessExitStatus requirements would be to fire up a miniature systemd config pointing to a bash script that just sleeps. You can kill that process with -15 and observe the on-success behavior accordingly.

Lastly, exit(143) is different from kill("TERM"): exit simply terminates the program and reports a status code to the parent, whereas kill delivers a signal and lets the process shut down gracefully (if it handles it).
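
To illustrate the distinction with a throwaway Ruby script (hypothetical, not part of Puma or this PR):

# How a parent process sees each form of termination.
pid = fork { exit 143 }                       # child exits normally with code 143
_, st = Process.wait2(pid)
p [st.exitstatus, st.termsig]                 # => [143, nil]

pid = fork { Process.kill("TERM", Process.pid); sleep }  # child is killed by SIGTERM
_, st = Process.wait2(pid)
p [st.exitstatus, st.termsig]                 # => [nil, 15]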

@shayonj (Contributor Author) commented Jul 19, 2017

TL;DR on the above: no additional changes to the existing implementation were introduced, TERM signals now return the exit status code correctly, and single mode gets graceful-stop behavior.

@stereobooster (Contributor)

Please add a test for this.

@shayonj (Contributor Author) commented Aug 4, 2017

@stereobooster do you mean tests for making sure the right exit status code is returned? I suppose verifying the exit status of a server (or process) from within an existing process (the test suite) could be tricky ($? only reflects the last child of the current process), which is probably why there aren't existing specs/tests.

I am going to attempt it and also look into any additional ways to assert the trapping of the signal.

@stereobooster (Contributor) commented Aug 4, 2017

It is supposed to be an integration test. The idea is that you launch an instance of Puma in a subprocess and capture the return status from it.

# returns the pid of the Puma process
def start_puma
  fork do
    exec "bundle exec puma -C puma_config.rb"
  end
end

# sends SIGTERM to the given pid
def stop(pid)
  Process.kill("TERM", pid)
end

We'd need to google how to capture the return status in this scenario.
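
One way to capture it, sketched on top of the helpers above (the fixed sleep and puma_config.rb are assumptions; a real test would poll until the server is up):

pid = start_puma
sleep 5                          # crude wait for boot
stop(pid)
_, status = Process.wait2(pid)   # reap the child; returns [pid, Process::Status]
status.termsig                   # 15 if the process died from an untrapped SIGTERM
status.exitstatus                # 143 if it trapped TERM and exited with 128 + 15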

@shayonj force-pushed the s/exit-handler branch 3 times, most recently from 3928781 to f43ceea (August 5, 2017 05:23)
@shayonj (Contributor Author) commented Aug 5, 2017

@stereobooster yes, I am going for integration tests; I was looking for a way of working with a subprocess. I have added some integration tests; they pass locally for me but appear to be failing in TravisCI. It looks like re-raising/killing the process might be causing an exception to be raised. I will follow up on this soon. Also, the teardown isn't working as expected:

_, status = Process.wait2(@server.pid)
assert_equal 15, status

@server = nil
A reviewer (Contributor) commented on this hunk, suggesting the server be closed before being cleared:

@server.close
@server = nil

@shayonj force-pushed the s/exit-handler branch 3 times, most recently from 8db6ee5 to c9eca37 (August 5, 2017 17:45)
@shayonj force-pushed the s/exit-handler branch 3 times, most recently from 96fa815 to d4b5ad2 (August 5, 2017 22:43)
@shayonj (Contributor Author) commented Aug 11, 2017

r? @schneems (I see you worked on the launcher migration) or @nateberkopec || @stereobooster
🙂

@schneems (Contributor)

We can do something like this to re-raise the exact exception https://stackoverflow.com/questions/29568298/run-code-when-signal-is-sent-but-do-not-trap-the-signal-in-ruby
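
A minimal sketch of that re-raise pattern (illustrative only, not the exact code this PR ended up with):

# Run shutdown work on SIGTERM, then re-raise the signal exception so Ruby's
# default handling terminates the process with the conventional TERM status (143).
Signal.trap("SIGTERM") do
  # ... graceful shutdown work would go here ...
  raise SignalException, "SIGTERM"
end

sleep # wait for the signal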

@shayonj (Contributor Author) commented Aug 11, 2017

TIL, we can re-raise the same signal exception instead of restoring the signal handler and killing the process.

Updated 🙂

@nateberkopec merged commit 137a80d into puma:master on Aug 16, 2017.
@wodka commented Oct 11, 2017

Hi, with the applied changes (3.10) I get an error on Heroku during a restart:

[screenshot: error output during restart, Oct 11, 2017]

Is this intended (in terms of my having to catch the exception somehow)?

@dekellum (Contributor)

This PR was merged. I suggest you (@wodka) please create a new issue for this, including the text of the error. I do believe the change in this PR has uncovered a bug, but the bug probably resides in how bundler's bundle exec rescues Puma's SignalException, logs it to stderr, and calls Kernel::abort (status 1). I am able to reproduce the error outside of Heroku with bundle exec. The issue is avoided when using a bundle binstubs puma wrapper; I'm not sure whether that is under user control and thus a viable workaround on Heroku.
