Uncaught SignalException SIGTERM #1438
Comments
Nice one @dekellum! I was able to replicate the issue using
My fix and test for rubygems/bundler#6090 were merged to master and will presumably be released in the next bundler 1.16.x, and possibly backported if they are still maintaining 1.15.x. In the interim, the linked issue also gives details on two possible workarounds: using a I would suggest that a puma committer can now close this issue. This could also be noted in the puma release notes as a known issue, referencing the bundler issue.
cc: @nateberkopec
👋 Thanks.
This thread saved me a lot of time. Thank you for the info and quick fix.
The previous behaviour resulted in applications reporting errors even though they had shut down gracefully and without any problems. Example: the puma webserver relies on `SIGTERM` to gracefully shut down its workers, and Bugsnag reported errors when it was shut down normally. For more information on puma's behaviour, see here: puma/puma#1438 With the `at_exit` handler introduced in bugsnag#397, Bugsnag would catch `SignalException`s on exit and report them as errors, even though they are not. This PR changes the existing behaviour to add `SignalException` as a special case in which the exception should not be passed to `Bugsnag.notify`. Because Ruby exits when a `SignalException` is raised, I had to refactor the code to make the actual handling of the exit exceptions testable without the test suite quitting.
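The special-casing described above can be sketched roughly as follows. This is a minimal illustration, not Bugsnag's actual code: `report_exit_exception?`, `handle_exit_exception`, and the `REPORTED` array are hypothetical names, and the default reporter only records the exception where a real handler would call something like `Bugsnag.notify`. Keeping the decision in a plain method is what lets it be exercised without the process actually exiting.

```ruby
# Hypothetical stand-in for an error reporter; a real handler would send
# the exception to a service instead of storing it in an array.
REPORTED = []

# Decide whether the exception that ended the process should be reported.
# SignalException covers SIGTERM from a process manager (and Interrupt,
# its subclass, for Ctrl-C), which indicate a requested shutdown rather
# than an application error.
def report_exit_exception?(exception)
  return false if exception.nil?                     # clean exit, nothing raised
  return false if exception.is_a?(SignalException)   # expected shutdown signal
  true
end

# The handling lives in an ordinary method so tests can call it directly.
def handle_exit_exception(exception, reporter: ->(e) { REPORTED << e })
  return unless report_exit_exception?(exception)
  reporter.call(exception)
end

# At exit, $! holds the exception that is terminating the process (or nil).
at_exit { handle_exit_exception($!) }
```

Calling `handle_exit_exception(SignalException.new("SIGTERM"))` in a test leaves `REPORTED` empty, while an ordinary `RuntimeError` is passed to the reporter.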
From https://www.rubydoc.info/gems/puma/Puma%2FDSL:raise_exception_on_sigterm, documenting the puma config option "raise_exception_on_sigterm" that was added in puma version 4 in the context of #1690:
|
I'm using Puma 4 and, as far as I could understand from the systemd documentation, with socket activation you ARE going to kill workers with SIGTERM, because there's no ExecStop in the example service files (socket or service) telling it to use USR2 or otherwise, right? So, if one is using systemd with socket activation (a pretty standard deployment IMHO), we should either (1) just ignore this sigterm in our exception handling or (2) set Am I getting this right? Or my
I wanted to check in on this thread... @dekellum I'm running on bundler 2.1.4 which I believe should include your bundler fix. Yet we're still often getting SignalException errors to Rollbar from our dynos:
Same thing happens for our worker dynos. I'm wondering if a log trace like this one indicates that we actually do have your fix, and it's something else that's fiddly with Heroku that's sending errors off to Rollbar? I've come across the switch for turning off raising SignalException on SIGTERM from puma, but I'm concerned that would have unexpected downstream effects. Are other people encountering these issues at all?
@AndrewSouthpaw +1. Got this on our ECS servers with no relevant change in application code.
I've been working on a re-write of the test suite, and added a few 'shutdown' related tests. I'm still trying to sort out what's happening in terms of single/cluster, halt/stop/graceful-stop, etc. One of the failures was a bad exit code, another was a unix bind path that wasn't being unlinked...
Thanks for the feedback! For our case, I narrowed it down to an issue where an unhandled SignalException is emitted when a worker spins up and is immediately told to turn off before it can finish booting. (We had an autoscaling system that was sometimes telling workers to turn off/on within the same second.) When we changed the logic so that a boot/shutdown would only be initiated once per minute, our errors went away. Hope that maybe helps others in their investigation!
We've been having the same issue on Heroku with Puma
We get the same, when we push a new version and restart the server, we get:
Any solution to this?
Looking at our Rollbar quota, the large majority of the errors we get there come from, I believe, the recurring Heroku restart that happens. I see no reason to report these to Rollbar, since they are expected and, at least while using Heroku, necessary. Luckily, looks like [Puma has an option for this][0], so it makes sense to me to turn it on (or rather off, since we’re setting it to `false`).

[Found it][1] while doing some research on this subject, since I’ve had a feeling I’ve seen it before.

[0]: https://www.rubydoc.info/gems/puma/Puma%2FDSL:raise_exception_on_sigterm
[1]: puma/puma#1438
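For reference, setting the option mentioned in that commit message looks roughly like the fragment below. It is a sketch that assumes Puma 4 or later and a config file at `config/puma.rb` (the path is an assumption; use wherever your app keeps its Puma config). Whether suppressing the exception has side effects for a particular deployment, as asked earlier in the thread, is not settled here.

```ruby
# config/puma.rb (assumed location)
# Puma 4+ DSL option discussed in this thread. It defaults to true, which
# makes Puma raise a SignalException when it receives SIGTERM. Setting it
# to false suppresses that exception in environments where SIGTERM is an
# expected part of restarts, so error trackers have nothing to report.
raise_exception_on_sigterm false
```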
Steps to reproduce
setup heroku with puma
restart app
exception is thrown
this seems to be related to #1337 and can be reproduced without heroku (thx @dekellum)
#1337 (comment)
Expected behavior
no exception is thrown
Actual behavior
exception is thrown
System configuration
Ruby version: 2.2.5
Rails version: 4.2.9
Puma version: 3.10