Pre-existing socket file removed when TERM is issued after USR2 (if puma is running in cluster mode) #2816
Comments
Seems reasonable. I think there's probably a race or interaction going on here between the two signal handlers, since a USR2 restart can take a considerable amount of time to complete. The next step here will be converting your (very good!) description of the problem using a simple app into a test. We do have a lot of existing tests around restarts and signals which would be a good place to start.
The issue was that when inheriting the socket it was added to the @unix_paths list even if it was pre-existing. Test added. Closed puma#2816.
The issue was that inherited socket was added to the @unix_paths list even if it was pre-existing. Test added. Closed puma#2816.
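The ownership rule described in that commit message can be illustrated with a minimal sketch. This is not Puma's actual code: `SocketRegistry`, `register`, `cleanup`, and `demo_registry` are hypothetical names, and plain files stand in for real sockets. The assumed idea is that a path is recorded as owned only if the file did not exist before the server bound or inherited it, and only owned paths are unlinked on shutdown.

```ruby
require "tmpdir"

# Hypothetical stand-in for a server's socket bookkeeping (not Puma's Binder).
class SocketRegistry
  attr_reader :owned_paths

  def initialize
    @owned_paths = [] # paths this process created and may delete on shutdown
  end

  # Record a listener path. A file that already exists was created by someone
  # else (for example by socket activation), so it is not marked as owned.
  def register(path)
    @owned_paths << path unless File.exist?(path)
    path
  end

  # Remove only the socket files this process owns.
  def cleanup(paths)
    paths.each do |path|
      File.unlink(path) if @owned_paths.include?(path) && File.exist?(path)
    end
  end
end

# Exercises the registry in a temporary directory and reports which files survive.
def demo_registry
  Dir.mktmpdir do |dir|
    preexisting = File.join(dir, "activated.sock")
    created     = File.join(dir, "created.sock")
    File.write(preexisting, "") # exists before the server starts

    registry = SocketRegistry.new
    registry.register(preexisting) # already exists, so not owned
    registry.register(created)     # does not exist yet, so owned
    File.write(created, "")        # the server creates its own socket file

    registry.cleanup([preexisting, created])
    { preexisting_kept: File.exist?(preexisting), created_kept: File.exist?(created) }
  end
end
```

Calling `demo_registry` should report that the pre-existing file survives cleanup while the file the server created itself is removed.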
Thank you! I have created a test and also wrote a bash script to bisect the commits. I think I have a fix and have submitted a PR.
Describe the bug
Puma, when running in cluster mode, deletes the socket file if a TERM signal is issued after a USR2 signal. This can be a problem when using socket activation.
I am running puma on a Linux server in cluster mode with a preloaded app, using systemd socket activation. I noticed `Permission denied` exceptions in my logs when restarting puma. When I moved the socket file into a writable folder, those exceptions stopped, but the socket file started to disappear instead. Further investigation showed this only happens if puma was reloaded previously. (For clarity: restart here means systemd stopping the process and then starting it again; reload means I issued a USR2 signal or ran `pumactl reload` after deployment.)
Environment:
I was able to reproduce this on various configurations:
OS: Linux, MacOS
Ruby: 3.1.0, 3.0.3
Puma: 5.6.1, 5.5.2
Puma config and systemd unit file do not really matter, because as shown below, this can be reproduced on a minimalistic app. The only things that matter are that puma is bound to a socket, and is run in cluster mode.
To Reproduce
I tried to reproduce this bug locally, on MacOS, Ruby 3.1.0, puma 5.6.1.
In a new folder I created `hello.ru` and `Gemfile`, and ran `bundle install`.
Because I encountered some inconsistent behaviour, I tested a matrix of different settings, performing the following steps:
1. Socket file setup:
   - No pre-existing socket file: I made sure to `rm puma.sock`.
   - Pre-existing socket file: I ran `touch puma.sock` to create a socket file (removing it first if it existed).
2. Run puma:
   - Single mode
   - Cluster mode
3. Send signals:
   - TERM
   - USR2+TERM (USR2 followed by TERM)
4. Check whether the `puma.sock` file is preserved or removed after termination.
Results
I do not know what the correct behaviour is for each of these combinations, but the result of issuing USR2+TERM with a pre-existing socket file in cluster mode looks especially inconsistent compared to the others.
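The matrix of steps under To Reproduce could be automated along these lines. This is only a rough sketch under several assumptions: puma is started with `bundle exec puma`, bound with `-b` and put into cluster mode with `-w 2`; the rackup file is the `hello.ru` mentioned above; and a fixed `boot_wait` sleep is long enough for puma to boot or finish restarting. The helper names (`scenarios`, `puma_command`, `run_scenario`) are made up for illustration and are not the commands originally used.

```ruby
SOCKET = File.expand_path("puma.sock")

# Every combination of socket setup, run mode, and signal sequence.
def scenarios
  [true, false]
    .product(%i[single cluster], [%w[TERM], %w[USR2 TERM]])
    .map { |pre, mode, signals| { preexisting: pre, mode: mode, signals: signals } }
end

# Assumed command line: -b binds to the socket, -w enables cluster mode.
def puma_command(mode, socket: SOCKET, rackup: "hello.ru")
  cmd = ["bundle", "exec", "puma", "-b", "unix://#{socket}"]
  cmd += ["-w", "2"] if mode == :cluster
  cmd << rackup
end

# Runs one scenario and reports whether the socket file is still there.
# The fixed sleeps are a crude way to wait for boot and restart to finish.
def run_scenario(scenario, boot_wait: 5)
  File.delete(SOCKET) if File.exist?(SOCKET)
  File.write(SOCKET, "") if scenario[:preexisting] # same effect as `touch`

  pid = Process.spawn(*puma_command(scenario[:mode]))
  scenario[:signals].each do |signal|
    sleep boot_wait
    Process.kill(signal, pid)
  end
  Process.wait(pid)

  File.exist?(SOCKET) ? :preserved : :removed
end
```

A run over the whole matrix would then look like `scenarios.each { |s| puts "#{s} => #{run_scenario(s)}" }`, executed from the folder that contains `hello.ru` and the `Gemfile`.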
Expected behaviour
I expect the pre-activated socket not to be removed when I reload my app at some point and then later restart it.
Additional notes:
Here is my production log, showing where exactly an attempt to remove the socket file happens:
Here you can also see the message `Detected parent died, dying`, which I occasionally saw locally but was not able to reproduce reliably.
There were previously other issues with socket removal, e.g. #1988.
Thank you!