
Socket removed after reload #1775

Closed
j15e opened this issue Apr 16, 2019 · 18 comments


j15e commented Apr 16, 2019

Steps to reproduce

I followed the instructions in the wiki to set up puma with systemd on Ubuntu 18.04 (https://github.com/puma/puma/blob/master/docs/systemd.md#socket-activation), but puma reload is not working as expected with the latest version of puma.
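For context, a minimal socket-activation setup along the lines of that doc looks roughly like the units below. The paths match the logs that follow; the exact option set and file locations are illustrative, not a verbatim copy of the wiki:

```ini
# /etc/systemd/system/puma.socket -- systemd owns the listening socket
[Unit]
Description=Puma HTTP Server Accept Sockets

[Socket]
ListenStream=/home/app/web/shared/tmp/sockets/puma.sock

[Install]
WantedBy=sockets.target

# /etc/systemd/system/puma.service -- puma inherits the socket via LISTEN_FDS
[Unit]
Description=Puma HTTP Server
Requires=puma.socket
After=network.target

[Service]
Type=simple
WorkingDirectory=/home/app/web/current
ExecStart=/usr/local/bin/bundle exec puma -C /home/app/web/shared/puma.rb
# USR1 triggers puma's phased restart; this is the reload path in the logs
ExecReload=/bin/kill -USR1 $MAINPID
Restart=always

[Install]
WantedBy=multi-user.target
```

With these units, `systemctl reload puma` sends USR1 to the master process, which is the phased restart shown failing in the logs below.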

During a reload, I see these logs which highlight the problem (with PUMA_DEBUG=1):

Apr 16 03:38:22 fizz bundle[1184]: % Registered unix:/home/app/web/shared/tmp/sockets/puma.sock for activation from LISTEN_FDS
Apr 16 03:38:22 fizz bundle[1184]: [1184] Puma starting in cluster mode...
Apr 16 03:38:22 fizz bundle[1184]: [1184] * Version 3.12.1 (ruby 2.5.5-p157), codename: Llamas in Pajamas
Apr 16 03:38:22 fizz bundle[1184]: [1184] * Min threads: 8, max threads: 16
Apr 16 03:38:22 fizz bundle[1184]: [1184] * Environment: production
Apr 16 03:38:22 fizz bundle[1184]: [1184] * Process workers: 2
Apr 16 03:38:22 fizz bundle[1184]: [1184] * Phased restart available
Apr 16 03:38:22 fizz bundle[1184]: [1184] * Activated unix:///home/app/web/shared/tmp/sockets/puma.sock
Apr 16 03:38:22 fizz bundle[1184]: [1184] Use Ctrl-C to stop
Apr 16 03:38:25 fizz bundle[1184]: [1184] - Worker 1 (pid: 1225) booted, phase: 0
Apr 16 03:38:25 fizz bundle[1184]: [1184] - Worker 0 (pid: 1221) booted, phase: 0
Apr 16 03:38:43 fizz systemd[1]: Reloading Puma HTTP Server.
Apr 16 03:38:43 fizz bundle[1184]: [1184] - Starting phased worker restart, phase: 1
Apr 16 03:38:43 fizz bundle[1184]: [1184] + Changing to /home/app/web/current
Apr 16 03:38:43 fizz systemd[1]: Reloaded Puma HTTP Server.
Apr 16 03:38:45 fizz bundle[1184]: [1184] - Stopping 1221 for phased upgrade...
Apr 16 03:38:45 fizz bundle[1184]: [1184] - TERM sent to 1221...
Apr 16 03:38:49 fizz bundle[1184]: [1184] - Worker 0 (pid: 1302) booted, phase: 1
Apr 16 03:38:49 fizz bundle[1184]: [1184] - Stopping 1225 for phased upgrade...
Apr 16 03:38:49 fizz bundle[1184]: [1184] - TERM sent to 1225...
Apr 16 03:38:49 fizz bundle[1184]: Exception handling servers: No such file or directory @ apply2files - /home/app/web/shared/tmp/sockets/puma.sock (Errno::ENOENT)
Apr 16 03:38:49 fizz bundle[1184]: /home/app/web/shared/bundle/ruby/2.5.0/bundler/gems/puma-200c95d30c37/lib/puma/binder.rb:53:in `unlink'
Apr 16 03:38:49 fizz bundle[1184]: /home/app/web/shared/bundle/ruby/2.5.0/bundler/gems/puma-200c95d30c37/lib/puma/binder.rb:53:in `block in close'
Apr 16 03:38:49 fizz bundle[1184]: /home/app/web/shared/bundle/ruby/2.5.0/bundler/gems/puma-200c95d30c37/lib/puma/binder.rb:53:in `each'
Apr 16 03:38:49 fizz bundle[1184]: /home/app/web/shared/bundle/ruby/2.5.0/bundler/gems/puma-200c95d30c37/lib/puma/binder.rb:53:in `close'
Apr 16 03:38:49 fizz bundle[1184]: /home/app/web/shared/bundle/ruby/2.5.0/bundler/gems/puma-200c95d30c37/lib/puma/server.rb:939:in `graceful_shutdown'
Apr 16 03:38:49 fizz bundle[1184]: /home/app/web/shared/bundle/ruby/2.5.0/bundler/gems/puma-200c95d30c37/lib/puma/server.rb:419:in `handle_servers'
Apr 16 03:38:49 fizz bundle[1184]: /home/app/web/shared/bundle/ruby/2.5.0/bundler/gems/puma-200c95d30c37/lib/puma/server.rb:354:in `block in run'
Apr 16 03:38:52 fizz bundle[1184]: [1184] - Worker 1 (pid: 1327) booted, phase: 1

After the reload, the worker processes are reloaded and running, but nginx stops serving anything from that point on, because the puma.sock socket is removed (not only closed; the file itself is no longer there).

I think removing the @own_binder logic in #1685 may now cause puma to incorrectly unlink the socket: Puma::Binder#close unlinks unix socket files, and during a phased restart it does so in the wrong order (the unlink occurs after the new workers are spawned, when the old workers are stopped):

puma/lib/puma/binder.rb

Lines 51 to 54 in 200c95d

def close
  @ios.each { |i| i.close }
  @unix_paths.each { |i| File.unlink i }
end
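The ownership guard being discussed could be sketched like this. This is NOT Puma's actual code; the class and method names are illustrative, and it simulates socket-activation handover rather than reading LISTEN_FDS. The idea is simply that close should only unlink socket files the process created itself:

```ruby
require 'socket'
require 'tmpdir'

# Minimal sketch of the guard discussed in this thread: on close, only
# unlink UNIX socket files that this process created itself, never ones
# inherited via systemd socket activation (LISTEN_FDS).
class GuardedBinder
  def initialize
    @ios = []        # open server sockets
    @unix_paths = [] # socket files we created and therefore own
  end

  # Bind a UNIX socket. When `inherited` is true we pretend the listener
  # was handed over by systemd, so its path must not be deleted on close.
  # (For demonstration we still create the file ourselves.)
  def bind_unix(path, inherited: false)
    @ios << UNIXServer.new(path)
    @unix_paths << path unless inherited
  end

  def close
    @ios.each(&:close)
    # The File.socket? check also avoids the Errno::ENOENT seen in the
    # logs above, when the path has already been removed by someone else.
    @unix_paths.each { |p| File.unlink(p) if File.socket?(p) }
  end
end
```

Whether Puma should track ownership this way (as the pre-#1685 @own_binder flag effectively did) is exactly what this issue is debating.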

Expected behavior

Puma should keep serving via the unix socket after a reload.

Actual behavior

The unix socket is removed after being activated by the newly restarted workers.

System configuration

Ruby version: 2.5.5
Rails version: 5.1.6.2
Puma version: master


Edit: Just for your information, sadly I am now using port binding with my systemd setup (without any issue). Thanks to everyone investigating & trying to fix those socket issues.


j15e commented Apr 16, 2019

Also, I'm not sure how this is related, but maybe it can help pinpoint the issue(s) with sockets: reloading puma also removes the socket on v3.12.1 with ruby 2.6.2 and 2.5.5 when application preloading is enabled.

In that situation there is no error message, since there is no additional attempt to remove the socket, but the socket disappears after the first reload too.

The puma systemd documentation states that preload is compatible:

 Among other advantages, this keeps listening sockets open across puma restarts and achieves graceful restarts, including when upgraded puma, and is compatible with both clustered mode and application preload.

But the sample configurations do not specify preload, so maybe this wasn't really tested. Maybe I looked in the wrong place, but I didn't see any tests of signal handling or reload management. Maybe we should add integration tests covering these usages in https://github.com/puma/puma/blob/master/test/test_integration.rb?


dekellum commented Apr 19, 2019

This issue might be better described as: "phased restart" is not currently compatible with systemd socket activation. The systemd documentation you quote doesn't actually claim otherwise: cluster mode, preload, and socket activation all work with systemd-level restarts (graceful when socket activation is used), which are non-rolling, restarting the master and all workers in parallel.

Assuming this is true, I'm not sure whether the limitation can or should be fixed (per your Binder::close suggestion), or whether it might at least be better documented. For the latter, the restart doc could be improved by mentioning and linking systemd with socket activation as a fourth way to achieve graceful restarts.

If you really need phased restart (e.g. because your load time is too long), then as a workaround you might need to disable systemd socket activation.


j15e commented Apr 19, 2019

OK, so if I understand correctly: when using puma with systemd socket activation, puma should not be reloaded via a reload signal, as this isn't supported by design? It should only be stopped and started by systemd, and systemd will keep the socket active during the stop/start transition?

dekellum (Contributor) commented

I'm personally not sure what the answer to your first question ("…by design?") is. That puma's phased/rolling restart unlinks the UNIX socket file on reload (which I must presume has some motivation) is not going to work with systemd socket activation. There might be some permission scheme that would disallow puma from doing this, but I'd guess that would abort the reload.

As for your second question: yes, systemd will gracefully start a new worker or cluster of workers and shut down the old ones without any request loss. The major difference is that the entire cluster is started in parallel (like a cold start) instead of restarting workers serially, as with phased/rolling restart. That can be an advantage, in that it makes it less likely that a single user hits different deployed versions across subsequent requests within a short time period. The linked restart doc discusses this in the context of migrations, for example.

A related question: could you also work around this by using TCP instead of a UNIX socket? I see lots of puma issues specific to UNIX sockets where the OP isn't clear on why they want UNIX sockets in the first place. A minor performance advantage is one possibility, but IMO it is likely not measurably significant, at least at peak throughput if Rails is in use.
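For reference, the TCP workaround suggested here is a one-line change in the puma config (the address and port below are illustrative); nginx would then proxy to that port instead of the socket path:

```ruby
# config/puma.rb -- bind over loopback TCP instead of a UNIX socket,
# sidestepping the unlink-on-restart problem entirely
bind 'tcp://127.0.0.1:3000'
```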

dekellum (Contributor) commented

Also, regarding your original comment on testing: yes, I agree there are many test coverage holes in this area, and the potential for regressions is high. As part of my work on puma+systemd, I have https://github.com/dekellum/lionfish. I just updated it for puma 3.12.1 to sanity-check that recent releases haven't totally hosed systemd and (TCP) socket activation. Parts of this could be reused for puma tests and CI. It's unlikely I'll find time soon, but I think my next step would be to add CI to lionfish; if that looks good, it could be repackaged for a puma PR. I'd accept a PR to lionfish if you wanted to work on it.


j15e commented Apr 19, 2019

Well, I was kind of asking myself the same question: unix sockets seem to be suggested in the documentation, yet they seem to cause many issues, and the performance gains are not well documented either. They have a security advantage in that no port is opened, but I don't think that is a strong argument either, since you should configure your binding and firewall correctly anyway.

I think I'll move to a 0.0.0.0 port socket.

dekellum (Contributor) commented

So hopefully there are two possible workarounds, but it does seem like #1685 must have caused this regression. Reading there, it appears Binder::close was never called previously. Perhaps puma should add some configuration for this behavior, as a follow-up to #1685 and a possible fix for this? I would suggest keeping this issue open even if your workaround succeeds.

calmyournerves commented

Any progress on this one? It seems like Puma doesn't work with systemd socket activation anymore as of 4.0.0.

nateberkopec (Member) commented

Can you provide more details @calmyournerves? The original issue here was on 3.12.

calmyournerves commented

Hey @nateberkopec!

I think this issue was reported on master. The changes in #1685 (which was already merged when this issue was created) seem to cause the socket to be removed when Puma is stopped. So calling systemctl stop app (or restart) makes the socket managed by systemd disappear, breaking socket activation. This works perfectly fine for us on 3.12.

As described in the first comment, removing the logic around @own_binder seems to be the problem.

mcg commented Jun 26, 2019

I ran into this updating from 3.12.1 to 4.0.0 using socket activation. The socket is removed on restart.


Fudoshiki commented Jun 30, 2019

ruby 2.6.3
rails 6.1
puma 4

config/puma.rb

# frozen_string_literal: true

# Puma can serve each request in a thread from an internal thread pool.
# The `threads` method setting takes two numbers: a minimum and maximum.
# Any libraries that use thread pools should be configured to match
# the maximum value specified for Puma. Default is set to 5 threads for minimum
# and maximum; this matches the default thread size of Active Record.
#
max_threads_count = ENV.fetch('RAILS_MAX_THREADS') { 5 }
min_threads_count = ENV.fetch('RAILS_MIN_THREADS') { max_threads_count }
threads min_threads_count, max_threads_count

# Specifies the `environment` that Puma will run in.
#
environment ENV.fetch('RAILS_ENV') { 'development' }
# early_hints true

# Specifies the number of `workers` to boot in clustered mode.
# Workers are forked webserver processes. If using threads and workers together
# the concurrency of the application would be max `threads` * `workers`.
# Workers do not work on JRuby or Windows (both of which do not support
# processes).
#
workers ENV.fetch('WEB_CONCURRENCY') { 2 }

# Use the `preload_app!` method when specifying a `workers` number.
# This directive tells Puma to first boot the application and load code
# before forking the application. This takes advantage of Copy On Write
# process behavior so workers use less memory. If you use this option
# you need to make sure to reconnect any threads in the `on_worker_boot`
# block.
#
# preload_app!

# Allow workers to reload bundler context when master process is issued
# a USR1 signal. This allows proper reloading of gems while the master
# is preserved across a phased-restart. (incompatible with preload_app)
# (off by default)
#
prune_bundler

# Bind unix domain socket
bind ENV.fetch('PUMA_SOCK') { 'unix://tmp/sockets/puma.sock' }

# Use `path` as the file to store the server info state. This is
# used by `pumactl` to query and control the server.
#
state_path ENV.fetch('SERVER_STATE') { 'tmp/pids/puma.state' }

# === Puma control rack application ===

# Start the puma control rack application on `url`. This application can
# be communicated with to control the main server. Additionally, you can
# provide an authentication token, so all requests to the control server
# will need to include that token as a query parameter. This allows for
# simple authentication.
#
# Check out https://github.com/puma/puma/blob/master/lib/puma/app/status.rb
# to see what the app has available.
#
activate_control_app ENV.fetch('PUMACTL_SOCK') { 'unix://tmp/sockets/pumactl.sock' }

# Store the pid of the server in the file at `path`.
#
pidfile 'tmp/pids/puma.pid'

# Change the default timeout of workers
#
worker_timeout 10

# Allow puma to be restarted by `rails restart` command.
plugin :tmp_restart

config/deploy.rb

after 'deploy:published', 'puma:phased-restart'

After cap production deploy

$ ps aux | grep puma
webapp   18304  0.0  0.8 368084  8324 ?        Sl   04:17   0:00 puma 4.0.0 (unix://tmp/sockets/puma.sock) [20190630041534]
webapp   30011  0.2 11.7 591616 118548 ?       Sl   04:54   0:01 puma: cluster worker 0: 18304 [20190630041534]
webapp   30082  0.2 11.8 591596 119296 ?       Sl   04:54   0:01 puma: cluster worker 1: 18304 [20190630041534]
webapp   30871  0.0  0.2 226484  2756 pts/0    S+   05:02   0:00 grep --color=auto puma

$ ls /srv/webapp/repository/current/tmp/sockets
pumactl.sock

puma.sock is missing, and Nginx can't establish a connection.

Fudoshiki (Contributor) commented

@nateberkopec this isn't related to systemd:

pumactl -S tmp/pids/puma.state -F config/puma.rb phased-restart

puma 4 removes puma.sock; puma 3.12 doesn't.

Fudoshiki (Contributor) commented

This seems to work: https://github.com/Fudoshiki/puma/commit/ae44a9333354ce45b473550dca492277f4dafb58


schneems commented Jul 2, 2019

seems to work Fudoshiki@ae44a93

Can you send us a PR? Is it possible to add a test for this behavior, since it was working and then we broke it? I can't guarantee we won't break it again.

nateberkopec (Member) commented

PR is here #1829


mcg commented Jul 11, 2019

I'm using puma 4.0.1, which should have this fix? The socket is still missing after restart.

nateberkopec (Member) commented

Well, that's not good. Moving discussion to #1842.

@puma puma locked as resolved and limited conversation to collaborators Jul 11, 2019
nateberkopec added a commit that referenced this issue Sep 2, 2019
This reverts commit 3b48d32.
nateberkopec added a commit that referenced this issue Sep 3, 2019
This reverts commit 3b48d32.