Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

phased-restart: Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound) #2018

Closed
gingerlime opened this issue Oct 8, 2019 · 49 comments · Fixed by #2427
Closed
Labels
bug waiting-for-review Waiting on review from anyone

Comments

@gingerlime
Copy link
Contributor

gingerlime commented Oct 8, 2019

Describe the bug

This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.

We use a folder structure similar to this:

/app/app.d/{version (int)}
/app/app -> /app/app.d/{version} # symlink
/app/prev -> /app/app.d/{version-1} # symlink

When we deploy, we increment the version, run bundle etc, then switch the symlink and do a phased-restart of puma.

All seems to work fine, but after a few deploys, we also start pruning older versions, and then we see this error in the puma.log

[12357] - Starting phased worker restart, phase: 4
[12357] - Starting phased worker restart, phase: 4
[12357] + Changing to /var/local/app/app
[12357] + Changing to /var/local/app/app
[12357] - Stopping 28107 for phased upgrade...
[12357] - Stopping 28107 for phased upgrade...
[12357] - TERM sent to 28107...
[12357] - TERM sent to 28107...
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] Early termination of worker
[7886] Early termination of worker
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
/usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound/usr/lib/ruby/site_ruby/2.5.0
/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)
)
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile

❗️ Note that the app now lives in /app/app.d/909 but the errors are pointing to puma in /app/app.d/906 ❗️

Puma config:

#!/usr/bin/env puma

# The directory to operate out of.
#
# The default is the current directory.
#
directory "/var/local/app/app"

# Set the environment in which the rack's app will run. The value must be a string.
#
# The default is "development".
#
environment ENV["RAILS_ENV"] || "development"

# Store the pid of the server in the file at "path".
#
pidfile "/var/run/puma/puma.pid"

# Use "path" as the file to store the server info state. This is
# used by "pumactl" to query and control the server.
#
state_path "/var/run/puma/puma.state"

# Redirect STDOUT and STDERR to files specified. The 3rd parameter
# ("append") specifies whether the output is appended, the default is
# "false".
#
stdout_redirect "/var/log/puma.log", "/var/log/puma.log", true

# Configure "min" to be the minimum number of threads to use to answer
# requests and "max" the maximum.
#
# The default is "0, 16".
#
# threads 0, 16
threads 8, 8


# Bind the server to "url". "tcp://", "unix://" and "ssl://" are the only
# accepted protocols.
#
# The default is "tcp://0.0.0.0:9292".
#
bind 'unix:///var/run/puma/puma.sock'


# === Cluster mode ===

# How many worker processes to run.  Typically this is set to
# to the number of available cores.
#
# The default is "0".
#
workers ENV["PUMA_WORKERS"] || 16

# Allow workers to reload bundler context when master process is issued
# a USR1 signal. This allows proper reloading of gems while the master
# is preserved across a phased-restart. (incompatible with preload_app)
# (off by default)

# see https://github.com/puma/puma/blob/master/docs/deployment.md#restarting
prune_bundler

If I do a full restart, then things are back to normal, until it happens again (we keep 3 deploys back, so after 4 deploys...)

To Reproduce

I was trying to reproduce this in a clean environment, but I can't manage to do it so far... :-/

Expected behavior
phased-restart should reload everything from the new folder... It does seem to point to the right folder, but there's some mysterious left-overs pointing to something old somewhere??

Server:

  • OS: Ubuntu 18.04 with systemd
  • Puma Version 4.2.1 (it doesn't happen with 3.12)
@gingerlime
Copy link
Contributor Author

@mcg @MSP-Greg maybe you guys have some suggestions at least what to test or how to try to reproduce this? I tried it with a toy app and using symlinks in a similar structure, but somehow can't reproduce it in this environment... But it does seem to happen fairly consistently in our staging environment so far.

@gingerlime
Copy link
Contributor Author

gingerlime commented Oct 8, 2019

One small thing I saw in our staging logs, but I don't see in my toy environment is:

Early termination of worker

Maybe it's related to the problem?

@nateberkopec
Copy link
Member

Possibly related to the changes we made to prune_bundler lately.

@gencer
Copy link

gencer commented Oct 11, 2019

I have a problem similar to this. When I update a gem in Gemfile that puma uses like puma itself or as original poster said nio4r and try to do phased restart (SIGUSR1), My app does not get reload due to mismatched gem versions.

See here: #705

So, Is this issue related to this prune_bundler changes made lately?


In development, I got same error. Early termination of worker

@nateberkopec
Copy link
Member

When I update a gem in Gemfile that puma uses like puma itself or as original poster said nio4r and try to do phased restart

Makes sense if you look at the code for how prune_bundler works. We'll have to figure out how to change/reload Puma's spec.

@gingerlime
Copy link
Contributor Author

gingerlime commented Oct 23, 2019

Slightly OT, but is it worth switching from prune_bundler to the more "traditional"(??) before_fork / on_worker_boot? btw, I find the README section about clustered mode conflicting with Deployment restarting docs e.g.

(README)

If you're using a configuration file, use the preload_app! method:

(Deployment)

Don't use preload!. This dirties the master process and means it will have to shutdown all the workers and re-exec itself to get your new code. It is not compatible with phased-restart and prune_bundler as well.

(also the deployment doc mentions using prune_bundler but there's no mention of it at all on the main README)

Which advice should I follow? :)

@nateberkopec
Copy link
Member

If you need phased restarts or prune_bundler, you cannot preload_app. If you don't need either of those things, use preload_app.

@gingerlime
Copy link
Contributor Author

Sorry if this is a dumb question, but how do I know if I need prune_bundler?

I do need phased restarts for zero-downtime deploys.

@gencer
Copy link

gencer commented Oct 23, 2019

We'll have to figure out how to change/reload Puma's spec.

This will be awesome if we can have a solution for this. At least a workaround until a permanent solution comes up. Since, updating those specs requires full restart and this means downtime.

@p8
Copy link
Contributor

p8 commented Nov 21, 2019

@gingerlime I thinks this problem occurs because you vendor the ruby gems in /var/local/app/app.d/{version (int)}. See:

/var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'

If this version gets deleted the vendored gems also get deleted.
You probably want to install gems in a shared persistent directory like: /var/local/app/app.d/shared

@gingerlime
Copy link
Contributor Author

Thanks for the suggestion! But this is working 100% fine with puma 3 for a number of years now. How come it stopped working with 4??

@p8
Copy link
Contributor

p8 commented Nov 21, 2019

Puma 3 had no dependent gems. Puma 4 added nio4r as a dependency.
I'm not sure but maybe bundler tries to reload nio4r? Where as before it didn't need to reload anything?

@gingerlime
Copy link
Contributor Author

Good point! However isn’t puma 4 potentially breaking prune_bundler then ?

@dentarg
Copy link
Member

dentarg commented Nov 21, 2019

@gingerlime The bug might have been there since before Puma 4, see #1593 and rubygems/bundler#6667, looks like the same problem you have but under different circumstances

There's a PR trying to fix it #1893, perhaps you could try it out?

Puma v4.3.0 is still using Bundler.with_clean_env

Bundler.with_clean_env do

#1893 switches to Bundler.with_original_env which reportedly solved the issue for one user in rubygems/bundler#6667

@gingerlime
Copy link
Contributor Author

Thanks @dentarg very interesting. I'll try to find some time to try out #1893 and see. Reproducing the problem in itself was a challenge, and I was only seeing it in our staging environment (which is shared, so I can't experiment there for too long). Hope I can still reproduce it, and then try again with #1893 and report back here.

Otherwise, is there a workaround perhaps? maybe moving away from prune_bundler? what's the recommended approach for zero-downtime deploys with the now "old-school" symlinked structure?

@gencer
Copy link

gencer commented Nov 22, 2019

As @gingerlime pointed out, I am also interested in any workarounds for zero-downtime deployments.

rokumatsumoto added a commit to rokumatsumoto/boyutluseyler that referenced this issue Dec 9, 2019
- Cannot boot puma workers

from puma.log on AWS EB
--
=== puma startup: 2019-12-09 12:47:04 +0000 ===
=== puma startup: 2019-12-09 12:47:04 +0000 ===
[19558] Early termination of worker
[19600] Early termination of worker
[19619] Early termination of worker
[19635] Early termination of worker
[19640] Early termination of worker
[19646] Early termination of worker
[19654] Early termination of worker
[19659] Early termination of worker

puma/puma#2018
puma/puma#1893
@cjlarose
Copy link
Member

cjlarose commented Feb 18, 2020

My team was experiencing this same issue. For us, the problem had to do with how our release process worked, not necessarily with Puma itself. Like @gingerlime, this error happened on our staging environment much more reliably and more frequently than it did in production.

In a nutshell: If your release process removes old releases (specifically, the compiled native extensions on which the puma master relies), new puma workers will check to see if the native extensions are compiled, fail to find that evidence, and bundler will ignore the installed gem, eventually raising Bundler::GemNotFound.

I don't have a ready-to-go public repro repo, but this is the setup (forgive my sloppiness):

  • Create "release_1", a directory containing your application code as well as your gems (bundle install --path .bundle). Point your current release symlink to release_1.
  • Start a puma master with a cluster (workers 2 or something). bundle exec puma -C puma.rb). You must have prune_bundler in there, too.
  • Create "release_2". Point your current release symlink to release_2. Issue a phased_restart. This will succeed since although bundler/setup on the release_2 workers will check for existence of the compiled native extensions in the directory for release_1, they're still present on disk, so everything's good.
  • rm -rf release_1. If you CD tools delete old releases to save space, that might be happening for you automatically.
  • Create "release_3". Point your current release symlink to release_3. Issue a phased restart. This will fail with the error Bundler::GemNotFound, since Gem::Specification#missing_extensions? returns true after failing to find the file at the path returned by the method gem_build_complete_path (that will encode a path to release_1, which was removed already). The workers will die and the master will restart new workers, but those workers will fail with the same error again and again and again.

This problem, I imagine, happens much more frequently with puma 4.x than with puma 3.x because puma 4.x has a runtime dependency on a gem that has native extensions. Any time that the puma master has "activated" a gem that has native extensions, workers can encounter this error if they try to activate the same gems.

My team still experienced a similar issue with puma 3.x, not with the nio4r gem, but instead with ffi, since we were loading PumaWorkerKiller (which has a runtime dependency on ffi) in the puma master. New workers would start, fail to find native extensions for ffi, and exit with the error Bunlder::GemNotFound.

For a quick workaround, just keep your old releases around on your servers (or at the very least the release directory for the version of your application where you first started the puma master). You can safely perform phased restarts as you'd expect.

My team is investigating other solutions, but we're not far along in that process yet. We'll make sure to share what we find.

@nateberkopec
Copy link
Member

Everyone having simlar issues should try master branch, as #1893 was merged and may help.

@cjlarose
Copy link
Member

cjlarose commented Feb 19, 2020

I created a repo that demonstrates that this error occurs reliably if you're deleting old releases

https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors

The master branch of that repo gets the error described in this thread. It's using puma 4.3.1.

Could not find nio4r-2.5.2 in any of the sources
Run `bundle install` to install missing gems.
[292] Early termination of worker

The puma-master branch of the repro repo is using puma at commit 8c9b3eb, which includes changes from #1893.

Workers still fail on startup, but with a different error

/usr/src/releases/1582131641325997600/Gemfile not found

@dentarg
Copy link
Member

dentarg commented Feb 19, 2020

From the repro repo

This project reproduces an issue related to phased restarts in Puma. Specifically, it demonstates that if you delete the compiled native extensions that were used when the puma master started, the next phased restart will fail.

Is that a problem Puma should solve? 🤔

@gencer
Copy link

gencer commented Feb 19, 2020

@dentarg, don't you think so? So you mean that we will update puma for getting not just for better features but for better security in the future but in fact we will not be able to gracefully deploy due to this issue because it crashes.

I believe this should be solved by Puma as soon as possible. At least a workaround.

For now, I just forcefully reject pull requests for Puma.

Update: I realized that previous quote seems mentioning about 3rd party native extension that is not related to puma. If this is the case, I believe if we can do phased-restart on unicorn and/or other web servers (and we do) we should be able to do so with puma too.

@cjlarose
Copy link
Member

Is that a problem Puma should solve?

@dentarg I'm not certain about this either. The bug might not even be with Puma itself--it might be a bug with bundler, for example. It just so happens that puma phased restarts combined with a deployment strategy that removes old releases exposes the bug. It's also a little frustrating, too, because if you were running puma 3.x (before the introduction of nio4r in 4.x), and you were removing old releases, that was probably fine since puma 3.x had no runtime dependencies on gems that had native extensions.

I believe this should be solved by Puma as soon as possible. At least a workaround.

@gencer Puma doesn't owe you anything. Maintainers are often unpaid volunteers. If you can do phased restarts using other web servers, then you should use those other web servers.

@gencer
Copy link

gencer commented Feb 20, 2020

@cjlarose Neither puma nor its developers and also volunteers doesn't owe me anything. I am deeply sorry if I offended anyone. I didn't mean that way. You misunderstood me. I was just trying to say that it should be fixed on puma-gem (that's what i believe because other web servers doesnt have this issue and puma 3.x also don't have this.) not by puma developers. This could be of course "me" or someone who knows these things better than me.

Please accept my apologies on this matter.

@MSP-Greg
Copy link
Member

This issue is very messy.

because other web servers don't have this issue and puma 3.x doesn't have this.

Yes, but @cjlarose noticed the issue with extension files, whether they be Puma dependencies (4.x only), or app dependencies (noticed with 3.x). So, it may not really be a '3.x is ok, 4.x is bad' problem. It may just appear that way because 4.x has an extension gem dependency of nio4r.

At various times, there have issues in Ruby with symlinks and the absolute paths of symlinks, where code incorrectly expected the two to be equal. That's just Ruby, who knows about Bundler/RubyGems, especially when changing a symlink to affect what files a running Ruby instance uses.

Two people have shown a log message of 'Could not find nio4r-2.5.1 in any of the sources'. That's a Bundler message, so, either Puma is misconfiguring Bundler, or Bundler is not functioning as expected.

One thing that could be tried is changing:

require 'nio'

to:

require 'nio' unless Kernel.const_defined?(:NIO)

the above isn't a solution, just trying to bypass Bundler's resolution to see if that's the issue.

* submitted by a Puma volunteer

@nateberkopec
Copy link
Member

Seen on CI

@MSP-Greg
Copy link
Member

Seen on CI

More than one job failed with the same error on Travis. And, jobs using the same Rubies pass on Actions. Good find.

I feel guilty, being one who often says "CI is more than green checks & red X's, look at the logs"...

@cjlarose
Copy link
Member

#1875 tracks the progress of a new architecture in puma that'll hopefully resolve this issue.

@cjlarose
Copy link
Member

#2374 is merged, which was a prerequisite for the changes I want to introduce to fix this issue.

I've got a branch that fixes this issue. I'm working on fixing my tests for it.

@cjlarose
Copy link
Member

I opened #2407 which is a robust solution to the original problem described in this issue. It makes it so that the puma master process can use totally separate gems from the puma worker processes, so it's fine to remove nio4r's compiled native extensions on disk and perform phased restarts. It also supports doing the same for any gem with native extensions loaded by the puma master process, such as ffi if you're using puma_worker_killer. The problem with that PR, though, is that it increases the overall memory footprint of a puma cluster since worker processes can't leverage COW memory by forking from the master process. We ultimately decided to close it.

I opened #2427 which fixes the nio4r issue with regard to phased restarts. It does not allow one to remove from disk compiled native extensions for other gems that are loaded by the puma master process (like ffi), but I suspect this doesn't matter to most people. That PR will fix this problem for most folks.

For anyone else, I opened rubygems/rubygems#4004 which I think is the root cause of the issue in Bundler. If that issue is resolved, then it will be possible to perform a phased restart after removing compiled native extensions for any gem loaded by the puma master process, including ffi.

@gingerlime
Copy link
Contributor Author

gingerlime commented Oct 14, 2020

@cjlarose you're my hero 🦸‍♂️ thank you so much. The bundler report was impeccable.

I have only one question however: if bundler resolves this (rather than Puma works around it), would a phased restart also be able to update ffi or nio4r if these gems are updated? I guess it's not a very common scenario, but I wasn't sure what happens then...

@cjlarose
Copy link
Member

cjlarose commented Oct 14, 2020

@cjlarose you're my hero 🦸‍♂️ thank you so much. The bundler report was impeccable.

Happy to help!

I have only one question however: if bundler resolves this (rather than Puma works around it), would a phased restart also be able to update ffi or nio4r if these gems are updated? I guess it's not a very common scenario, but I wasn't sure what happens then...

Excellent question. I can provide additional clarification.

Without a fix in Bundler for rubygems/rubygems#4004, but with the workaround in puma #2427, it'll be possible to perform phased restarts in a way that upgrades nio4r without a problem. This works because nio4r won't be loaded into the puma master process at all. Workers all independently load their own copy of nio4r.

With a fix in Bundler for rubygems/rubygems#4004 and with the workaround in puma #2427 applied, of course that won't make the situation any different for nio4r: phased restarts will work, even if you're upgrading nio4r. But the story is different for ffi (or really any gem that has native extensions and is loaded into the puma master process via extra_runtime_dependencies or dependencies thereof). This is because the same version of ffi will be loaded into the worker processes when they fork from the master process. Phased restarts will not support upgrading ffi. The value of the Bundler fix is just that you'll be able to freely delete the compiled native extensions on disk without worry that the subsequent phased restart will fail.

To sum it all up, this my advice for running puma after #2427 is merged: if you perform phased restarts to upgrade your application served by puma (assuming you have prune_bundler on, and regardless of whether or not you use a deployment strategy that involves deleting the gems of old releases):

  • Do not try to upgrade puma itself. Workers will fail on boot.
  • Do not try to upgrade any gem that is loaded into the puma master process via extra_runtime_dependencies or dependencies thereof (such as ffi if you're using puma_worker_killer). Workers will fail on boot.
  • If you want to do either of these things, perform a hot restart instead.

If you do have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 is not fixed):

  • Do not put any gems into extra_runtime_dependencies that have native extensions or has dependencies that have native extensions (like puma_worker_killer and ffi) at all. Workers will fail on boot during a phased restart. Hot restarts are your only option here if you need these dependencies.

If you have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 is fixed):

  • Feel free to put gems like puma_worker_killer and ffi into extra_runtime_dependencies. Phased restarts will work so long as you're not changing versions of those gems.

I think I'll go ahead and add this to the restart documentation for puma since it's important to operators to know when it's safe to perform a phased restart. Longer term, another potential idea is for puma to automatically perform a hot restart instead if it detects that it cannot safely perform a phased restart.

@S3od22

This comment has been minimized.

@gingerlime
Copy link
Contributor Author

gingerlime commented Oct 25, 2020

Hey @cjlarose thanks so much for the input, and sorry for the delay. Was on holidays. I'm really keen to try to upgrade Puma now that #2427 is merged. Thanks for your awesome work and patience.

I still have one question however about upgrading puma in future. Our setup does not involve any extra_runtime_dependencies and only using prune_bundler. So if we need to upgrade puma/ffi in future, we cannot rely on phased restarts but rather need to full-restart puma after an upgrade. Right? can you suggest some kind of an automated check during deployment to anticipate this? i.e. see if the version of puma/ffi differs and then issue a full restart instead of the regular phased restart?

Hope I got things right, but want to fool-proof it in future as much as possible to avoid nasty surprises :)

@nateberkopec
Copy link
Member

Released in 5.0.3.

@cjlarose
Copy link
Member

cjlarose commented Oct 28, 2020

I still have one question however about upgrading puma in future. Our setup does not involve any extra_runtime_dependencies and only using prune_bundler. So if we need to upgrade puma/ffi in future, we cannot rely on phased restarts but rather need to full-restart puma after an upgrade. Right?

Because you're not using extra_runtime_dependencies, you don't need to worry about ffi upgrades. Those will work just fine. For puma upgrades, a "hot restart" is the most graceful way to restart.

can you suggest some kind of an automated check during deployment to anticipate this? i.e. see if the version of puma/ffi differs and then issue a full restart instead of the regular phased restart?

This is definitely somewhere where puma is lacking. It'd be great if there was an option to pumactl phased-restart that either checked first to see if a phased restart is possible (and use a hot restart if not) or attempted to perform a phased-restart, but automatically fall back to a hot restart if it fails for whatever reason.

On (proprietary) applications that I maintain, we have a kinda complicated start/stop script that basically does this by utilizing the stats command offered by the Puma control/status server through pumactl stats. With stats, we can monitor the progress of a phased restart (by inspecting the phase of each worker). If the phased restart doesn't make progress within a certain timeout, we fall back to a hot restart.

I think pumactl phased-restart should just do this automatically. I opened a separate issue to address this. #2465

@gingerlime
Copy link
Contributor Author

gingerlime commented Oct 29, 2020

Thanks again @cjlarose ! I'll check out pumactl and see if we can script something. And great to see #2465 ! hope there's some way I can help with this.

I'm having trouble after upgrading to puma 5.0.4 unfortunately... I did a hot restart to make sure I'm running puma 5 but after a few deploys (with phased-restarts) I see these errors in my puma.log file and I think the paths are pointing to folders that were deleted... The gem it complains about wasn't actually updated recently.

/usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:86:in `block in materialize': Could not find json-2.3.1 in any of the sources (Bundler::GemNotFound)
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `map!'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `materialize'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:101:in `block in definition_method'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler.rb:149:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `block in <top (required)>'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:136:in `with_level'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:88:in `silence'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:312:in `rack_builder'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:331:in `load_rackup'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:257:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:142:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:146:in `start_server'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster/worker.rb:57:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:195:in `worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `block in spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `fork'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:78:in `block in spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `times'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:142:in `check_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:410:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/launcher.rb:171:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/bin/puma-wild:25:in `<main>'
[12769] + Gemfile in context: /var/local/app/app.d/1391/Gemfile
[12769] Early termination of worker

Any tips on how to further troubleshot / investigate this?

@gingerlime
Copy link
Contributor Author

please let me know if I should open a separate issue for this, but it seems to keep happening for us after a hot restart and a few phased restarts with puma 5.0.4

@dentarg
Copy link
Member

dentarg commented Oct 29, 2020

@gingerlime I think you can open up a new issue as the issue is now about json and not nio4r

@gingerlime
Copy link
Contributor Author

Thanks @dentarg will do (it's related to phased-restarts though, but definitely looks like a different thing)

@gingerlime
Copy link
Contributor Author

see #2471

@alx75
Copy link

alx75 commented Jun 25, 2021

Thanks @cjlarose for all this work. I do have one question about what you said concerning puma_worker_killer

If you do have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 is not fixed):
Do not put any gems into extra_runtime_dependencies that have native extensions or has dependencies that have native extensions (like puma_worker_killer and ffi) at all. Workers will fail on boot during a phased restart. Hot restarts are your only option here if you need these dependencies.

From what I understand if I want to use puma_worker_killer I need to list every dependencies in the extra_runtime_dependencies otherwise the master process will pick one version outside of the bundle context. This means in the puma_worker_killer scenario on a linux server I need to have :

extra_runtime_dependencies ['puma_worker_killer', 'get_process_mem']

It seems for linux we do not need to add ffi here as it is loaded by get_process_mem (v0.2.7) only for macos machine. It seems to me this configuration is safe to be used with prune_bundler even if rubygems/rubygems#4004 is not fixed as there is not native extension.

Is my understanding correct ?

@cjlarose
Copy link
Member

@alx75 I think your intuition is correct here. Since the ffi gem would not be activated in the puma master process in your case, then the phased restart should complete successfully.

I was able to test this reliably here: https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/ffi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug waiting-for-review Waiting on review from anyone
Projects
None yet
10 participants