phased-restart: Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound) #2018

gingerlime · 2019-10-08T13:16:50Z

Describe the bug

This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.

We use a folder structure similar to this:

/app/app.d/{version (int)}
/app/app -> /app/app.d/{version} # symlink
/app/prev -> /app/app.d/{version-1} # symlink

When we deploy, we increment the version, run bundle etc, then switch the symlink and do a phased-restart of puma.

All seems to work fine, but after a few deploys, we also start pruning older versions, and then we see this error in the puma.log

[12357] - Starting phased worker restart, phase: 4
[12357] - Starting phased worker restart, phase: 4
[12357] + Changing to /var/local/app/app
[12357] + Changing to /var/local/app/app
[12357] - Stopping 28107 for phased upgrade...
[12357] - Stopping 28107 for phased upgrade...
[12357] - TERM sent to 28107...
[12357] - TERM sent to 28107...
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7886] Early termination of worker
[7886] Early termination of worker
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[7944] + Gemfile in context: /var/local/app/app.d/909/Gemfile
/usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound/usr/lib/ruby/site_ruby/2.5.0
/bundler/spec_set.rb:91:in `block in materialize': Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound)
)
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `map!'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/spec_set.rb:85:in `materialize'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:108:in `block in definition_method'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler.rb:107:in `setup'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.5.0/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
        from /usr/lib/ruby/site_ruby/2.5.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:302:in `rack_builder'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:321:in `load_rackup'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/configuration.rb:246:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:158:in `app'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/runner.rb:165:in `start_server'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:269:in `worker'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block (2 levels) in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `fork'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:138:in `block in spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `times'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:134:in `spawn_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:204:in `check_workers'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cluster.rb:487:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
        from /var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/bin/puma-wild:31:in `<main>'
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile
[8048] + Gemfile in context: /var/local/app/app.d/909/Gemfile

❗️ Note that the app now lives in /app/app.d/909 but the errors are pointing to puma in /app/app.d/906 ❗️

Puma config:

#!/usr/bin/env puma

# The directory to operate out of.
#
# The default is the current directory.
#
directory "/var/local/app/app"

# Set the environment in which the rack's app will run. The value must be a string.
#
# The default is "development".
#
environment ENV["RAILS_ENV"] || "development"

# Store the pid of the server in the file at "path".
#
pidfile "/var/run/puma/puma.pid"

# Use "path" as the file to store the server info state. This is
# used by "pumactl" to query and control the server.
#
state_path "/var/run/puma/puma.state"

# Redirect STDOUT and STDERR to files specified. The 3rd parameter
# ("append") specifies whether the output is appended, the default is
# "false".
#
stdout_redirect "/var/log/puma.log", "/var/log/puma.log", true

# Configure "min" to be the minimum number of threads to use to answer
# requests and "max" the maximum.
#
# The default is "0, 16".
#
# threads 0, 16
threads 8, 8


# Bind the server to "url". "tcp://", "unix://" and "ssl://" are the only
# accepted protocols.
#
# The default is "tcp://0.0.0.0:9292".
#
bind 'unix:///var/run/puma/puma.sock'


# === Cluster mode ===

# How many worker processes to run.  Typically this is set to
# to the number of available cores.
#
# The default is "0".
#
workers ENV["PUMA_WORKERS"] || 16

# Allow workers to reload bundler context when master process is issued
# a USR1 signal. This allows proper reloading of gems while the master
# is preserved across a phased-restart. (incompatible with preload_app)
# (off by default)

# see https://github.com/puma/puma/blob/master/docs/deployment.md#restarting
prune_bundler

If I do a full restart, then things are back to normal, until it happens again (we keep 3 deploys back, so after 4 deploys...)

To Reproduce

I was trying to reproduce this in a clean environment, but I can't manage to do it so far... :-/

Expected behavior
phased-restart should reload everything from the new folder... It does seem to point to the right folder, but there's some mysterious left-overs pointing to something old somewhere??

Server:

OS: Ubuntu 18.04 with systemd
Puma Version 4.2.1 (it doesn't happen with 3.12)

The text was updated successfully, but these errors were encountered:

gingerlime · 2019-10-08T13:23:52Z

@mcg @MSP-Greg maybe you guys have some suggestions at least what to test or how to try to reproduce this? I tried it with a toy app and using symlinks in a similar structure, but somehow can't reproduce it in this environment... But it does seem to happen fairly consistently in our staging environment so far.

gingerlime · 2019-10-08T13:58:28Z

One small thing I saw in our staging logs, but I don't see in my toy environment is:

Early termination of worker

Maybe it's related to the problem?

nateberkopec · 2019-10-10T09:12:54Z

Possibly related to the changes we made to prune_bundler lately.

gencer · 2019-10-11T15:59:55Z

I have a problem similar to this. When I update a gem in Gemfile that puma uses like puma itself or as original poster said nio4r and try to do phased restart (SIGUSR1), My app does not get reload due to mismatched gem versions.

See here: #705

So, Is this issue related to this prune_bundler changes made lately?

In development, I got same error. Early termination of worker

nateberkopec · 2019-10-23T07:39:15Z

When I update a gem in Gemfile that puma uses like puma itself or as original poster said nio4r and try to do phased restart

Makes sense if you look at the code for how prune_bundler works. We'll have to figure out how to change/reload Puma's spec.

gingerlime · 2019-10-23T08:00:15Z

Slightly OT, but is it worth switching from prune_bundler to the more "traditional"(??) before_fork / on_worker_boot? btw, I find the README section about clustered mode conflicting with Deployment restarting docs e.g.

(README)

If you're using a configuration file, use the preload_app! method:

(Deployment)

Don't use preload!. This dirties the master process and means it will have to shutdown all the workers and re-exec itself to get your new code. It is not compatible with phased-restart and prune_bundler as well.

(also the deployment doc mentions using prune_bundler but there's no mention of it at all on the main README)

Which advice should I follow? :)

nateberkopec · 2019-10-23T08:49:58Z

If you need phased restarts or prune_bundler, you cannot preload_app. If you don't need either of those things, use preload_app.

gingerlime · 2019-10-23T09:08:35Z

Sorry if this is a dumb question, but how do I know if I need prune_bundler?

I do need phased restarts for zero-downtime deploys.

gencer · 2019-10-23T11:06:09Z

We'll have to figure out how to change/reload Puma's spec.

This will be awesome if we can have a solution for this. At least a workaround until a permanent solution comes up. Since, updating those specs requires full restart and this means downtime.

p8 · 2019-11-21T20:11:45Z

@gingerlime I thinks this problem occurs because you vendor the ruby gems in /var/local/app/app.d/{version (int)}. See:

/var/local/app/app.d/906/vendor/bundle/ruby/2.5.0/gems/puma-4.2.1/lib/puma/launcher.rb:172:in `run'

If this version gets deleted the vendored gems also get deleted.
You probably want to install gems in a shared persistent directory like: /var/local/app/app.d/shared

gingerlime · 2019-11-21T20:17:08Z

Thanks for the suggestion! But this is working 100% fine with puma 3 for a number of years now. How come it stopped working with 4??

p8 · 2019-11-21T20:29:04Z

Puma 3 had no dependent gems. Puma 4 added nio4r as a dependency.
I'm not sure but maybe bundler tries to reload nio4r? Where as before it didn't need to reload anything?

gingerlime · 2019-11-21T20:51:03Z

Good point! However isn’t puma 4 potentially breaking prune_bundler then ?

dentarg · 2019-11-21T22:16:02Z

@gingerlime The bug might have been there since before Puma 4, see #1593 and rubygems/bundler#6667, looks like the same problem you have but under different circumstances

There's a PR trying to fix it #1893, perhaps you could try it out?

Puma v4.3.0 is still using Bundler.with_clean_env

puma/lib/puma/launcher.rb

Line 267 in 395337d

Bundler.with_clean_env do

#1893 switches to Bundler.with_original_env which reportedly solved the issue for one user in rubygems/bundler#6667

gingerlime · 2019-11-22T07:32:35Z

Thanks @dentarg very interesting. I'll try to find some time to try out #1893 and see. Reproducing the problem in itself was a challenge, and I was only seeing it in our staging environment (which is shared, so I can't experiment there for too long). Hope I can still reproduce it, and then try again with #1893 and report back here.

Otherwise, is there a workaround perhaps? maybe moving away from prune_bundler? what's the recommended approach for zero-downtime deploys with the now "old-school" symlinked structure?

gencer · 2019-11-22T07:53:59Z

As @gingerlime pointed out, I am also interested in any workarounds for zero-downtime deployments.

- Cannot boot puma workers from puma.log on AWS EB -- === puma startup: 2019-12-09 12:47:04 +0000 === === puma startup: 2019-12-09 12:47:04 +0000 === [19558] Early termination of worker [19600] Early termination of worker [19619] Early termination of worker [19635] Early termination of worker [19640] Early termination of worker [19646] Early termination of worker [19654] Early termination of worker [19659] Early termination of worker puma/puma#2018 puma/puma#1893

cjlarose · 2020-02-18T19:02:07Z

My team was experiencing this same issue. For us, the problem had to do with how our release process worked, not necessarily with Puma itself. Like @gingerlime, this error happened on our staging environment much more reliably and more frequently than it did in production.

In a nutshell: If your release process removes old releases (specifically, the compiled native extensions on which the puma master relies), new puma workers will check to see if the native extensions are compiled, fail to find that evidence, and bundler will ignore the installed gem, eventually raising Bundler::GemNotFound.

I don't have a ready-to-go public repro repo, but this is the setup (forgive my sloppiness):

Create "release_1", a directory containing your application code as well as your gems (bundle install --path .bundle). Point your current release symlink to release_1.
Start a puma master with a cluster (workers 2 or something). bundle exec puma -C puma.rb). You must have prune_bundler in there, too.
Create "release_2". Point your current release symlink to release_2. Issue a phased_restart. This will succeed since although bundler/setup on the release_2 workers will check for existence of the compiled native extensions in the directory for release_1, they're still present on disk, so everything's good.
rm -rf release_1. If you CD tools delete old releases to save space, that might be happening for you automatically.
Create "release_3". Point your current release symlink to release_3. Issue a phased restart. This will fail with the error Bundler::GemNotFound, since Gem::Specification#missing_extensions? returns true after failing to find the file at the path returned by the method gem_build_complete_path (that will encode a path to release_1, which was removed already). The workers will die and the master will restart new workers, but those workers will fail with the same error again and again and again.

This problem, I imagine, happens much more frequently with puma 4.x than with puma 3.x because puma 4.x has a runtime dependency on a gem that has native extensions. Any time that the puma master has "activated" a gem that has native extensions, workers can encounter this error if they try to activate the same gems.

My team still experienced a similar issue with puma 3.x, not with the nio4r gem, but instead with ffi, since we were loading PumaWorkerKiller (which has a runtime dependency on ffi) in the puma master. New workers would start, fail to find native extensions for ffi, and exit with the error Bunlder::GemNotFound.

For a quick workaround, just keep your old releases around on your servers (or at the very least the release directory for the version of your application where you first started the puma master). You can safely perform phased restarts as you'd expect.

My team is investigating other solutions, but we're not far along in that process yet. We'll make sure to share what we find.

nateberkopec · 2020-02-18T19:55:32Z

Everyone having simlar issues should try master branch, as #1893 was merged and may help.

cjlarose · 2020-02-19T17:05:46Z

I created a repo that demonstrates that this error occurs reliably if you're deleting old releases

https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors

The master branch of that repo gets the error described in this thread. It's using puma 4.3.1.

Could not find nio4r-2.5.2 in any of the sources
Run `bundle install` to install missing gems.
[292] Early termination of worker

The puma-master branch of the repro repo is using puma at commit 8c9b3eb, which includes changes from #1893.

Workers still fail on startup, but with a different error

/usr/src/releases/1582131641325997600/Gemfile not found

dentarg · 2020-02-19T19:13:44Z

From the repro repo

This project reproduces an issue related to phased restarts in Puma. Specifically, it demonstates that if you delete the compiled native extensions that were used when the puma master started, the next phased restart will fail.

Is that a problem Puma should solve? 🤔

gencer · 2020-02-19T19:17:49Z

@dentarg, don't you think so? So you mean that we will update puma for getting not just for better features but for better security in the future but in fact we will not be able to gracefully deploy due to this issue because it crashes.

I believe this should be solved by Puma as soon as possible. At least a workaround.

For now, I just forcefully reject pull requests for Puma.

Update: I realized that previous quote seems mentioning about 3rd party native extension that is not related to puma. If this is the case, I believe if we can do phased-restart on unicorn and/or other web servers (and we do) we should be able to do so with puma too.

cjlarose · 2020-02-19T21:46:33Z

Is that a problem Puma should solve?

@dentarg I'm not certain about this either. The bug might not even be with Puma itself--it might be a bug with bundler, for example. It just so happens that puma phased restarts combined with a deployment strategy that removes old releases exposes the bug. It's also a little frustrating, too, because if you were running puma 3.x (before the introduction of nio4r in 4.x), and you were removing old releases, that was probably fine since puma 3.x had no runtime dependencies on gems that had native extensions.

I believe this should be solved by Puma as soon as possible. At least a workaround.

@gencer Puma doesn't owe you anything. Maintainers are often unpaid volunteers. If you can do phased restarts using other web servers, then you should use those other web servers.

gencer · 2020-02-20T01:28:54Z

@cjlarose Neither puma nor its developers and also volunteers doesn't owe me anything. I am deeply sorry if I offended anyone. I didn't mean that way. You misunderstood me. I was just trying to say that it should be fixed on puma-gem (that's what i believe because other web servers doesnt have this issue and puma 3.x also don't have this.) not by puma developers. This could be of course "me" or someone who knows these things better than me.

Please accept my apologies on this matter.

MSP-Greg · 2020-02-20T02:51:56Z

This issue is very messy.

because other web servers don't have this issue and puma 3.x doesn't have this.

Yes, but @cjlarose noticed the issue with extension files, whether they be Puma dependencies (4.x only), or app dependencies (noticed with 3.x). So, it may not really be a '3.x is ok, 4.x is bad' problem. It may just appear that way because 4.x has an extension gem dependency of nio4r.

At various times, there have issues in Ruby with symlinks and the absolute paths of symlinks, where code incorrectly expected the two to be equal. That's just Ruby, who knows about Bundler/RubyGems, especially when changing a symlink to affect what files a running Ruby instance uses.

Two people have shown a log message of 'Could not find nio4r-2.5.1 in any of the sources'. That's a Bundler message, so, either Puma is misconfiguring Bundler, or Bundler is not functioning as expected.

One thing that could be tried is changing:

puma/lib/puma/reactor.rb

Line 6 in afb27d5

require 'nio'

to:

require 'nio' unless Kernel.const_defined?(:NIO)

the above isn't a solution, just trying to bypass Bundler's resolution to see if that's the issue.

* submitted by a Puma volunteer

nateberkopec · 2020-02-20T15:36:58Z

Seen on CI

MSP-Greg · 2020-02-20T16:01:23Z

Seen on CI

More than one job failed with the same error on Travis. And, jobs using the same Rubies pass on Actions. Good find.

I feel guilty, being one who often says "CI is more than green checks & red X's, look at the logs"...

cjlarose · 2020-03-16T22:17:14Z

#1875 tracks the progress of a new architecture in puma that'll hopefully resolve this issue.

cjlarose · 2020-09-30T21:47:47Z

#2374 is merged, which was a prerequisite for the changes I want to introduce to fix this issue.

I've got a branch that fixes this issue. I'm working on fixing my tests for it.

cjlarose · 2020-10-14T06:33:15Z

I opened #2407 which is a robust solution to the original problem described in this issue. It makes it so that the puma master process can use totally separate gems from the puma worker processes, so it's fine to remove nio4r's compiled native extensions on disk and perform phased restarts. It also supports doing the same for any gem with native extensions loaded by the puma master process, such as ffi if you're using puma_worker_killer. The problem with that PR, though, is that it increases the overall memory footprint of a puma cluster since worker processes can't leverage COW memory by forking from the master process. We ultimately decided to close it.

I opened #2427 which fixes the nio4r issue with regard to phased restarts. It does not allow one to remove from disk compiled native extensions for other gems that are loaded by the puma master process (like ffi), but I suspect this doesn't matter to most people. That PR will fix this problem for most folks.

For anyone else, I opened rubygems/rubygems#4004 which I think is the root cause of the issue in Bundler. If that issue is resolved, then it will be possible to perform a phased restart after removing compiled native extensions for any gem loaded by the puma master process, including ffi.

gingerlime · 2020-10-14T07:01:53Z

@cjlarose you're my hero 🦸‍♂️ thank you so much. The bundler report was impeccable.

I have only one question however: if bundler resolves this (rather than Puma works around it), would a phased restart also be able to update ffi or nio4r if these gems are updated? I guess it's not a very common scenario, but I wasn't sure what happens then...

cjlarose · 2020-10-14T19:51:38Z

@cjlarose you're my hero 🦸‍♂️ thank you so much. The bundler report was impeccable.

Happy to help!

I have only one question however: if bundler resolves this (rather than Puma works around it), would a phased restart also be able to update ffi or nio4r if these gems are updated? I guess it's not a very common scenario, but I wasn't sure what happens then...

Excellent question. I can provide additional clarification.

Without a fix in Bundler for rubygems/rubygems#4004, but with the workaround in puma #2427, it'll be possible to perform phased restarts in a way that upgrades nio4r without a problem. This works because nio4r won't be loaded into the puma master process at all. Workers all independently load their own copy of nio4r.

With a fix in Bundler for rubygems/rubygems#4004 and with the workaround in puma #2427 applied, of course that won't make the situation any different for nio4r: phased restarts will work, even if you're upgrading nio4r. But the story is different for ffi (or really any gem that has native extensions and is loaded into the puma master process via extra_runtime_dependencies or dependencies thereof). This is because the same version of ffi will be loaded into the worker processes when they fork from the master process. Phased restarts will not support upgrading ffi. The value of the Bundler fix is just that you'll be able to freely delete the compiled native extensions on disk without worry that the subsequent phased restart will fail.

To sum it all up, this my advice for running puma after #2427 is merged: if you perform phased restarts to upgrade your application served by puma (assuming you have prune_bundler on, and regardless of whether or not you use a deployment strategy that involves deleting the gems of old releases):

Do not try to upgrade puma itself. Workers will fail on boot.
Do not try to upgrade any gem that is loaded into the puma master process via extra_runtime_dependencies or dependencies thereof (such as ffi if you're using puma_worker_killer). Workers will fail on boot.
If you want to do either of these things, perform a hot restart instead.

If you do have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 is not fixed):

Do not put any gems into extra_runtime_dependencies that have native extensions or has dependencies that have native extensions (like puma_worker_killer and ffi) at all. Workers will fail on boot during a phased restart. Hot restarts are your only option here if you need these dependencies.

If you have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 is fixed):

Feel free to put gems like puma_worker_killer and ffi into extra_runtime_dependencies. Phased restarts will work so long as you're not changing versions of those gems.

I think I'll go ahead and add this to the restart documentation for puma since it's important to operators to know when it's safe to perform a phased restart. Longer term, another potential idea is for puma to automatically perform a hot restart instead if it detects that it cannot safely perform a phased restart.

gingerlime · 2020-10-25T16:08:26Z

Hey @cjlarose thanks so much for the input, and sorry for the delay. Was on holidays. I'm really keen to try to upgrade Puma now that #2427 is merged. Thanks for your awesome work and patience.

I still have one question however about upgrading puma in future. Our setup does not involve any extra_runtime_dependencies and only using prune_bundler. So if we need to upgrade puma/ffi in future, we cannot rely on phased restarts but rather need to full-restart puma after an upgrade. Right? can you suggest some kind of an automated check during deployment to anticipate this? i.e. see if the version of puma/ffi differs and then issue a full restart instead of the regular phased restart?

Hope I got things right, but want to fool-proof it in future as much as possible to avoid nasty surprises :)

nateberkopec · 2020-10-26T13:07:55Z

Released in 5.0.3.

cjlarose · 2020-10-28T16:11:29Z

I still have one question however about upgrading puma in future. Our setup does not involve any extra_runtime_dependencies and only using prune_bundler. So if we need to upgrade puma/ffi in future, we cannot rely on phased restarts but rather need to full-restart puma after an upgrade. Right?

Because you're not using extra_runtime_dependencies, you don't need to worry about ffi upgrades. Those will work just fine. For puma upgrades, a "hot restart" is the most graceful way to restart.

can you suggest some kind of an automated check during deployment to anticipate this? i.e. see if the version of puma/ffi differs and then issue a full restart instead of the regular phased restart?

This is definitely somewhere where puma is lacking. It'd be great if there was an option to pumactl phased-restart that either checked first to see if a phased restart is possible (and use a hot restart if not) or attempted to perform a phased-restart, but automatically fall back to a hot restart if it fails for whatever reason.

On (proprietary) applications that I maintain, we have a kinda complicated start/stop script that basically does this by utilizing the stats command offered by the Puma control/status server through pumactl stats. With stats, we can monitor the progress of a phased restart (by inspecting the phase of each worker). If the phased restart doesn't make progress within a certain timeout, we fall back to a hot restart.

I think pumactl phased-restart should just do this automatically. I opened a separate issue to address this. #2465

gingerlime · 2020-10-29T07:52:42Z

Thanks again @cjlarose ! I'll check out pumactl and see if we can script something. And great to see #2465 ! hope there's some way I can help with this.

I'm having trouble after upgrading to puma 5.0.4 unfortunately... I did a hot restart to make sure I'm running puma 5 but after a few deploys (with phased-restarts) I see these errors in my puma.log file and I think the paths are pointing to folders that were deleted... The gem it complains about wasn't actually updated recently.

/usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:86:in `block in materialize': Could not find json-2.3.1 in any of the sources (Bundler::GemNotFound)
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `map!'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/spec_set.rb:80:in `materialize'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:170:in `specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:237:in `specs_for'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/definition.rb:226:in `requested_specs'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:101:in `block in definition_method'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/runtime.rb:20:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler.rb:149:in `setup'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `block in <top (required)>'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:136:in `with_level'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/ui/shell.rb:88:in `silence'
        from /usr/lib/ruby/gems/2.6.0/gems/bundler-2.1.4/lib/bundler/setup.rb:20:in `<top (required)>'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /usr/lib/ruby/site_ruby/2.6.0/rubygems/core_ext/kernel_require.rb:92:in `require'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:312:in `rack_builder'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:331:in `load_rackup'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/configuration.rb:257:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:142:in `app'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/runner.rb:146:in `start_server'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster/worker.rb:57:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:195:in `worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `block in spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `fork'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:96:in `spawn_worker'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:78:in `block in spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `times'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:71:in `spawn_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:142:in `check_workers'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cluster.rb:410:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/launcher.rb:171:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/lib/puma/cli.rb:80:in `run'
        from /var/local/app/app.d/1386/vendor/bundle/ruby/2.6.0/gems/puma-5.0.4/bin/puma-wild:25:in `<main>'
[12769] + Gemfile in context: /var/local/app/app.d/1391/Gemfile
[12769] Early termination of worker

Any tips on how to further troubleshot / investigate this?

gingerlime · 2020-10-29T12:34:10Z

please let me know if I should open a separate issue for this, but it seems to keep happening for us after a hot restart and a few phased restarts with puma 5.0.4

dentarg · 2020-10-29T12:42:35Z

@gingerlime I think you can open up a new issue as the issue is now about json and not nio4r

gingerlime · 2020-10-29T12:49:15Z

Thanks @dentarg will do (it's related to phased-restarts though, but definitely looks like a different thing)

gingerlime · 2020-10-29T12:53:53Z

see #2471

alx75 · 2021-06-25T14:48:10Z

Thanks @cjlarose for all this work. I do have one question about what you said concerning puma_worker_killer

If you do have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 is not fixed):
Do not put any gems into extra_runtime_dependencies that have native extensions or has dependencies that have native extensions (like puma_worker_killer and ffi) at all. Workers will fail on boot during a phased restart. Hot restarts are your only option here if you need these dependencies.

From what I understand if I want to use puma_worker_killer I need to list every dependencies in the extra_runtime_dependencies otherwise the master process will pick one version outside of the bundle context. This means in the puma_worker_killer scenario on a linux server I need to have :

extra_runtime_dependencies ['puma_worker_killer', 'get_process_mem']

It seems for linux we do not need to add ffi here as it is loaded by get_process_mem (v0.2.7) only for macos machine. It seems to me this configuration is safe to be used with prune_bundler even if rubygems/rubygems#4004 is not fixed as there is not native extension.

Is my understanding correct ?

cjlarose · 2021-06-25T16:41:01Z

@alx75 I think your intuition is correct here. Since the ffi gem would not be activated in the puma master process in your case, then the phased restart should complete successfully.

I was able to test this reliably here: https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/ffi

gingerlime mentioned this issue Oct 8, 2019

Wait for demonized workers #1938

Closed

nateberkopec added the bug label Oct 10, 2019

nateberkopec mentioned this issue Feb 12, 2020

Preserve BUNDLE_GEMFILE and add a test for it #1893

Merged

nateberkopec added the waiting-for-review Waiting on review from anyone label Feb 24, 2020

AlexWayfer mentioned this issue Feb 25, 2020

Replace deprecated Bundler.with_clean_env with with_unbundled_env #2120

Closed

8 tasks

This was referenced Mar 5, 2020

"Preloader" process for preload + rolling restart #1875

Closed

BUNDLE_GEMFILE is incorrectly set to parent Gemfile in worker processes (when not setting BUNDLE_GEMFILE explicitly) #2133

Closed

dentarg mentioned this issue Mar 30, 2020

Regression in 4.3.3 with require json #2206

Closed

This was referenced Oct 2, 2020

Create new worker processes in separate memory spaces from master process #2407

Closed

Bundler.setup raises Bundler::GemNotFound for gems whose native extensions have been removed, even if the gem is already activated rubygems/rubygems#4004

Open

cjlarose mentioned this issue Oct 14, 2020

Fix Bundler::GemNotFound errors for nio4r gem #2427

Merged

8 tasks

nateberkopec closed this as completed in #2427 Oct 20, 2020

cjlarose mentioned this issue Oct 22, 2020

Update restart documentation [changelog skip] [ci skip] #2444

Merged

8 tasks

This comment has been minimized.

Sign in to view

cjlarose mentioned this issue Oct 28, 2020

Automatically fallback to a hot restart if phased restart fails #2465

Open

dentarg mentioned this issue Oct 29, 2020

phased-restart errors after upgrading from Puma 3.12.6 to 5.0.4 #2471

Closed

cjlarose mentioned this issue Dec 17, 2020

Remove nio from status server #2516

Merged

8 tasks

gingerlime mentioned this issue Apr 10, 2022

Bundler error prevents phased-restarting puma after upgrading to rails 7 #2843

Closed

phased-restart: Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound) #2018

phased-restart: Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound) #2018

Comments

gingerlime commented Oct 8, 2019 • edited

gingerlime commented Oct 8, 2019

gingerlime commented Oct 8, 2019 • edited

nateberkopec commented Oct 10, 2019

gencer commented Oct 11, 2019 • edited

nateberkopec commented Oct 23, 2019

gingerlime commented Oct 23, 2019 • edited

nateberkopec commented Oct 23, 2019

gingerlime commented Oct 23, 2019

gencer commented Oct 23, 2019

p8 commented Nov 21, 2019

gingerlime commented Nov 21, 2019

p8 commented Nov 21, 2019

gingerlime commented Nov 21, 2019

dentarg commented Nov 21, 2019

gingerlime commented Nov 22, 2019

gencer commented Nov 22, 2019

cjlarose commented Feb 18, 2020 • edited

nateberkopec commented Feb 18, 2020

cjlarose commented Feb 19, 2020 • edited

dentarg commented Feb 19, 2020

gencer commented Feb 19, 2020 • edited

cjlarose commented Feb 19, 2020

gencer commented Feb 20, 2020

MSP-Greg commented Feb 20, 2020

nateberkopec commented Feb 20, 2020

MSP-Greg commented Feb 20, 2020

cjlarose commented Mar 16, 2020

cjlarose commented Sep 30, 2020

cjlarose commented Oct 14, 2020

gingerlime commented Oct 14, 2020 • edited

cjlarose commented Oct 14, 2020 • edited

This comment has been minimized.

gingerlime commented Oct 25, 2020 • edited

nateberkopec commented Oct 26, 2020

cjlarose commented Oct 28, 2020 • edited

gingerlime commented Oct 29, 2020 • edited

gingerlime commented Oct 29, 2020

dentarg commented Oct 29, 2020

gingerlime commented Oct 29, 2020

gingerlime commented Oct 29, 2020

alx75 commented Jun 25, 2021 • edited

cjlarose commented Jun 25, 2021

gingerlime commented Oct 8, 2019 •

edited

gingerlime commented Oct 8, 2019 •

edited

gencer commented Oct 11, 2019 •

edited

gingerlime commented Oct 23, 2019 •

edited

cjlarose commented Feb 18, 2020 •

edited

cjlarose commented Feb 19, 2020 •

edited

gencer commented Feb 19, 2020 •

edited

gingerlime commented Oct 14, 2020 •

edited

cjlarose commented Oct 14, 2020 •

edited

gingerlime commented Oct 25, 2020 •

edited

cjlarose commented Oct 28, 2020 •

edited

gingerlime commented Oct 29, 2020 •

edited

alx75 commented Jun 25, 2021 •

edited