phased-restart errors after upgrading from Puma 3.12.6 to 5.0.4 #2471
Comments
For some reason I remember bringing this up during the …
From #2018 (comment)
Is the Puma master process still using json because of the following?

puma/lib/puma/cluster/worker_handle.rb, lines 43 to 47 in f0dffd4, and lines 441 to 442 in f0dffd4.
For reference, here's that discussion: #2427 (comment)
That issue is about running bundler/setup, and hence is not being triggered by any … I'm not sure yet, but Puma needs json and it's not in a gemspec/Gemfile. Also, not sure about the environment, but Ruby 2.6.x's default json is 2.1.0, while the log refers to the current json version, 2.3.1. @gingerlime, does your app use json?
@MSP-Greg yes, I'm pretty sure it does (I would imagine most apps these days would?). I can find it in our Gemfile.lock, although I'm not sure where the requirement comes from (it's not in our Gemfile). EDIT: I can see a gem that requires json (algoliasearch), plus a few other gems that require multi_json in our app.
Thanks. No need to figure out what's dependent on it. I need to drink more coffee and think about this. I assume json-2.3.1 is installed somewhere in your app bundle?
Yes, exactly. This is the version listed in our Gemfile.lock. Enjoy your ☕ and thanks for your help!
Puma has the following in its gemspec:

```ruby
s.add_runtime_dependency "nio4r", "~> 2.0"
```

We do not have a similar line for json. I'm not sure if that's the issue.
This is essentially the same issue as #2018 and the behavior is not especially surprising to me. From https://github.com/puma/puma/blob/d4d1ed374a72d0df020d9512707b9eea17344135/docs/restart.md:

> If the puma master process has activated any gem that has native extensions (such as …

@dentarg is essentially right here. I think the change to include …
I should mention, too, that if a solution is found for rubygems/rubygems#4004, then we can safely use the … With or without a fix for rubygems/rubygems#4004, I think our best move forward is to remove the use of the …
Regardless, should the following be added to the gemspec, if it's no longer optional?

```ruby
s.add_runtime_dependency "json", "~> 2.3"
```
My problem also. It's not a matter of whether it exists; it's a matter of whether/how it gets resolved (which is the issue here), and runtime dependencies are handled in a specific manner, especially when they are shared by the app, which is less likely with nio4r... I think.
I'll start by seeing if I can write a red test for this bug. I don't know what the fix will be yet, since we probably don't want to re-introduce the malformed JSON bug that was fixed by #1801.
If you're thinking about how …
No, I'm thinking about how the last Puma line leading to the error in the above log is (v5.0.4) puma/lib/puma/configuration.rb, line 312 in 1f0c08e.
It's not throwing the exception on a …
I see. I think you're on to something with the …
I think this indicates that the JSON gem is being double-loaded (it's loaded once by the puma master process and loaded again by the workers). Usually I'd spend more time digging into this, but I suspect @MSP-Greg's hunch that we should add the …
I was able to create a minimal repro for @gingerlime's issue in Docker: https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/could-not-find-json. It demonstrates that if you have a deployment process that deletes gems from old releases and your application has a dependency on the … I'm still investigating the precise mechanism that causes it.
I can confirm this error too (we also have fork_worker enabled).
Messy. I took some of the code you mentioned above and am running four tests: upgrade/downgrade by json/nio4r. I had to shut off warnings when requiring, as reloading code will always show the constant warnings. But the downgrade-json test is always failing (in a big way). Also, json and nio4r have to be explicitly added to the 'project' Gemfile. So I'm still working on it. The branch in my fork is https://github.com/MSP-Greg/puma/tree/json. There are two commits: the first renames test files, etc.; the second (most recent) is the actual code changes.
Ok, so I took a look at all of the places where we explicitly use the `json` gem.

**The worker**

```ruby
require 'json'
io << "p#{Process.pid}#{server.stats.to_json}\n"
```

This usage of `require 'json'` is safe. It necessarily happens inside the worker process. If the user's application depends on `json`, it'll use that version. Otherwise, it'll use the version of `json` bundled with Ruby.
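For reference, the check-in message built by the snippet above is just `"p"`, the worker's pid, and the JSON-encoded stats on one line. A minimal sketch of that round trip (the stats hash here is hypothetical, standing in for `server.stats`):

```ruby
require 'json'

# Build a check-in line the way the worker snippet above does
# (the stats fields are made up for illustration).
stats = { backlog: 0, running: 5 }
line = "p#{Process.pid}#{stats.to_json}\n"

# The receiving side can recover the payload by splitting at the
# first '{', since the pid is all digits.
payload = line[line.index('{')..-1]
parsed = JSON.parse(payload, symbolize_names: true)
```

The point of the "safe" classification is that this `JSON.parse` counterpart happens in the master, not here; the worker side only ever serializes.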
**`WorkerHandle#ping!`**

puma/lib/puma/cluster/worker_handle.rb, lines 43 to 48 in d4d1ed3:

```ruby
def ping!(status)
  @last_checkin = Time.now
  require 'json'
  @last_status = JSON.parse(status, symbolize_names: true)
end
```
This use of `require 'json'` is not safe. Assuming `prune_bundler` is on, the first time this is called will be after the first worker has booted. That worker will check in to the master process, and the master process will try to parse the message. Under the normal RubyGems `require` logic, if the user's application defined a dependency on the `json` gem, the puma master process will add that version to the `$LOAD_PATH`, activate the `json` gem's gemspec, and load the gem into memory. If the user's application did not define such a dependency, this will load the version of the `json` gem that's bundled with Ruby.

Subsequent attempts to boot workers will try to use the same version as the version that was first `require`'d. As a consequence, so long as the application declares a dependency on the `json` gem, removing that first version's representation on disk will cause workers to fail, even if they're booting up a version of the application that depends on the same version of `json` (they didn't upgrade or downgrade). The root cause of this particular problem is rubygems/rubygems#4004. The other problem is that once the `json` gem is loaded into the puma master process, it is no longer possible to upgrade or downgrade the `json` gem at all in a subsequent phased restart (this is true even if you don't delete old versions of your application). In short, our best path forward is to remove the `json` gem entirely from being loaded into the master process.
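The pinning described above is plain RubyGems behavior, observable outside of Puma: once a gem is activated in a process, `Gem.loaded_specs` records the version, and that version cannot change for the life of the process. A sketch (stock RubyGems, not Puma code; the conflicting version number is made up):

```ruby
# Once `require 'json'` runs in the master, RubyGems activates one
# specific version of the gem for the whole process.
require 'json'

spec = Gem.loaded_specs['json']
puts "json pinned at #{spec.version} for pid #{Process.pid}"

# Asking for a conflicting version afterwards raises; a phased restart
# can't swap the version because the master process never re-execs.
begin
  gem 'json', '= 0.0.1' # hypothetical conflicting requirement
rescue Gem::LoadError => e
  puts "cannot change versions: #{e.class}"
end
```

This is why deleting the activated version's files on disk (the pruned release) breaks subsequent worker boots: the process is committed to paths that no longer exist.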
The use of the JSON gem in this particular method was introduced in #2124 and can be safely reverted. Doing so will probably fix @gingerlime's problem and allow many other folks to upgrade to a modern version of Puma. The remaining two places where we `require 'json'` probably affect fewer people, but the impact is just as severe.
**Status server requests to `/gc-stats`, `/stats`, and `/thread-backtraces`**

Lines 25 to 27 in d4d1ed3:

```ruby
if env['PATH_INFO'] =~ /\/(gc-stats|stats|thread-backtraces)$/
  require 'json'
end
```
Lines 51 to 62 in d4d1ed3:

```ruby
when 'gc-stats'
  GC.stat.to_json
when 'stats'
  @launcher.stats.to_json
when 'thread-backtraces'
  backtraces = []
  @launcher.thread_status do |name, backtrace|
    backtraces << { name: name, backtrace: backtrace }
  end
  backtraces.to_json
```
This usage of `require 'json'` is similarly unsafe. As soon as `json` is required into the puma master process, it's possible for subsequent worker boot operations to fail (either because the native extensions for an old release were deleted or because the version of `json` has changed in the new version of the application).
The use of the `json` gem in the status server was introduced in #1801 in order to fix a bug with malformed responses from `/gc-stats` on JRuby. At the time of authorship of #1801, `/stats` did not rely on the `json` gem and `/thread-backtraces` did not yet exist.

`/thread-backtraces` is a fairly recent addition from #2054. At the time of authorship, it just depended on the `json` gem having already been loaded.
It is probably possible to fix the original problem of malformed JSON with `/gc-stats` without loading the `json` gem. It would just mean that we'd have to implement and maintain what is basically a hand-rolled JSON serialization implementation.
For `/stats`, it's possible to implement this without the `json` gem. In fact, before #2086, it didn't use the `json` gem at all, just hand-rolled JSON serialization.
For `/thread-backtraces`, this can be implemented with some more hand-rolled JSON serialization, but it never has been in Puma's history (like I said, this one is fairly recent).
**`Puma.stats`**

Lines 27 to 30 in d4d1ed3:

```ruby
def self.stats
  require 'json'
  @get_stats.stats.to_json
end
```
This use of `require 'json'` is similarly unsafe, for the same reasons as before. `Puma.stats` returns a JSON-encoded string. It is considered public API.

#2086 modified this method to just return a Hash instead of a String, but that was later rolled back in #2253 because it was a breaking API change. Interestingly, the revert in #2253 effectively deferred JSON serialization until the last moment: instead of `Cluster#stats` returning a JSON-encoded string as it did before #2086, it returns a hash, and conversion to a JSON string happens in the body of `Puma.stats`.

Similar to `/stats` in the discussion about the status server, there was a time when this method did not depend on the `json` gem, and we can roll it back with some more hand-rolled JSON serialization.
I'll start with removing the use of the …
#2473 (and some follow-up work) has been merged. I'll take a look at implementing the fixes for the status server and for …
Opened #2479 to fix the remaining uses of the …
@gingerlime If you're able to test your application against puma's master branch, we'd appreciate finding out if it works for you! Otherwise, we can wait until the next puma release. We appreciate your patience and can't wait for you to be able to upgrade to puma 5.x!
@cjlarose yes, of course; it's the least I could do :) I'll get it tested on our staging environment so we can go through a number of deploys and hopefully spot any issues. I'll report back within a day or two at the latest.
@cjlarose we've been running puma master all day on our staging environment, with a dozen or so deploys, all using phased restarts. So far it looks great 👍 🚀 I'm not sure we really tested all scenarios (like gem upgrades, new installs, and removals), but I'm definitely optimistic about the upgrade to 5.x now. Thanks so much for all your hard work and patience.
EDIT: Please ignore. Seems completely unrelated to puma 5.x vs 3.x.
Describe the bug
This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.
We use a folder structure similar to this:
When we deploy, we increment the version, run bundle, etc., then switch the symlink and do a phased restart of puma.
All seems to work fine, but after a few deploys we also start pruning older versions, and then we see this error in the puma.log:
❗️ Note that the app now lives in `/app/app.d/1390` but the errors are pointing to puma in `/app/app.d/1386` ❗️

Puma config:
If I do a full restart, then things are back to normal until it happens again (we keep 3 deploys back, so after 4 deploys...).
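For context, the deploy flow described in this report can be sketched roughly like this (paths and commands are illustrative, reconstructed from the description above; they are not the actual deploy scripts):

```shell
#!/bin/sh
set -eu

# Stand-in for /app; the real layout keeps releases under /app/app.d/<version>
# with a symlink that each deploy re-points at the newest release.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/app.d/1386" "$ROOT/app.d/1390"

# Deploy: bundle the new release, then switch the symlink atomically.
# (cd "$ROOT/app.d/1390" && bundle install --deployment)
ln -sfn "$ROOT/app.d/1390" "$ROOT/current"

# Phased restart against the new code:
# bundle exec pumactl phased-restart

# Pruning old releases is the step that surfaces the bug: the puma
# master can still hold gem paths under the deleted release.
rm -rf "$ROOT/app.d/1386"

readlink "$ROOT/current"
```

The symlink switch and the phased restart are fine on their own; it is the combination with pruning that exposes the master's pinned gem paths.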
To Reproduce
I was only able to reproduce this on our staging environment so far
Expected behavior
A phased restart should reload everything from the new folder... It does seem to point to the right folder, but there are some mysterious leftovers pointing to something old somewhere?
Server: