phased-restart: Could not find nio4r-2.5.1 in any of the sources (Bundler::GemNotFound) #2018
Comments
@mcg @MSP-Greg maybe you guys have some suggestions at least what to test or how to try to reproduce this? I tried it with a toy app and using symlinks in a similar structure, but somehow can't reproduce it in this environment... But it does seem to happen fairly consistently in our staging environment so far. |
One small thing I saw in our staging logs, but I don't see in my toy environment is:
Maybe it's related to the problem? |
Possibly related to the changes we made to prune_bundler lately. |
I have a problem similar to this. When I update a gem in Gemfile that puma uses like puma itself or as original poster said See here: #705 So, Is this issue related to this In development, I got same error. |
Makes sense if you look at the code for how prune_bundler works. We'll have to figure out how to change/reload Puma's spec. |
Slightly OT, but is it worth switching from (README)
(Deployment)
(also the deployment doc mentions using Which advice should I follow? :) |
If you need phased restarts or prune_bundler, you cannot preload_app. If you don't need either of those things, use preload_app. |
Sorry if this is a dumb question, but how do I know if I need I do need phased restarts for zero-downtime deploys. |
This will be awesome if we can have a solution for this. At least a workaround until a permanent solution comes up. Since, updating those specs requires full restart and this means downtime. |
@gingerlime I think this problem occurs because you vendor the Ruby gems in /var/local/app/app.d/{version (int)}. See:
If this version gets deleted the vendored gems also get deleted. |
Thanks for the suggestion! But this is working 100% fine with puma 3 for a number of years now. How come it stopped working with 4?? |
Puma 3 had no dependent gems. Puma 4 added nio4r as a dependency. |
Good point! However isn’t puma 4 potentially breaking prune_bundler then ? |
@gingerlime The bug might have been there since before Puma 4, see #1593 and rubygems/bundler#6667. It looks like the same problem you have, but under different circumstances. There's a PR trying to fix it, #1893; perhaps you could try it out? Puma v4.3.0 is still using Line 267 in 395337d
#1893 switches to |
Thanks @dentarg very interesting. I'll try to find some time to try out #1893 and see. Reproducing the problem in itself was a challenge, and I was only seeing it in our staging environment (which is shared, so I can't experiment there for too long). Hope I can still reproduce it, and then try again with #1893 and report back here. Otherwise, is there a workaround perhaps? maybe moving away from |
As @gingerlime pointed out, I am also interested in any workarounds for zero-downtime deployments. |
- Cannot boot puma workers from puma.log on AWS EB -- === puma startup: 2019-12-09 12:47:04 +0000 === === puma startup: 2019-12-09 12:47:04 +0000 === [19558] Early termination of worker [19600] Early termination of worker [19619] Early termination of worker [19635] Early termination of worker [19640] Early termination of worker [19646] Early termination of worker [19654] Early termination of worker [19659] Early termination of worker puma/puma#2018 puma/puma#1893
My team was experiencing this same issue. For us, the problem had to do with how our release process worked, not necessarily with Puma itself. Like @gingerlime, this error happened on our staging environment much more reliably and more frequently than it did in production. In a nutshell: If your release process removes old releases (specifically, the compiled native extensions on which the puma master relies), new puma workers will check to see if the native extensions are compiled, fail to find that evidence, and I don't have a ready-to-go public repro repo, but this is the setup (forgive my sloppiness):
This problem, I imagine, happens much more frequently with My team still experienced a similar issue with For a quick workaround, just keep your old releases around on your servers (or at the very least the release directory for the version of your application where you first started the puma master). You can safely perform phased restarts as you'd expect. My team is investigating other solutions, but we're not far along in that process yet. We'll make sure to share what we find. |
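The workaround above (keep the release the puma master was started from) could be folded into a pruning step. The following is only a minimal sketch under stated assumptions: `prunable_releases` is a hypothetical helper, not part of puma or any deploy tool, and it assumes each release lives in its own numerically named directory under one root, as in the layout described in this issue.

```ruby
# Hypothetical helper: list the release directories that look safe to delete.
# releases_root  - directory holding one numerically named subdirectory per release
# keep           - how many of the newest releases to retain
# master_release - release directory the running puma master was booted from
def prunable_releases(releases_root, keep:, master_release:)
  releases = Dir.children(releases_root)
                .map { |name| File.join(releases_root, name) }
                .select { |path| File.directory?(path) }
                .sort_by { |path| File.basename(path).to_i }

  # Everything except the newest `keep` releases is a candidate for removal.
  candidates = releases.first([releases.size - keep, 0].max)

  # Never offer up the release the master still loads its gems from.
  protected_path = File.expand_path(master_release)
  candidates.reject { |path| File.expand_path(path) == protected_path }
end
```

A deploy script could pass the result to `FileUtils.rm_rf`. How the script learns which release the master was booted from is left open here; recording it at boot time is one option.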
Everyone having similar issues should try the master branch, as #1893 was merged and may help. |
I created a repo that demonstrates that this error occurs reliably if you're deleting old releases https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors The
The Workers still fail on startup, but with a different error
|
From the repro repo
Is that a problem Puma should solve? 🤔 |
@dentarg, don't you think so? So you mean that we will update puma not just for better features but also for better security in the future, yet in fact we will not be able to deploy gracefully because of this crash? I believe this should be solved by Puma as soon as possible, or at least worked around. For now, I just forcefully reject pull requests for Puma. Update: I realized that the previous quote seems to be about a third-party native extension that is not related to puma. If this is the case, I believe that if we can do a phased restart on unicorn and/or other web servers (and we do), we should be able to do so with puma too. |
@dentarg I'm not certain about this either. The bug might not even be with Puma itself--it might be a bug with bundler, for example. It just so happens that puma phased restarts combined with a deployment strategy that removes old releases exposes the bug. It's also a little frustrating, too, because if you were running puma 3.x (before the introduction of nio4r in 4.x), and you were removing old releases, that was probably fine since puma 3.x had no runtime dependencies on gems that had native extensions.
@gencer Puma doesn't owe you anything. Maintainers are often unpaid volunteers. If you can do phased restarts using other web servers, then you should use those other web servers. |
@cjlarose Neither puma nor its developers and volunteers owe me anything. I am deeply sorry if I offended anyone. I didn't mean it that way. You misunderstood me. I was just trying to say that it should be fixed in the puma gem (that's what I believe, because other web servers don't have this issue and puma 3.x doesn't have it either), not necessarily by the puma developers. The person fixing it could of course be me or someone who knows these things better than me. Please accept my apologies on this matter. |
This issue is very messy.
Yes, but @cjlarose noticed the issue with extension files, whether they be Puma dependencies (4.x only), or app dependencies (noticed with 3.x). So, it may not really be a '3.x is ok, 4.x is bad' problem. It may just appear that way because 4.x has an extension gem dependency of nio4r. At various times, there have been issues in Ruby with symlinks and the absolute paths of symlinks, where code incorrectly expected the two to be equal. That's just Ruby, who knows about Bundler/RubyGems, especially when changing a symlink to affect what files a running Ruby instance uses. Two people have shown a log message of 'Could not find nio4r-2.5.1 in any of the sources'. That's a Bundler message, so, either Puma is misconfiguring Bundler, or Bundler is not functioning as expected. One thing that could be tried is changing: Line 6 in afb27d5
to:
the above isn't a solution, just trying to bypass Bundler's resolution to see if that's the issue. * submitted by a Puma volunteer |
More than one job failed with the same error on Travis. And, jobs using the same Rubies pass on Actions. Good find. I feel guilty, being one who often says "CI is more than green checks & red X's, look at the logs"... |
#1875 tracks the progress of a new architecture in puma that'll hopefully resolve this issue. |
#2374 is merged, which was a prerequisite for the changes I want to introduce to fix this issue. I've got a branch that fixes this issue. I'm working on fixing my tests for it. |
I opened #2407 which is a robust solution to the original problem described in this issue. It makes it so that the puma master process can use totally separate gems from the puma worker processes, so it's fine to remove I opened #2427 which fixes the For anyone else, I opened rubygems/rubygems#4004 which I think is the root cause of the issue in Bundler. If that issue is resolved, then it will be possible to perform a phased restart after removing compiled native extensions for any gem loaded by the puma master process, including |
@cjlarose you're my hero 🦸♂️ thank you so much. The bundler report was impeccable. I have only one question however: if bundler resolves this (rather than Puma works around it), would a phased restart also be able to update |
Happy to help!
Excellent question. I can provide additional clarification. Without a fix in Bundler for rubygems/rubygems#4004, but with the workaround in puma #2427, it'll be possible to perform phased restarts in a way that upgrades With a fix in Bundler for rubygems/rubygems#4004 and with the workaround in puma #2427 applied, of course that won't make the situation any different for To sum it all up, this is my advice for running puma after #2427 is merged: if you perform phased restarts to upgrade your application served by puma (assuming you have
If you do have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 is not fixed):
If you have a deployment strategy that deletes the gems of old releases (assuming rubygems/rubygems#4004 is fixed):
I think I'll go ahead and add this to the restart documentation for puma since it's important to operators to know when it's safe to perform a phased restart. Longer term, another potential idea is for puma to automatically perform a hot restart instead if it detects that it cannot safely perform a phased restart. |
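The idea of falling back to a hot restart can be approximated from the outside today. This is only a sketch under assumptions: `restart_signal` is a hypothetical helper, not something puma provides, and "the directories the master loaded code from still exist" is used here as a rough stand-in for "a phased restart is safe". The signal names are the ones puma documents: `USR1` for a phased restart and `USR2` for a hot restart.

```ruby
# Hypothetical helper, not part of puma: choose which signal to send to the
# puma master. loaded_dirs is a list of directories the master is known to
# have loaded gems or application code from.
def restart_signal(loaded_dirs)
  missing = loaded_dirs.reject { |dir| File.directory?(dir) }

  # USR1 asks puma for a phased restart, USR2 for a hot restart.
  missing.empty? ? "USR1" : "USR2"
end

# Example use from a deploy script (master_pid and dirs come from your setup):
#   Process.kill(restart_signal(dirs), master_pid)
```

Choosing the signal in the deploy script keeps the decision outside puma, so it can be dropped once puma handles this case itself.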
Hey @cjlarose thanks so much for the input, and sorry for the delay. Was on holidays. I'm really keen to try to upgrade Puma now that #2427 is merged. Thanks for your awesome work and patience. I still have one question however about upgrading puma in future. Our setup does not involve any Hope I got things right, but want to fool-proof it in future as much as possible to avoid nasty surprises :) |
Released in 5.0.3. |
Because you're not using
This is definitely somewhere where puma is lacking. It'd be great if there was an option to On (proprietary) applications that I maintain, we have a kinda complicated start/stop script that basically does this by utilizing the I think |
Thanks again @cjlarose! I'll check out pumactl and see if we can script something. And great to see #2465! I hope there's some way I can help with this. Unfortunately, I'm having trouble after upgrading to puma 5.0.4. I did a hot restart to make sure I'm running puma 5, but after a few deploys (with phased restarts) I see these errors in my puma.log file, and I think the paths are pointing to folders that were deleted. The gem it complains about wasn't actually updated recently.
Any tips on how to further troubleshoot / investigate this? |
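One way to narrow this down is to ask a running Ruby process which of its already-activated gems point at directories that no longer exist. This is a minimal sketch, assuming it is evaluated inside the process under investigation (for example from a hook or a console attached to the same environment). `stale_loaded_gems` is a hypothetical helper; `Gem.loaded_specs`, `full_gem_path`, `full_name` and `default_gem?` are RubyGems methods.

```ruby
# Hypothetical troubleshooting helper: report activated gems whose install
# directory has disappeared from disk.
def stale_loaded_gems(specs = Gem.loaded_specs.values)
  specs
    # Default gems ship with Ruby and may have no directory of their own.
    .reject { |spec| spec.default_gem? }
    .reject { |spec| File.directory?(spec.full_gem_path) }
    .map { |spec| "#{spec.full_name} (#{spec.full_gem_path})" }
end

# Example: print one line per stale gem.
#   stale_loaded_gems.each { |line| warn "stale gem: #{line}" }
```

If the reported paths sit under an old release directory, that suggests the process activated those gems before the directory was pruned.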
please let me know if I should open a separate issue for this, but it seems to keep happening for us after a hot restart and a few phased restarts with puma 5.0.4 |
@gingerlime I think you can open up a new issue as the issue is now about json and not nio4r |
Thanks @dentarg will do (it's related to phased-restarts though, but definitely looks like a different thing) |
see #2471 |
Thanks @cjlarose for all this work. I do have one question about what you said concerning puma_worker_killer
From what I understand if I want to use
It seems for linux we do not need to add ffi here as it is loaded by Is my understanding correct ? |
@alx75 I think your intuition is correct here. Since the I was able to test this reliably here: https://github.com/cjlarose/puma-phased-restart-could-not-find-gem-errors/tree/ffi |
Describe the bug
This seems to happen when doing phased-restarts using versioned deploy folders with symlinks.
We use a folder structure similar to this:
When we deploy, we increment the version, run bundle etc, then switch the symlink and do a phased-restart of puma.
All seems to work fine, but after a few deploys, we also start pruning older versions, and then we see this error in the puma.log
❗️ Note that the app now lives in /app/app.d/909 but the errors are pointing to puma in /app/app.d/906 ❗️
Puma config:
If I do a full restart, then things are back to normal, until it happens again (we keep 3 deploys back, so after 4 deploys...)
To Reproduce
I was trying to reproduce this in a clean environment, but I can't manage to do it so far... :-/
Expected behavior
phased-restart should reload everything from the new folder. It does seem to point to the right folder, but there are some mysterious leftovers pointing to something old somewhere.
Server: