"Preloader" process for preload + rolling restart #1875
@cjlarose kind of reminds me of the puma-wild proposal you made.
Yeah, I see the connection. In #2018 (comment) I described a way to separate the gems of the puma master process from those of the workers that was more robust than the existing approach. The comments in #1861 describe a way to keep the benefits of preloading while still supporting phased restarts.

In that new architecture, fixing the kinds of bugs related to #2018 would only require a small change: whenever the puma master forks to form the "stem-cell" worker-generator process, that process must immediately set up its own gem environment, independent of the master's.

The architecture proposed in my comments in #2018 doesn't require the new "stem-cell"/worker-generator process idea, but is definitely compatible with it. This strategy seems like a good way forward. I might have some time to work on this in the next couple of days if no one's actively working on it already.
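The two-level fork described above can be sketched in plain Ruby. This is only an illustration of the idea, not Puma's actual code: the master forks a single generator process, and every worker is then forked from the generator rather than directly from the master. The gem-environment setup is represented by a comment.

```ruby
# Illustration of the "stem-cell" worker-generator architecture:
# the master forks one generator process, and every worker is forked
# from the generator rather than directly from the master.
reader, writer = IO.pipe

generator = Process.fork do
  reader.close
  gen_pid = Process.pid
  # In the real proposal, the generator would immediately set up its own
  # gem environment (independent of the master's) and preload the app here.
  2.times do |i|
    worker = Process.fork do
      # Workers inherit the generator's (preloaded) memory, not the master's.
      writer.syswrite("worker #{i} forked from generator #{gen_pid}\n")
      exit!(0)
    end
    Process.wait(worker)
  end
  exit!(0)
end

writer.close
Process.wait(generator)
lines = reader.read.lines.map(&:chomp)
puts lines
```

Because the master never loads the application itself, replacing the generator (and re-forking workers from the new one) is what would make a rolling restart with a fresh gem environment possible.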
Just confirming no one in Puma core has spiked this yet.
I was able to spend some time on this during a company hackathon. Early results are great: I was able to confirm that with the new architecture, it's possible to support both preloading and phased restarts.

The code's not great and it's not ready for review, but it's here: cjlarose@a1f7c11. There's still work left to do: from here, I'll spend some time cleaning up the code, breaking the change into smaller, easier-to-review commits (maybe even separate PRs), fixing tests, writing new tests, and all that jazz.
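For context, the combination this work aims to make possible looks roughly like the following `config/puma.rb`. Today these two features are effectively mutually exclusive, since Puma disables phased restarts when `preload_app!` is set:

```ruby
# config/puma.rb -- sketch of the combination this work aims to support.
# Today, phased restart is disabled when preload_app! is set.
workers 4
preload_app!   # share app memory between workers via copy-on-write
```

With the proposed architecture, a `pumactl phased-restart` against this cluster could roll workers one at a time while still getting the memory savings of preloading.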
Thanks for your efforts here @cjlarose!
I noticed there's some existing behavior that isn't covered by tests. I opened up #2165 to backfill tests related to it.
Worth mentioning that #2099 is an existing solution to this issue. The tiny difference between a 'refork' and a phased restart is visible here:

Lines 420 to 423 in 53839f9
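For readers arriving later: the approach in #2099 surfaced as an experimental `fork_worker` option in Puma 5. A minimal configuration looks something like this (exact behavior may differ between versions):

```ruby
# config/puma.rb -- fork_worker mode (experimental in Puma 5, from #2099).
# Workers are forked from worker 0 instead of directly from the master.
workers 4
fork_worker    # optionally takes a request count to trigger automatic reforks
```

A refork (re-cloning the remaining workers from worker 0's current memory) can then be triggered manually with `pumactl refork`.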
No, according to the Docker test in cjlarose/puma-phased-restart-could-not-find-gem-errors.
Thanks for clarifying, @wjordan!! Much appreciated.
Sorry to nag, but is there any way to help move this forward? We're not deeply familiar with Puma, unfortunately, but if there's something we can do to help, we'd like to at least try :) We're getting a bit worried that we're still "stuck" on 3.x and unable to upgrade to 4.x due to issues with phased restarts...
I finally got around to taking a look at implementing this. When the puma master process creates worker 0, worker 0 now sets up its own gem environment, independent of the master's. The Docker test in cjlarose/puma-phased-restart-could-not-find-gem-errors passes with my change.

I've still got to clean it up, make sure I didn't break anything critical, and add tests and everything, but at least we know the idea is sound.
Thank you @cjlarose. I'm so happy to hear it. With the release of 5.0 I was really starting to get nervous that we're still stuck on 3.x. Is there anything we can do to support you in moving this forward?

Side note / question: given the little perceived interest in #2018, I am wondering if we're "doing it wrong" in some way with Puma and our deployment process? We're still using symlinks and phased restarts to achieve zero-downtime deploys. I always assumed this was the Rails way of deploying, but things are obviously moving to docker, kubernetes, and other methods, I guess. Should we consider moving along to something better? If so, is there a new recommended way to deploy Rails these days?
I think when we get a PR merged, it'd be great if you could deploy it and see if it fixes problems for your application. That'd help make sure we got it!
The product I work on is in a similar spot. We use the same deployment strategy (I think this matches the behavior of capistrano, though we don't use capistrano anymore). I think that deployment strategy makes sense for applications deployed on bare metal or onto VMs (such as in EC2) that you keep around between releases (this is sometimes called mutable infrastructure).

I think you're right, though, that one reason phased restarts might get less attention these days is that folks are deploying puma applications using container orchestration. If you use kubernetes or ECS or any other container orchestrator, phased restarts are irrelevant because you can more easily handle restarts and rollouts by spinning up new containers and killing the old ones.

I don't have the numbers on which deployment environment is most popular for puma apps these days, but my guess is that there are still many applications like yours that use a more traditional deployment strategy. Puma still supports features that really only make sense for mutable infrastructure (hot restarts, phased restarts, and re-evaluating the release directory symlink on restart), and to my knowledge, Puma hasn't expressed any interest in deprecating those features.
I don't think I made it clear in #2018, but my team did upgrade to puma 4.x despite the nio4r issue. We changed our deployment process to stop deleting old release directories, and we added monitoring to make sure we don't exceed the disk space on our VMs. Our autoscaling rules basically guarantee that no VM lives longer than 24 hours, so as long as we don't deploy so many releases in that window that we exceed the available space on disk, we're good: phased restarts work reliably.

Clearly, it's not ideal, and I don't think puma users should be expected to do the same. And even with these practices in place, we still can't, for example, upgrade nio4r.
Thank you so much for your thoughtful response, @cjlarose.
Absolutely. I hope to also test the PR in our staging environment next week and report back on what we see.

As far as how people are deploying, I also wonder. I would have thought that if this method were very prevalent, there would be many people commenting and reporting a similar problem. I don't know how to interpret the relative "quiet".

We're also using a capistrano-like process, without actually using capistrano. However, our VM is much longer-lived, and unfortunately we don't have any auto-scaling processes to clean it up regularly. We also run on only one web server, so we can't even rely on load balancing to do a rolling restart... So we have to rely on folder cleanup and phased restarts. It's definitely not cutting-edge, but are we becoming old fossils?

We do use docker for development, and it's great. But it still feels like a big leap to switch to kubernetes or something else. That's why I'm curious to understand what other people are using, and perhaps also to learn about a migration path forward.

Thanks for sharing your setup, Chris. I do hope that Puma continues to support this use case, although I guess there's an increasingly strong argument for relying on generic orchestration tools for continuous deployment, which could help simplify the puma code base and reduce its surface area.
I just wanted to point out that some people, like my company, are totally in this situation (standard stuff: capistrano, bare metal, in need of reliable and fast rolling restarts for 40+ processes on a single server). We are currently stuck on 3.x for the same reasons and are looking forward to trying 5.x. I just didn't feel the need to comment here since it wouldn't bring any new information, and I don't want to put more pressure on the dev team, who are doing what they can. So I just subscribed and ":+1:".
Hi @jarthod, and thank you. Yes, you're right. It's always hard to strike a balance between too many "+1" and "me too" posts that don't add value to the discussion, and not really knowing how many people are affected by an issue :) I didn't mean to imply that I encourage lots of noise, just that I didn't know whether this was an important issue affecting many people or not.
#1861 (comment)