Puma 5 Experimental Feature Reports #2258
Comments
Here's memory usage of a production Rails app on a large EC2 instance (charts omitted).
After fixing the config bug in nakayoshi_fork, Codetriage is now showing about a 10% reduction in memory usage 👍
Edit: I went from Puma 4 directly to Puma 5.
No difference seen.
Yeah, you probably won't see a big difference if utilization isn't higher than ~50-60%.
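For intuition, the ~50-60% utilization claim above relates to how `wait_for_less_busy_worker` works: a saturated worker briefly delays before re-listening on the shared socket, so less-busy workers win the race for the next request. The sketch below is an illustrative approximation, not Puma's actual implementation; the helper name and threshold logic are made up for the example:

```ruby
# Illustrative sketch of the wait_for_less_busy_worker idea (not Puma's
# actual code): a saturated worker pauses briefly before re-listening on
# the shared socket, giving less-busy workers a better chance to accept.
DELAY = 0.005 # seconds; the documented default delay for this option

# Hypothetical helper: how long should this worker wait before accepting?
def pre_accept_delay(busy_threads, max_threads, delay: DELAY)
  # Only delay when every thread in this worker is already busy.
  busy_threads >= max_threads ? delay : 0
end

pre_accept_delay(5, 5) # saturated worker: waits before accepting
pre_accept_delay(1, 5) # mostly idle worker: accepts immediately
```

If most workers are usually idle (low utilization), the delay almost never fires, which is why the option shows little effect below the ~50-60% threshold.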
We were looking at using Puma 5 and this feature because we have been seeing some odd perf issues related to Puma queues. I can provide more data and capture more metrics, but it's probably best to file that in a separate issue; it is safe to say the tail latency issues we have seen are not affected by these performance features.
We attempted to use nakayoshi_fork today, and found some very strange behaviour, seemingly related to GC compaction. Most notably, we received an exception in https://github.com/rgeo/rgeo/blob/v2.1.1/lib/rgeo/wkrep/wkb_generator.rb#L159 where the exception raised contained unexpected data.
It turns out that the parameter passed to wkb_generator was part of some actionpack comment code from a related gem (!!!). This points at some sort of memory corruption, most likely related to the fact that rgeo uses an underlying C implementation: https://github.com/rgeo/rgeo/tree/master/ext/geos_c_impl
@oskarpearson Sounds like a bug in GC.compact... @tenderlove how should people report that? The Ruby tracker, I would guess?
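For context on why `GC.compact` is implicated in the rgeo report above: the nakayoshi_fork technique is, roughly, to run the GC several times and then compact the heap before forking, so that child workers share more copy-on-write-friendly memory. The sketch below is an illustrative approximation, not Puma's actual code; the method name is made up:

```ruby
# Rough sketch of the nakayoshi_fork idea (illustrative, not Puma's code):
# run the GC a few times to promote surviving objects to the old generation,
# then compact the heap (Ruby 2.7+) before forking, so child processes
# inherit more copy-on-write-friendly pages. GC.compact is the step that
# can surface latent bugs in C extensions holding raw object pointers.
def nakayoshi_gc
  4.times { GC.start }                    # settle the heap
  GC.compact if GC.respond_to?(:compact)  # skip on Rubies without compaction
  :done
end

nakayoshi_gc # in Puma this would run in the master just before forking workers
```

Because compaction physically moves objects, a C extension that caches a pointer across a compaction (as rgeo's geos_c_impl may) can end up reading unrelated memory, which matches the garbage-parameter symptom described above.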
In the @crimethinc website Rails app, the Puma 5 experimental features seem to have gained us ~2GB of memory back over the past 72 hours (before/after charts omitted).
@veganstraightedge Are you running all three?
Yep!
Hey everybody, we added this config and saw a ~2% reduction. More info here: https://forem.dev/foremteam/results-of-implementing-puma-s-nakayoshifork-on-forem-3pdl Forem is open source at https://github.com/forem/forem, so folks are welcome to propose other experiments that might be worth trying.
We applied the config and compared memory usage before and after (charts omitted).
I captured a ~10 day view of memory usage in our app (tapology) before enabling two of these features. Then I turned on nakayoshi_fork and wait_for_less_busy_worker, and left it running for about the same amount of time. I can't control the scales of these charts, but on our two servers with Puma workers, after ~10 days of uptime (charts omitted):
The two features seem to have had a positive impact, but this is not a perfectly controlled experiment! Out of curiosity, I'm considering turning the features off again and letting it run for a bit, to see if memory clearly reverts to the prior level.
Thanks! To update everyone on my current thinking: (details omitted)
Sharing my experience with enabling the config option: shortly after enabling it, the results were consistent with @nateberkopec's claim above.
Closing. We have removed Nakayoshi Fork.
Puma 6 dropped support for nakayoshi_fork (puma/puma#2258). GitLab has been running Puma 6 since GitLab 16.1 (https://gitlab.com/gitlab-org/gitlab/-/merge_requests/119135). Changelog: changed
Puma 5 contains 3 new experimental performance features:

1. `fork_worker` option and `refork` command for reduced memory usage by forking from a worker process instead of the master process. (Experiment: Fork from existing worker process to optimize copy-on-write performance #2099. Additional docs on this feature are here.) Intended result: if enabled, should reduce memory usage.
2. `wait_for_less_busy_worker` config. This may reduce latency on MRI by inserting a small delay before re-listening on the socket if the worker is busy (Inject small delay for busy workers to improve requests distribution #2079). Intended result: if enabled, should reduce latency in high-load (>50% utilization) Puma clusters.
3. `nakayoshi_fork` config option. Reduces memory usage in preloaded cluster-mode apps by GCing before fork and compacting, where available. (Manually compact GC before fork #2093, Added `nakayoshi_fork` option #2256.) Intended result: if enabled, should reduce memory usage.

Note that all of these experiments are only for cluster-mode Puma configs running on MRI.
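For anyone trying these out, a minimal `config/puma.rb` might look like the sketch below. This is based on the feature names above, not a verified reference config; check the Puma docs for your version before copying it (in particular, `fork_worker` is documented as an alternative to `preload_app!`, not a companion to it):

```ruby
# config/puma.rb — sketch; all three features require cluster mode
# (workers > 0) running on MRI.
workers 2
threads 5, 5

# Reduce latency under high load: busy workers pause briefly
# before re-listening on the socket.
wait_for_less_busy_worker

# Reduce memory: GC (and compact, where available) before forking
# the preloaded app.
preload_app!
nakayoshi_fork

# Alternative memory experiment: fork new workers from worker 0
# (re-trigger later with `pumactl refork`). Use this *instead of*
# preload_app!, not together with it.
# fork_worker
```

Both `wait_for_less_busy_worker` and `nakayoshi_fork` also accept arguments (a delay in seconds and an on/off flag respectively); the bare calls above rely on their defaults.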
Part of the reason we're calling them experimental is that we're not sure whether they'll actually have any benefit. People's workloads in the real world are often not what we anticipate, and synthetic benchmarks are usually not much help in figuring out whether a change will be beneficial.
We do not believe any of the new features will have a negative effect or impact the stability of your application. This is either a "it works" or "it does nothing" experiment.
If any of the features turn out to be particularly beneficial, we may make them defaults in future versions of Puma.
If you are using any of the 3 new features, please post before and after results or screenshots to this issue. Note that "it didn't do anything" is still a useful report. Posting ~24 hours of "before" and ~24 hours of "after" data is preferred.