Puma 5 Experimental Feature Reports #2258

Closed · nateberkopec opened this issue May 12, 2020 · 19 comments

nateberkopec commented May 12, 2020

Puma 5 contains 3 new experimental performance features:

- nakayoshi_fork: runs GC (and GC.compact, where available) right before forking workers, to improve copy-on-write memory sharing
- fork_worker (refork): forks workers from worker 0 instead of the master process, and adds a refork command
- wait_for_less_busy_worker: makes busier workers briefly delay accepting requests so that less-busy workers pick them up first

Note that all of these experiments are only for cluster mode Puma configs running on MRI.

Part of the reason we're calling them experimental is that we're not sure they'll actually have any benefit. Real-world workloads are often not what we anticipate, and synthetic benchmarks rarely tell us whether a change will be beneficial.

We do not believe any of the new features will have a negative effect or impact the stability of your application. This is either a "it works" or "it does nothing" experiment.

If any of the features turn out to be particularly beneficial, we may make them defaults in future versions of Puma.

If you are using any of the 3 new features, please post before and after results or screenshots to this issue. Note that "it didn't do anything" is still a useful report. Posting ~24 hours of "before" and ~24 hours of "after" data is preferred.
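For reference, a minimal config/puma.rb sketch enabling all three features (the worker count is an illustrative assumption; all three require cluster mode on MRI):

```ruby
# config/puma.rb -- a sketch, not a drop-in config; worker count is illustrative
workers 4

# Fork workers from worker 0 instead of the master process (enables refork)
fork_worker

# Run GC (and GC.compact on supported Rubies) before forking workers,
# to improve copy-on-write memory sharing
nakayoshi_fork true

# Make busier workers briefly delay accepting requests so that
# less-busy workers pick them up first
wait_for_less_busy_worker
```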

@nateberkopec

Codetriage memory before/after adding nakayoshi_fork seems more or less unaffected.

[Screenshot: memory usage, 2020-05-12 12:50 PM]

Codetriage initially had a wee latency spike (second-to-last deploy on the right here), but it has since calmed down; I'm chalking it up to random noise or possibly a bad neighbor (it runs on Heroku 1x dynos).

[Screenshot: latency, 2020-05-12 12:52 PM]


wjordan commented May 15, 2020

Here's memory usage of a production Rails app on a large EC2 instance (m5.12xlarge = 48 vCPUs / 192 GB RAM), testing the fork_worker option (with the fix listed in #2267), compared against another baseline server running the same workload:

[Screenshot: memory utilization, fork_worker vs. baseline]

Triggering refork at 12:36 (and at 12:30 for a smaller app hosted on the same server) resulted in a ~20-30% reduction in memory utilization (starting at ~30% below baseline and narrowing to ~17% below after about an hour). Triggering again at 14:15 reduced memory utilization again, and it started climbing again at roughly the same pace over the next 45 minutes.

There might be different ways to use refork to optimize memory utilization: invoke it once after n requests for a baseline improvement (which may level off a bit over time as ongoing GC eats away at the copy-on-write savings), or invoke it frequently: after every n requests, whenever utilization exceeds a target threshold, or whenever server load dips below a target threshold. Since refork is a relatively lightweight operation (compared to a full app reload), depending on the workload the performance impact of running it rather frequently might actually be minimal (kind of like a server-wide GC operation).
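For the "after every n requests" strategy: fork_worker accepts an optional request threshold. A minimal sketch (the threshold shown is illustrative; 1000 is also Puma's documented default):

```ruby
# config/puma.rb
# Automatically refork once worker 0 has processed 1000 requests.
fork_worker 1000
```

A refork can also be triggered on demand with `pumactl refork` or by sending SIGURG to the Puma master process.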

@nateberkopec

After fixing the config bug in nakayoshi_fork, Codetriage is now showing about a 10% reduction in memory usage 👍


fluke commented Jun 6, 2020

After setting nakayoshi_fork(true) on Puma 5.0.0.beta1, I'm seeing a ~17% reduction in memory. I'm able to run another worker within the same Heroku 2x dyno.

Edit: I went from Puma 4 directly to Puma 5 with nakayoshi_fork, so some of that might just be from the version upgrade.


danmayer commented Jul 9, 2020

No difference seen with wait_for_less_busy_worker, but CPU utilization was already less than 40%, so load might not have been high enough.

@nateberkopec

Yeah, you probably won't see a big difference if utilization isn't higher than ~50-60%.
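(For anyone experimenting with this option: the DSL also accepts an optional delay in seconds. A minimal sketch, where the value shown is Puma's documented default:)

```ruby
# config/puma.rb
# A busier worker waits this many seconds before accepting a new connection,
# giving less-busy workers the first chance to pick it up.
wait_for_less_busy_worker 0.005
```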

@danmayer

We were looking at using Puma 5 and this feature because we have been seeing some odd perf issues related to Puma queues. I can provide more data and capture more metrics, but that's probably best filed in a separate issue. It's safe to say the tail-latency issues we have seen are not affected by these performance features.


oskarpearson commented Sep 25, 2020

We attempted to use nakayoshi_fork today and found some very strange behaviour, seemingly related to https://github.com/rgeo/activerecord-postgis-adapter, rgeo-activerecord, and the rgeo gem.

Most notably, we received an exception from https://github.com/rgeo/rgeo/blob/v2.1.1/lib/rgeo/wkrep/wkb_generator.rb#L159, where the exception raised by raise Error::ParseError, "Unrecognized Geometry Type: #{type}" contained this bizarre string:

Unrecognized Geometry Type: # this method if you wish to change how action methods are called,

It turns out that # this method if you wish to change how action methods are called, comes from https://github.com/rails/rails/blob/070d4afacd3e9721b7e3a4634e4d026b5fa2c32c/actionpack/lib/abstract_controller/base.rb#L199

So... the parameter passed to wkb_generator was part of a code comment in actionpack, an entirely different gem (!!!)

This points at some sort of memory corruption, most likely related to the fact that rgeo uses an underlying C implementation: https://github.com/rgeo/rgeo/tree/master/ext/geos_c_impl

@nateberkopec

@oskarpearson Sounds like a bug in GC.compact... @tenderlove, how should people report that? The Ruby tracker, I would guess?


veganstraightedge commented Oct 27, 2020

In the @crimethinc website Rails app, the Puma 5 experimental features seem to have gained us ~2 GB of memory back.

Before

[Screenshot: memory usage before]

After

[Screenshot: memory usage after]

Past 72 hours

[Screenshot: memory usage, past 72 hours]

@nateberkopec

@veganstraightedge Are you running all three?

@h0jeZvgoxFepBQ2C

I tried the fork_worker option on one of our production machines and it's really amazing. I see the normal CPU spike for loading the first worker, and then no more CPU spikes; just the log entries telling me the workers are starting. I tried to load 8 Puma processes on a 2-CPU machine, and it worked fine, without any problems and with perfect performance (before fork_worker this would have blocked the whole machine for 3-4 minutes). Amazing, thank you so much for this ❤️

@veganstraightedge

> @veganstraightedge Are you running all three?

Yep!

https://github.com/crimethinc/website/pull/1829/files

@benhalpern

Hey everybody, we added the nakayoshi_fork config and saw a ~2% reduction. More info here:

https://forem.dev/foremteam/results-of-implementing-puma-s-nakayoshifork-on-forem-3pdl

Forem is open source at https://github.com/forem/forem so folks are welcome to propose other experiments that might be worth trying.

@fdelache

We applied the wait_for_less_busy_worker configuration on one of Shopify's webhook proxy applications.
It successfully reduced the Puma utilization spikes we had (utilization was often going above 80% for the same volume of incoming requests).

For reference, we compute the Puma utilization as ((puma_running - puma_pool_capacity) + puma_backlog) / puma_max_threads
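A sketch of how a metric like this could be computed in-process from Puma.stats_hash (the cluster-mode keys below exist in Puma 5; reporting the busiest worker is an illustrative assumption, not necessarily Shopify's implementation):

```ruby
# Illustrative sketch, not Shopify's actual metric code.
# In cluster mode, Puma.stats_hash reports per-worker stats under :worker_status.
stats = Puma.stats_hash

utilization = stats[:worker_status].map { |worker|
  s = worker[:last_status]
  busy = s[:running] - s[:pool_capacity] # threads currently handling requests
  (busy + s[:backlog]).to_f / s[:max_threads]
}.max # report the busiest worker's utilization

puts format("puma utilization: %.2f", utilization)
```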

Before

[Screenshot: Puma utilization before]

After (note the different scale for the Y-axis)

[Screenshot: Puma utilization after]


gsaks123 commented Jul 2, 2021

I captured a ~10-day view of memory usage in our app (Tapology) before enabling two of these features. Then I turned on nakayoshi_fork and wait_for_less_busy_worker and left it running for about the same amount of time.

I can't control the scales of these charts, but on our two servers with Puma workers, after ~10 days of uptime:

  • Server 1 was at 37.6% memory usage before and 33.7% after
  • Server 2 was at 39.3% memory usage before and 34.7% after

The two features seem to have had a positive impact, but this is not a perfectly controlled experiment! Out of curiosity I'm considering turning the features off again and letting it run for a bit, to see whether usage clearly reverts to the prior level.

Server 1 Before:
[Screenshot: Front End 1 Before]

Server 1 After:
[Screenshot: Front End 1 After]

Server 2 Before:
[Screenshot: Front End 2 Before]

Server 2 After:
[Screenshot: Front End 2 After]

@nateberkopec

Thanks! To update everyone on my current thinking:

  1. wait_for_less_busy_worker is all upside with no downside. It should become the default in Puma 6.
  2. refork needs more fleshing out, but looks promising.
  3. nakayoshi_fork has very minimal gain and can trigger nasty bugs in C extensions due to its use of GC.compact. Considering removal in Puma 6.

@sbocinec

Sharing my experience with enabling nakayoshi_fork after upgrading our RoR app to Puma 5 (5.2.2), running on Ruby 2.7.3:

Shortly after enabling the option in our production Puma config, we started sporadically receiving a NotImplementedError exception with the message method 'new_with_args' called on unexpected T_IMEMO object (0x00005643e8e79208 flags=0x1607a) when running MySQL ActiveRecord create queries (using the latest available mysql2 gem, v0.5.3). Since disabling the option 7 days ago, no more exceptions like this have been reported.

This is consistent with @nateberkopec's observation above that nakayoshi_fork might trigger bugs in C extensions when GC.compact is executed.

@nateberkopec

Closing. We have removed Nakayoshi Fork, and wait_for_less_busy_worker will be made the default in Puma 6.

maxlazio pushed a commit to gitlabhq/omnibus-gitlab that referenced this issue Sep 11, 2023
Puma 6 dropped support for nakayoshi_fork
(puma/puma#2258). GitLab has been running
Puma 6 since GitLab 16.1
(https://gitlab.com/gitlab-org/gitlab/-/merge_requests/119135).

Changelog: changed