Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Ruby 3.3's Process.warmup before fork #3304

Open
bensheldon opened this issue Jan 5, 2024 · 14 comments
Open

Support Ruby 3.3's Process.warmup before fork #3304

bensheldon opened this issue Jan 5, 2024 · 14 comments
Labels

Comments

@bensheldon
Copy link

Is your feature request related to a problem? Please describe.

Ruby 3.3 introduces Process.warmup which does the following:

  • Performs a major GC
  • Compacts the heap.
  • Promotes all surviving objects to the old generation.
  • Precomputes the coderange of all strings.
  • Frees all empty heap pages and increments the allocatable pages counter by the number of pages freed.
  • Invoke malloc_trim if available to free empty malloc pages.

The functionality is somewhat similar to the removed nakayoshi fork functionality, but (hopefully) performs better.

I'm imagining this would be exposed/implemented in a similar manner to the nakayoshi_fork DSL directive.

@dentarg
Copy link
Member

dentarg commented Jan 5, 2024

The functionality is somewhat similar to the #2925, but (hopefully) performs better.

Is it possible to find out? Can we construct benchmarks?

@nateberkopec
Copy link
Member

One of the reasons we removed nakayoshi fork was it's tendency to surface extremely difficult to debug problems in C-extensions due to compaction. I'm not sure that will have really improved since the time that the feature was considered.

@casperisfine
Copy link
Contributor

was it's tendency to surface extremely difficult to debug problems in C-extensions due to compaction.

That is indeed a problem. I (and others) fixed many compaction issues in popular extensions, but there is a long tail of potential problems in less popular ones.

I'm always happy to help debug and fix these, but as you mention, often the problem is to identify the extension responsible for it. In most case it's not too hard, but sometimes it's super tricky.

@bensheldon
Copy link
Author

I guess it's not much practical difference to do this:

before_fork { ::Process.warmup }

...versus having in the Puma DSL something like this:

ruby_process_warmup

Probably the vast majority of the value is having it officially documented and visible in Puma as something one could or should do. Though I totally understand if it creates an untenable support burden by surfacing other issues (though maybe something that could be addressed through documentation, though I also recognize that doesn't redirect everyone who might open an issue).

@rus-max
Copy link

rus-max commented Jan 28, 2024

sidekiq/sidekiq@f258fb2

@tomasv
Copy link

tomasv commented Mar 13, 2024

I have tested this in our pretty complex Rails application with Puma running in clustered mode and it led to segfaults. There's risk in making it a default. At the very least my suggestion is to keep it configurable unlike Sidekiq.

@rus-max
Copy link

rus-max commented Mar 13, 2024

No problem with

on_worker_fork do
  Process.warmup
end

@casperisfine
Copy link
Contributor

I have tested this in our pretty complex Rails application with Puma running in clustered mode and it led to segfaults

I know it's not always easy, but it would be very valuable to figure out which gems are impacted and report the issue upstream. If you have trouble identifying which gem, I'd be happy to look at your crash reports to try to help.

No problem with on_worker_fork

Doing this would entirely defeat the benefits of Process.warmup, it would even be counter-productive.

@tomasv
Copy link

tomasv commented Mar 25, 2024

I had some time to dig into this, I was able to reproduce with our codebase locally, but could not narrow down it entirely.

It crashes right on the Process.warmup. The C stack trace looks like this:

-- C level backtrace information -------------------------------------------
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_vm_bugreport+0xb4c) [0x101404c28]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_bug_for_fatal_signal+0x100) [0x101247ebc]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(sig_do_nothing+0x0) [0x10136c840]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x38) [0x18d67f584]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(coderange_scan+0x34) [0x101393c5c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_enc_str_coderange+0x60) [0x10137b904]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(gc_set_candidate_object_i+0x68) [0x1012699ac]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(objspace_each_objects_try+0x2f0) [0x101278ee4]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_ensure+0xcc) [0x1012544cc]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_gc_prepare_heap+0xa8) [0x101269458]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(proc_warmup+0x34) [0x101318b60]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_call_cfunc_with_frame_+0xf0) [0x1013f791c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_exec_core+0x2048) [0x1013dd674]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_vm_exec+0x3d8) [0x1013da6a4]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(invoke_block_from_c_bh+0x368) [0x1013ff09c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_yield+0xa8) [0x1013e881c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_ary_each+0x40) [0x1011af43c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_call_cfunc_with_frame_+0xf0) [0x1013f791c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_exec_core+0x1df8) [0x1013dd424]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_vm_exec+0x3d8) [0x1013da6a4]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(load_iseq_eval+0x1fc) [0x1012b5a18]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(require_internal+0x374) [0x1012b3914]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_require_string+0x68) [0x1012b2c74]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_call_cfunc_with_frame_+0xf0) [0x1013f791c]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_call_alias+0x70) [0x1013f3660]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(vm_exec_core+0x2048) [0x1013dd674]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_vm_exec+0x1ec) [0x1013da4b8]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(rb_ec_exec_node+0xa0) [0x101253644]
/Users/tomas/.rbenv/versions/3.3.0/lib/libruby.3.3.dylib(ruby_run_node+0x60) [0x10125353c]
/Users/tomas/.rbenv/versions/3.3.0/bin/ruby(main+0x68) [0x100747f24]

In our before_fork we have some warm up logic that triggers certain Rails controller actions:

before_fork do
  StartupWarmup.warmup_puma
  ApplicationRecord.connection.disconnect!
  Process.warmup
end

Skipping StartupWarmup.warmup_puma or moving Process.warmup before it actually solves the issue. It invokes several Rails controller actions to warm up caches. Unfortunately I cannot share the actual code, I will share findings if I dig out something more.

I think for simple Rails apps this is not an issue, and if your before_fork is not doing anything complex then the segfault is probably low risk.

@byroot
Copy link
Contributor

byroot commented Mar 25, 2024

coderange_scan: this means it's crashing while trying to read the content of a String. Which means some extension created a corrupted RString (pointing to an invalid memory region).

That's 100% a bug in a C extension. The hard part is to track which one. A start is to grep with something like:

$ rg -F rb_str_ %(bundle show --paths)

There is a couple C API that allow creating such broken string, the most common is rb_str_new_static.

If you share the result of that search in a gist, I'd be happy to see if I can figure out which one it is.

@tomasv
Copy link

tomasv commented Mar 27, 2024

Thanks to your tip I managed to narrow it down and find the gem with the issue. It's memcached. This gem is abandoned and we use our own fork with extra fixes. We're not on dalli because we have an mcrouter cluster of memcached instances, and unfortunately it only works with the text protocol of Memcached.

So the good news is that most likely no one else is using this gem and shouldn't encounter this issue. 😄

@byroot
Copy link
Contributor

byroot commented Mar 27, 2024

Ah, that rings a bell. 99% sure I fixed that bug not so long ago, let me look.

@byroot
Copy link
Contributor

byroot commented Mar 27, 2024

Alright, I'm bad at writing commit messages, so I can't be fully certain, but I think this was the fix: Shopify/memcached@28b5840

But overall you can use this branch, we use it with Process.warmup and even reforking without issue. So if the issue persists it's either not in memcached or it's in a area of memcached you use and we don't.: https://github.com/Shopify/memcached/commits/1-0-stable-shopify/

@tomasv
Copy link

tomasv commented Mar 30, 2024

@byroot I can confirm that with your fork there is no issue. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants