Use "interned" `String`s when possible to avoid uneccessary heap allocation. #276

okeeblow · 2021-07-20T04:56:25Z

Fixes #275

Background info: https://en.wikipedia.org/wiki/String_interning
Available to C extensions since MRI Ruby 3.0:
- https://bugs.ruby-lang.org/issues/13381
- https://bugs.ruby-lang.org/issues/16029

This change makes Ox use "interned" (frozen and deduplicated) Strings anywhere possible,
the same effect as calling String#-@ except without the heap churn of RVALUE allocation:
https://ruby-doc.org/core/String.html#method-i-2B-40

Uses the HAVE_<funcname> preprocessor macros to detect this functionality
and avoid breaking older Rubies (confirmed with at least MRI 2.7.2).
I explicitly use interned Strings anywhere an allocated String seemed
entirely internal to Ox, e.g. in sax_value_as_time when allocating
the String argument to ox_time_class.
Adds a new user-facing option intern_strings to control the behavior
of String return values from user-facing functions, e.g. from sax_as_s.
Results in a ~25% reduction in startup GC pressure in my library! :D

Inspired by similar changes to json and msgpack:

My goal is Object allocation reduction in the Ox::Sax handler of my MIME Type
file-identification library caused by the large number of duplicate Strings in the
shared-mime-info-format XML it uses as a data source.

I was already calling #-@ (or using the -some_str_variable syntax) everywhere possible
in my handler, but that only freezes and deduplicates an already-allocated String:

  irb(main):068:0> oid = proc { puts "#{_1} is object_id #{_1.object_id}" }
  => #<Proc:0x0000564d53eb6850 (irb):66>
  irb(main):069:0> lol = 'lol'.tap(&oid).-@.tap(&oid)
  lol is object_id 19800
  lol is object_id 19740
  => "lol"
  irb(main):070:0> -lol.tap(&oid).-@.tap(&oid)
  lol is object_id 19740
  lol is object_id 19740
  => "lol"

That avoids duplicate retained Strings but still causes my library to take
a big GC hit right at startup when it loads its data.

All the stats below were collected using SamSaffron's memory_profiler:
https://github.com/SamSaffron/memory_profiler

First, a sanity-check to make sure I didn't break MRI < 3.0,
using MRI 2.7 + latest official Ox from RubyGems (2.14.5):

  [okeeblow@emi#CHECKING-YOU-OUT] ruby -v
  ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]

  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 20561351 bytes (401372 objects)
  Total retained:  1680971 bytes (22102 objects)

versus same MRI 2.7 but with my patched Ox:

  [okeeblow@emi#CHECKING-YOU-OUT] gem install ../../ox/ox-2.14.5.gem
  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 20561111 bytes (401366 objects)
  Total retained:  1680971 bytes (22102 objects)

No difference between the two, showing that this patch is a no-op for pre-3.0.
Sorry I did not test older versions of MRI or other Rubies like J/Truffle/etc
since I don't have them on my system.

Now the real gainz can be seen when comparing MRI 3.0 with unpatched Ox:

  [okeeblow@emi#CHECKING-YOU-OUT] bundle install
  Fetching gem metadata from https://rubygems.org/......
  …
  Installing ox 2.14.5 with native extensions
  Using checking-you-out 0.7.0 from source at `.`
  Bundle complete! 11 Gemfile dependencies, 19 gems now installed.
  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 20081080 bytes (390133 objects)
  Total retained:  1713209 bytes (22095 objects)

against same MRI 3.0 with my patched Ox:

  [okeeblow@emi#CHECKING-YOU-OUT] gem install ../../ox/ox-2.14.5.gem
  Building native extensions. This could take a while...
  Successfully installed ox-2.14.5
  Parsing documentation for ox-2.14.5
  unknown encoding name ""UTF-8"" for README.md, skipping
  Installing ri documentation for ox-2.14.5
  Done installing documentation for ox after 0 seconds
  1 gem installed

  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 17298804 bytes (322860 objects)
  Total retained:  1713202 bytes (22073 objects)

against same MRI 3.0, my patched Ox, and Ox.parse_sax(intern_strings: true):

  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 16414938 bytes (301382 objects)
  Total retained:  1713226 bytes (22073 objects)

This shows that just having Ruby 3.0 available results in a huge win,
and opting into immutable Strings from Value#as_s helps me even more!

Unit tests pass:

  [okeeblow@emi#test] ruby tests.rb
  Loaded suite tests
  Started
  .................
  16 tests, 40 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
  100% passed
  .....
  18 tests, 42 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
  100% passed
  ...............................................................................................................................
  151 tests, 249 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
  100% passed

Apologies for possible indentation issues in this patch. There seem to be lots of existing
lines with a mix of tabs and spaces — nbd, it happens — so I tried to match the surrounding
lines everywhere I made an addition to make sure things line up and look okay in git diff.

I only use Ox for Sax parsing, not for marshalling, so please scrutinize my changes
in gen_load.c and friends extra hard in case I omitted any intern_strings option
checks that could result in an unwary user getting an immutable String where
there wasn't one before. The one change I had to make to a String#force_encoding-using
test case is an example of what I want to avoid surprising anyone with.

…cation. Fixes ohler55#275 - Background info: https://en.wikipedia.org/wiki/String_interning - Available to C extensions since MRI Ruby 3.0: - https://bugs.ruby-lang.org/issues/13381 - https://bugs.ruby-lang.org/issues/16029 This change makes `Ox` use "interned" (frozen and deduplicated) `String`s anywhere possible, the same effect as calling `String#-@` except without the heap churn of `RVALUE` allocation: https://ruby-doc.org/core/String.html#method-i-2B-40 - Uses the `HAVE_<funcname>` preprocessor macros to detect this functionality and avoid breaking older Rubies (confirmed with at least MRI 2.7.2). - I explicitly use interned `String`s anywhere an allocated `String` seemed entirely internal to `Ox`, e.g. in `sax_value_as_time` when allocating the `String` argument to `ox_time_class`. - Adds a new user-facing option `intern_strings` to control the behavior of `String` return values from user-facing functions, e.g. from `sax_as_s`. - Results in a ~25% reduction in startup GC pressure in my library! :D Inspired by similar changes to `json` and `msgpack`: - flori/json#451 - msgpack/msgpack-ruby#196 My goal is `Object` allocation reduction in the `Ox::Sax` handler of my MIME Type file-identification library caused by the large number of duplicate `String`s in the `shared-mime-info`-format XML it uses as a data source. I was already calling `#-@` (or using the `-some_str_variable` syntax) everywhere possible in my handler, but that only freezes and deduplicates an already-allocated `String`: irb(main):068:0> oid = proc { puts "#{_1} is object_id #{_1.object_id}" } => #<Proc:0x0000564d53eb6850 (irb):66> irb(main):069:0> lol = 'lol'.tap(&oid).-@.tap(&oid) lol is object_id 19800 lol is object_id 19740 => "lol" irb(main):070:0> -lol.tap(&oid).-@.tap(&oid) lol is object_id 19740 lol is object_id 19740 => "lol" That avoids duplicate retained `String`s but still causes my library to take a big GC hit right at startup when it loads its data. All the stats below were collected using SamSaffron's `memory_profiler`: https://github.com/SamSaffron/memory_profiler First, a sanity-check to make sure I didn't break MRI < 3.0, using MRI 2.7 + latest official `Ox` from RubyGems (2.14.5): [okeeblow@emi#CHECKING-YOU-OUT] ruby -v ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux] [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 20561351 bytes (401372 objects) Total retained: 1680971 bytes (22102 objects) versus same MRI 2.7 but with my patched `Ox`: [okeeblow@emi#CHECKING-YOU-OUT] gem install ../../ox/ox-2.14.5.gem [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 20561111 bytes (401366 objects) Total retained: 1680971 bytes (22102 objects) No difference between the two, showing that this patch is a no-op for pre-3.0. Sorry I did not test older versions of MRI or other Rubies like J/Truffle/etc since I don't have them on my system. Now the real gainz can be seen when comparing MRI 3.0 with unpatched `Ox`: [okeeblow@emi#CHECKING-YOU-OUT] bundle install Fetching gem metadata from https://rubygems.org/...... … Installing ox 2.14.5 with native extensions Using checking-you-out 0.7.0 from source at `.` Bundle complete! 11 Gemfile dependencies, 19 gems now installed. [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 20081080 bytes (390133 objects) Total retained: 1713209 bytes (22095 objects) against same MRI 3.0 with my patched `Ox`: [okeeblow@emi#CHECKING-YOU-OUT] gem install ../../ox/ox-2.14.5.gem Building native extensions. This could take a while... Successfully installed ox-2.14.5 Parsing documentation for ox-2.14.5 unknown encoding name ""UTF-8"" for README.md, skipping Installing ri documentation for ox-2.14.5 Done installing documentation for ox after 0 seconds 1 gem installed [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 17298804 bytes (322860 objects) Total retained: 1713202 bytes (22073 objects) against same MRI 3.0, my patched `Ox`, and `Ox.parse_sax(intern_strings: true)`: [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total Total allocated: 16414938 bytes (301382 objects) Total retained: 1713226 bytes (22073 objects) This shows that just having Ruby 3.0 available results in a huge win, and opting into immutable `String`s from `Value#as_s` helps me even more! Unit tests pass: [okeeblow@emi#test] ruby tests.rb Loaded suite tests Started ................. 16 tests, 40 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications 100% passed ..... 18 tests, 42 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications 100% passed ............................................................................................................................... 151 tests, 249 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications 100% passed Apologies for possible indentation issues in this patch. There seem to be lots of existing lines with a mix of tabs and spaces — nbd, it happens — so I tried to match the surrounding lines everywhere I made an addition to make sure things line up and look okay in `git diff`. I only use `Ox` for Sax parsing, not for marshalling, so please scrutinize my changes in `gen_load.c` and friends extra hard in case I omitted any `intern_strings` option checks that could result in an unwary user getting an immutable `String` where there wasn't one before. The one change I had to make to a `String#force_encoding`-using test case is an example of what I want to avoid surprising anyone with.

okeeblow · 2021-07-20T05:04:42Z

For context this is my Ox::Sax handler class: https://github.com/okeeblow/DistorteD/blob/master/CHECKING-YOU-OUT/lib/checking-you-out/ghost_revival/mr_mime.rb

And this is how I am collecting these stats: https://github.com/okeeblow/DistorteD/blob/master/CHECKING-YOU-OUT/bin/are-we-unallocated-yet

ohler55 · 2021-07-20T23:14:06Z

ext/ox/gen_load.c

 	    rb_hash_aset(ah, rstr, rb_str_new2(attrs->value));
+#if HAVE_RB_INTERNED_STR_CSTR
+	    }
+#endif


The whole section here with the various #if checks is next to impossible to read. I think the volatile VALUE rstr statements are not used in the set and would expect a compile error under the right set of conditions. Hard to tell though. How about tolerating some code duplication in favour of readability?

ohler55 · 2021-07-20T23:16:24Z

ext/ox/gen_load.c


 #if HAVE_RB_ENC_ASSOCIATE
    if (0 != pi->options->rb_enc) {
        rb_enc_associate(s, pi->options->rb_enc);
    }
+#endif
+#if HAVE_RB_ENC_INTERNED_STR_CSTR && HAVE_RB_INTERNED_STR_CSTR
+    }


Similar comment to the above here. Don't cross if/else with #if conditionals. I know I might have violated that in some places but I have been fixing them as I run into them.

I see the same code duplicated many times. Can this be a put into a function?

ohler55 · 2021-07-20T23:19:52Z

ext/ox/gen_load.c

-    VALUE       s = rb_str_new2(name);
-    VALUE       c = Qnil;
+    VALUE       s;
+    VALUE       c;


Please initialize with Qnil. If not interned and content is NULL then c will not be set.

ohler55 · 2021-07-20T23:29:29Z

I am excited about using interned strings. Great that you got this started. I'd like to see some code changes before merging though.

It might be a little too extreme to internal all strings when a single options is set. For keys it often makes sense to intern but string values are different in that some a very long and really should not be interned. I added a cache (intern) feature to Oj that I will change to use the ruby intern functions and in that I have two options. One for keys and one for other strings. The other strings option is a number size limit. So for example if the intern limit is 10 then only strings less than 10 characters are cached. That might be better for Ox as well.

If you are willing to make those changes that would be great. If you would prefer I helped and made some of the changes I can join you on the MR edits. It's up to you.

okeeblow · 2021-07-21T00:39:07Z

Thank you for the quick turnaround! It will probably be a few days before I get back to working on this, but that all sounds very reasonable.

Would you prefer additional commits fixing up these issues, or amending/squashing and force-pushing my branch so there's only one commit?

How do the names intern_string_keys and intern_strings sound for splitting that option? That fits the same naming convention as symbolize_keys / symbolize but might be better to explicitly call them _values for situations when both options could apply. I assume the symbolize options would take precedence in both cases.

Perhaps intern_strings (intern_string_values?) could be a multiple-choice option like skip instead of just Yes/No. I don't want to do anything a user couldn't override, but that is a good point about long Strings. I don't have many of those in my data and didn't consider it, so maybe something like intern_string_values [:none, :embedded, :all, <length_int>]?

I would try to come up with a more intuitive name than embedded, but that's what MRI's codebase calls things that fit entirely inside a single RVALUE. Seems like a good boundary for an intern/don't-intern decision anyway. That would be 23 characters on 32-bit systems and 40 (iirc?) characters on 64-bit.

ohler55 · 2021-07-21T01:17:57Z

No rush.

How about simply intern_keys and intern_strings or maybe intern_string_limit? Although maybe _values is better. In any case the docs will cover that. I think two options one a boolean and one a length limit is better than trying to pack that into one option.

As for a hard coded limit I think that misses the point. Caching a dictionary of strings less than 40 characters means there would be whole lot of memory used for interned strings. Better to let the user/developer decide what make sense for their situation.

ohler55 · 2021-07-23T00:05:08Z

Thought you might be interested in some benchmarking I just tried out. In the Oj project, new-parser branch I've been writing a new parser and have been using my own cache (interned) string function and comparing it to a previous version of the parser. Most of the time is spent in the Ruby calls. The file is test/perf_parser.rb. Anyway it runs the old SAJ callback parser against the new one. The new one has a cache option. So without the cache options the new one is 15% faster. With the caching it is 25% faster. Now using rb_enc_interned_str_cstr() instead of my own caching the new parser is only 3% faster, slower than without the cache even.

I was planning on getting to Ox after finishing with Oj but if you want to try the Oj hash implementation with Ox that might give an even greater performance boost. Are you interested in working with me on that?

okeeblow · 2021-07-23T23:22:07Z

That's a promising result. I've admittedly never used Oj but will take a look and familiarize myself with that codebase assuming your new-parser branch is online. I'd be interested to compare the memory consumption too and not just the parsing speed, since that is the resource I'm most trying to optimize right now in my own library. I'm currently allocating ~12MiB despite retaining only ~1.5MiB, but I've at least managed to get that down from over 20.

ohler55 · 2021-07-24T01:05:05Z

I think the Oj hash will show memory savings as well. You would have to test that of course though.

ohler55 · 2021-08-28T00:00:03Z

I haven't forgotten this PR. I am just finishing up a cache for Oj that looks like something worth copying over to Ox. Some changes will be needed to support multiple encodings but it performs better than the Ruby intern so will use the Oj cache instead. Like Oj, Ox will need options to intern or not as with interning on there there is a performance penalty when all new keys and strings are encounter.

okeeblow · 2021-09-01T21:51:25Z

Please take as much time as you like. I haven't had time to come back to this and improve the readability of my patch but also haven't forgotten about that :)

ohler55 · 2021-09-01T22:28:14Z

I have started and realized the current cache has some issue that will be fixed with a new approach. I'll post here when I make progress.

ohler55 · 2022-02-08T01:25:06Z

A new set of intern functions were added that interns strings in a flexible cache. This PR is no longer relevant.

ohler55 requested changes Jul 20, 2021

View reviewed changes

ohler55 closed this Feb 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use "interned" `String`s when possible to avoid uneccessary heap allocation. #276

Use "interned" `String`s when possible to avoid uneccessary heap allocation. #276

okeeblow commented Jul 20, 2021

okeeblow commented Jul 20, 2021

ohler55 Jul 20, 2021

ohler55 Jul 20, 2021

ohler55 Jul 20, 2021

ohler55 commented Jul 20, 2021

okeeblow commented Jul 21, 2021

ohler55 commented Jul 21, 2021

ohler55 commented Jul 23, 2021

okeeblow commented Jul 23, 2021

ohler55 commented Jul 24, 2021

ohler55 commented Aug 28, 2021

okeeblow commented Sep 1, 2021

ohler55 commented Sep 1, 2021

ohler55 commented Feb 8, 2022

Use "interned" Strings when possible to avoid uneccessary heap allocation. #276

Use "interned" Strings when possible to avoid uneccessary heap allocation. #276

Conversation

okeeblow commented Jul 20, 2021

okeeblow commented Jul 20, 2021

ohler55 Jul 20, 2021

Choose a reason for hiding this comment

ohler55 Jul 20, 2021

Choose a reason for hiding this comment

ohler55 Jul 20, 2021

Choose a reason for hiding this comment

ohler55 commented Jul 20, 2021

okeeblow commented Jul 21, 2021

ohler55 commented Jul 21, 2021

ohler55 commented Jul 23, 2021

okeeblow commented Jul 23, 2021

ohler55 commented Jul 24, 2021

ohler55 commented Aug 28, 2021

okeeblow commented Sep 1, 2021

ohler55 commented Sep 1, 2021

ohler55 commented Feb 8, 2022

Use "interned" `String`s when possible to avoid uneccessary heap allocation. #276

Use "interned" `String`s when possible to avoid uneccessary heap allocation. #276