Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rb_enc_interned_str if available #451

Merged
merged 2 commits into from Jan 17, 2021
Merged

Conversation

casperisfine
Copy link

Ref: https://bugs.ruby-lang.org/issues/13381

So if all goes well, Ruby 3.0 should expose rb_enc_interned_str(char* ptr, long len, rb_encoding* enc). This function returns an interned string like String#-@, except that if the string already exist, it can return it without any allocation.

So for documents with a lot of repeated object keys, or using the freeze => true option, this has the potential to very heavily cut down on the number of allocation.

NB: the function doesn't work well on current ruby master, but hopefully ruby/ruby#3786 will make it usable. But I'd like to start the discussion now if possible.

@casperisfine
Copy link
Author

ruby/ruby#3786 was merged which means rb_enc_interned_str should be usable in Ruby 3.0.

@hsbt are you interested in this PR?

@casperisfine
Copy link
Author

@hsbt could I interest you in this PR, or @marcandre maybe?

@marcandre
Copy link
Contributor

Looks like a great PR, but I don't have write access. I wish we move this to ruby/json.

@casperisfine
Copy link
Author

but I don't have write access

Yeah I know :/

Who else could I try to pick their interest on this?

@hsbt
Copy link
Collaborator

hsbt commented Jan 7, 2021

I'm not sure what advantage with this change.

@nurse Do you interest in this?

@marcandre
Copy link
Contributor

It should be more efficient. Just to be sure: interned strings may be garbage collected like symbols?

@nurse
Copy link
Collaborator

nurse commented Jan 8, 2021

It should be more efficient. Just to be sure: interned strings may be garbage collected like symbols?

It will be garbage collected when there's not references to the fstring: https://github.com/ruby/ruby/blob/55e52c19e74d5df90560ea1cc4f2a2b9f5d7a5c4/string.c#L1410-L1419

1e95cf1 Refactor json_string_unescape

How large this improve the performance? If it improves enough performance, see review comments.

ext/json/ext/parser/parser.rl Show resolved Hide resolved
ext/json/ext/parser/parser.rl Outdated Show resolved Hide resolved
@casperisfine
Copy link
Author

I'm not sure what advantage with this change.

As discussed in https://bugs.ruby-lang.org/issues/16029, we have large configuration files we parse on boot and keep in memory forever. They contains a lot of repeated keys, that end up allocated and deduplicated later. This change would reduce our allocations on boot by over 20%, and since we spend 25-30% of our boot time in the GC, I'd expect a relatively nice speedup for us here.

Concretely, the following JSON:

[
  {"foo": 1},
  {"foo": 2},
]

Will currently allocate "foo" twice, but then likely GC both of them because Hash keys are automatically interned. By using rb_enc_interned_str we'd save two allocations.

Just to be sure: interned strings may be garbage collected like symbols?

Yes. It's similar to String#-@, it simply store them in some kind of internal weakmap to allow reusing them, but if they are dangling they'll be cleaned up.

How large this improve the performance?

If the numbers I posted above aren't enough I can try to craft a benchmark.

@casperisfine
Copy link
Author

How large this improve the performance?

Unless you meant the json_unescape refactoring itself? Because the reason I refactor it is that to prevent allocation, rb_enc_interned_str takes a char *. So the escaping has to be done on char * rather than on a Ruby String.

@nurse
Copy link
Collaborator

nurse commented Jan 8, 2021

Unless you meant the json_unescape refactoring itself? Because the reason I refactor it is that to prevent allocation, rb_enc_interned_str takes a char *. So the escaping has to be done on char * rather than on a Ruby String.

Yes, and I know. But as far as I understand these days just creating Ruby object is not so heavy while they are collected by Generational GC. I want to ensure the refactoring is worth to add more code complexity. Micro bench mark is ok, but also want one which tests alloca and xmalloc (shorter strings and longer strings).

@casperisfine
Copy link
Author

Ok. I'll try to produce theses soon.

@casperisfine
Copy link
Author

casperisfine commented Jan 8, 2021

Ok, so first micro benchmark, to see if allocating on the stack is worth it:

# frozen_string_literals: true
require 'benchmark/ips'
require 'json'
puts "Ruby: #{RUBY_VERSION}"
json = JSON.dump("\n\r\t\\")

Benchmark.ips do |x|
  x.report('parse escaped') { JSON.parse(json) }
end

This PR:

$ ruby -Ilib:ext/ /tmp/bench-json.rb 
Ruby: 2.7.2
Warming up --------------------------------------
       parse escaped   117.556k i/100ms
Calculating -------------------------------------
       parse escaped      1.178M (± 1.3%) i/s -      5.995M in   5.088271s
$ ruby -Ilib:ext/ /tmp/bench-json.rb 
Ruby: 2.7.2
Warming up --------------------------------------
       parse escaped   116.897k i/100ms
Calculating -------------------------------------
       parse escaped      1.154M (± 1.4%) i/s -      5.845M in   5.064481s

With same but with the condition commented out (so always using xmalloc):

$ ruby -Ilib:ext/ /tmp/bench-json.rb 
Ruby: 2.7.2
Warming up --------------------------------------
       parse escaped   110.132k i/100ms
Calculating -------------------------------------
       parse escaped      1.107M (± 1.8%) i/s -      5.617M in   5.074198s
$ ruby -Ilib:ext/ /tmp/bench-json.rb 
Ruby: 2.7.2
Warming up --------------------------------------
       parse escaped   110.173k i/100ms
Calculating -------------------------------------
       parse escaped      1.092M (± 1.5%) i/s -      5.509M in   5.045623s

So the difference doesn't seem that huge, but seem consistent. If you think it is a useless optimization, I'm fine with removing it.

@casperisfine
Copy link
Author

Oh, I forgot you asked for larger strings as well:

# frozen_string_literals: true
require 'benchmark/ips'
require 'json'
puts "Ruby: #{RUBY_VERSION}"
json = JSON.dump("\n\r\t\\")
json2 = JSON.dump("\n\r\t\\" * 16)

Benchmark.ips do |x|
  x.report('parse escaped(10B)') { JSON.parse(json) }
  x.report('parse escaped(130B)') { JSON.parse(json2) }
end

This PR:

$ ruby -Ilib:ext/ /tmp/bench-json.rb 
Ruby: 2.7.2
Warming up --------------------------------------
  parse escaped(10B)   116.996k i/100ms
 parse escaped(130B)    73.702k i/100ms
Calculating -------------------------------------
  parse escaped(10B)      1.169M (± 0.9%) i/s -      5.850M in   5.006243s
 parse escaped(130B)    736.164k (± 1.2%) i/s -      3.685M in   5.006539s
$ ruby -Ilib:ext/ /tmp/bench-json.rb 
Ruby: 2.7.2
Warming up --------------------------------------
  parse escaped(10B)   117.741k i/100ms
 parse escaped(130B)    74.573k i/100ms
Calculating -------------------------------------
  parse escaped(10B)      1.175M (± 1.7%) i/s -      5.887M in   5.010103s
 parse escaped(130B)    741.524k (± 1.2%) i/s -      3.729M in   5.029060s

With alloca removed:

$ ruby -Ilib:ext/ /tmp/bench-json.rb 
Ruby: 2.7.2
Warming up --------------------------------------
  parse escaped(10B)   109.443k i/100ms
 parse escaped(130B)    66.379k i/100ms
Calculating -------------------------------------
  parse escaped(10B)      1.101M (± 1.0%) i/s -      5.582M in   5.070559s
 parse escaped(130B)    668.632k (± 1.3%) i/s -      3.385M in   5.063923s
$ ruby -Ilib:ext/ /tmp/bench-json.rb 
Ruby: 2.7.2
Warming up --------------------------------------
  parse escaped(10B)   109.493k i/100ms
 parse escaped(130B)    66.244k i/100ms
Calculating -------------------------------------
  parse escaped(10B)      1.093M (± 1.0%) i/s -      5.475M in   5.009441s
 parse escaped(130B)    667.558k (± 1.2%) i/s -      3.378M in   5.061627s

@casperisfine
Copy link
Author

Another benchmark, so see the need for this feature overall:

# frozen_string_literal: true
require 'benchmark/ips'
require 'json'
puts "Ruby: #{RUBY_VERSION}"
json = JSON.dump(
  'foo' => 'bar',
  'plopplopplopplopplopplopplopplopplopplopplopplopplopplopplop' => 12,
  'bar' => 'foo',
  'egg' => 'spam',
  'spam' => 'egg',
)

Benchmark.ips do |x|
  x.report('parse(freeze: true)') { JSON.parse(json, freeze: true) }
end

This time on ruby 3.0.0 (to have rb_enc_interned_str):

8f7ddba (before this PR):

$ ruby -Ilib:ext/ /tmp/bench-interned-str.rb 
Ruby: 3.0.0
Warming up --------------------------------------
 parse(freeze: true)    27.322k i/100ms
Calculating -------------------------------------
 parse(freeze: true)    271.591k (± 1.8%) i/s -      1.366M in   5.031762s
$ ruby -Ilib:ext/ /tmp/bench-interned-str.rb 
Ruby: 3.0.0
Warming up --------------------------------------
 parse(freeze: true)    27.437k i/100ms
Calculating -------------------------------------
 parse(freeze: true)    275.807k (± 1.3%) i/s -      1.399M in   5.074361s

This PR:

$ ruby -Ilib:ext/ /tmp/bench-interned-str.rb 
Ruby: 3.0.0
Warming up --------------------------------------
 parse(freeze: true)    39.884k i/100ms
Calculating -------------------------------------
 parse(freeze: true)    399.643k (± 1.5%) i/s -      2.034M in   5.090945s
$ ruby -Ilib:ext/ /tmp/bench-interned-str.rb 
Ruby: 3.0.0
Warming up --------------------------------------
 parse(freeze: true)    40.338k i/100ms
Calculating -------------------------------------
 parse(freeze: true)    400.744k (± 1.5%) i/s -      2.017M in   5.034074s

@casperisfine
Copy link
Author

@nurse if these benchmark are good enough for you I can work on rebasing and solving your comments.

@nurse
Copy link
Collaborator

nurse commented Jan 8, 2021

@casperisfine Thank you for the benchmark. It looks this PR improves so much and alloca is also worth to introduce additional complexity. I'll merge this when you fix this.

@casperisfine casperisfine force-pushed the ruby-interned-str branch 2 times, most recently from 492bb70 to 33bcc78 Compare January 8, 2021 14:25
@casperisfine
Copy link
Author

@nurse I moved the rb_str_intern call inside json_string_unescape as requested. However I can't use rb_sym_intern as it's not public.

@nurse
Copy link
Collaborator

nurse commented Jan 8, 2021

@nurse I moved the rb_str_intern call inside json_string_unescape as requested. However I can't use rb_sym_intern as it's not public.

Oops sorry indeed. Please ignore that.

@casperisfine
Copy link
Author

I just noticed travis isn't testing against Ruby 3.0. I can push that if you'd like.

@casperisfine
Copy link
Author

@nurse is there anything else I could do?

@nurse
Copy link
Collaborator

nurse commented Jan 14, 2021

Some CIs are failed and how about 38e8477#r553682398?

@casperisfine
Copy link
Author

Some CIs are failed

Indeed, but I don't see any output. I don't know why they failed. I'll see if I can retry the jobs.

how about 38e8477#r553682398?

Not sure what this links to. It point to and old commit but I see no comment.

@casperisfine
Copy link
Author

I'll see if I can retry the jobs.

Seems like I don't have the permission to retry the jobs, and they are all very specific versions of OSX, test fully pass on ubuntu. I suspect some kind of flakiness, but no way to tell without the output.

@casperisfine
Copy link
Author

how about 38e8477#r553682398?

Ok, based on the comment id I think you are referring to ALLOC_N / ALLOCA_N. I've fixed that a while ago.

I edited my last commit message to try to restart CI, we'll see how it goes.

@casperisfine
Copy link
Author

So all the macos- job simply won't start. I suppose GitHub doesn't have enough macos capacity.

@nurse nurse merged commit ce14bf6 into flori:master Jan 17, 2021
@nurse
Copy link
Collaborator

nurse commented Jan 17, 2021

@casperisfine thanks! I merged this! Merged commit's CI works fine.

@casperisfine
Copy link
Author

Thank you!

okeeblow added a commit to okeeblow/ox that referenced this pull request Jul 20, 2021
…cation.

Fixes ohler55#275

- Background info: https://en.wikipedia.org/wiki/String_interning
- Available to C extensions since MRI Ruby 3.0:
  - https://bugs.ruby-lang.org/issues/13381
  - https://bugs.ruby-lang.org/issues/16029

This change makes `Ox` use "interned" (frozen and deduplicated) `String`s anywhere possible,
the same effect as calling `String#-@` except without the heap churn of `RVALUE` allocation:
https://ruby-doc.org/core/String.html#method-i-2B-40

- Uses the `HAVE_<funcname>` preprocessor macros to detect this functionality
  and avoid breaking older Rubies (confirmed with at least MRI 2.7.2).
- I explicitly use interned `String`s anywhere an allocated `String` seemed
  entirely internal to `Ox`, e.g. in `sax_value_as_time` when allocating
  the `String` argument to `ox_time_class`.
- Adds a new user-facing option `intern_strings` to control the behavior
  of `String` return values from user-facing functions, e.g. from `sax_as_s`.
- Results in a ~25% reduction in startup GC pressure in my library! :D

Inspired by similar changes to `json` and `msgpack`:

- flori/json#451
- msgpack/msgpack-ruby#196

My goal is `Object` allocation reduction in the `Ox::Sax` handler of my MIME Type
file-identification library caused by the large number of duplicate `String`s in the
`shared-mime-info`-format XML it uses as a data source.

I was already calling `#-@` (or using the `-some_str_variable` syntax) everywhere possible
in my handler, but that only freezes and deduplicates an already-allocated `String`:

  irb(main):068:0> oid = proc { puts "#{_1} is object_id #{_1.object_id}" }
  => #<Proc:0x0000564d53eb6850 (irb):66>
  irb(main):069:0> lol = 'lol'.tap(&oid).-@.tap(&oid)
  lol is object_id 19800
  lol is object_id 19740
  => "lol"
  irb(main):070:0> -lol.tap(&oid).-@.tap(&oid)
  lol is object_id 19740
  lol is object_id 19740
  => "lol"

That avoids duplicate retained `String`s but still causes my library to take
a big GC hit right at startup when it loads its data.

All the stats below were collected using SamSaffron's `memory_profiler`:
https://github.com/SamSaffron/memory_profiler

First, a sanity-check to make sure I didn't break MRI < 3.0,
using MRI 2.7 + latest official `Ox` from RubyGems (2.14.5):

  [okeeblow@emi#CHECKING-YOU-OUT] ruby -v
  ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-linux]
  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 20561351 bytes (401372 objects)
  Total retained:  1680971 bytes (22102 objects)

versus same MRI 2.7 but with my patched `Ox`:

  [okeeblow@emi#CHECKING-YOU-OUT] gem install ../../ox/ox-2.14.5.gem
  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 20561111 bytes (401366 objects)
  Total retained:  1680971 bytes (22102 objects)

No difference between the two, showing that this patch is a no-op for pre-3.0.
Sorry I did not test older versions of MRI or other Rubies like J/Truffle/etc
since I don't have them on my system.

Now the real gainz can be seen when comparing MRI 3.0 with unpatched `Ox`:

  [okeeblow@emi#CHECKING-YOU-OUT] bundle install
  Fetching gem metadata from https://rubygems.org/......
  …
  Installing ox 2.14.5 with native extensions
  Using checking-you-out 0.7.0 from source at `.`
  Bundle complete! 11 Gemfile dependencies, 19 gems now installed.
  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 20081080 bytes (390133 objects)
  Total retained:  1713209 bytes (22095 objects)

against same MRI 3.0 with my patched `Ox`:

  [okeeblow@emi#CHECKING-YOU-OUT] gem install ../../ox/ox-2.14.5.gem
  Building native extensions. This could take a while...
  Successfully installed ox-2.14.5
  Parsing documentation for ox-2.14.5
  unknown encoding name ""UTF-8"" for README.md, skipping
  Installing ri documentation for ox-2.14.5
  Done installing documentation for ox after 0 seconds
  1 gem installed
  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 17298804 bytes (322860 objects)
  Total retained:  1713202 bytes (22073 objects)

against same MRI 3.0, my patched `Ox`, and `Ox.parse_sax(intern_strings: true)`:

  [okeeblow@emi#CHECKING-YOU-OUT] ./bin/are-we-unallocated-yet|grep Total
  Total allocated: 16414938 bytes (301382 objects)
  Total retained:  1713226 bytes (22073 objects)

This shows that just having Ruby 3.0 available results in a huge win,
and opting into immutable `String`s from `Value#as_s` helps me even more!

Unit tests pass:

  [okeeblow@emi#test] ruby tests.rb
  Loaded suite tests
  Started
  .................
  16 tests, 40 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
  100% passed
  .....
  18 tests, 42 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
  100% passed
  ...............................................................................................................................
  151 tests, 249 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
  100% passed

Apologies for possible indentation issues in this patch. There seem to be lots of existing
lines with a mix of tabs and spaces — nbd, it happens — so I tried to match the surrounding
lines everywhere I made an addition to make sure things line up and look okay in `git diff`.

I only use `Ox` for Sax parsing, not for marshalling, so please scrutinize my changes
in `gen_load.c` and friends extra hard in case I omitted any `intern_strings` option
checks that could result in an unwary user getting an immutable `String` where
there wasn't one before. The one change I had to make to a `String#force_encoding`-using
test case is an example of what I want to avoid surprising anyone with.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants