Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression with json/pure from 1.7 to 9k #3921

Open
cprice404 opened this issue May 24, 2016 · 12 comments
Open

Performance regression with json/pure from 1.7 to 9k #3921

cprice404 opened this issue May 24, 2016 · 12 comments

Comments

@cprice404
Copy link
Contributor

Environment

JRuby 1.7.20.1, JRuby 9.1.0.0, JRuby 9.1.1.0; run from Java via ScriptingContainer, with various values tested for the setCompileMode method. Standalone script execution with no frameworks. Benchmarked JSON gems json/pure, json/ext, and jrjackson.

Expected Behavior

Expected performance for 9.1.x to be on par with performance for 1.7.x.

Actual Behavior

For json/pure, JRuby 9k performance is markedly slower (75%) when compileMode is set to OFF.

I have set up a github repo with a simple reproducer here:
https://github.com/cprice404/jruby9k-benchmarks/tree/19a2719e4edddc17e3f1fabbf311b9d80c6eba80/json-comparison

I also graphed some of the results in a google spreadsheet here:
https://docs.google.com/spreadsheets/d/1M64QXK_RSFd05_Te9PamTv7ALi7XlAkhWeyRoAoIFZU/edit#gid=866554277

Here are some relevant numbers from the benchmark output (the rest of it is available in the linked reproducer repro if you are interested):

JRUBY VERSION: 1.7.20.1
JSON VERSION: json/pure
COMPILE MODE: off
              user     system      total        real
warmup (100 runs): 79.419000   0.000000  79.419000 ( 79.418000)
middle (1000 runs):754.537000   0.000000 754.537000 (754.537000)
tail (100 runs): 74.968000   0.000000  74.968000 ( 74.968000)

JRUBY VERSION: 9.1.1.0
JSON VERSION: json/pure
COMPILE MODE: off
              user     system      total        real
warmup (100 runs):137.360000   1.240000 138.600000 (127.982415)
middle (1000 runs):1352.780000   1.620000 1354.400000 (1327.873641)
tail (100 runs):149.260000   0.350000 149.610000 (147.532336)

The other JSON libraries besides json/pure are much, much faster, and are in the ballpark of JRuby 1.7 performance. Also, when using json/pure with any other compileMode besides OFF, the delta between JRuby 1.7 and JRuby 9k drops significantly.

Unfortunately, our application codebase currently vendors a fork of json/pure that will make it very difficult for us to switch to a different JSON backend.

Also, we haven't had great luck in our experiments with using compileMode JIT, most likely because it seems to be very expensive during the warmup of a ScriptingContainer, and we destroy and create new ScriptingContainers with some frequency during our application execution.

I recognize that many users will be able to avoid using json/pure, and thus the priority of solving this may be diminished; however, my hope is that perhaps the reproducer highlights some kind of memory usage or performance issue that would be relevant to other JRuby applications, and thus, that looking into optimizations for this scenario might yield improvements in other areas as well. Or, at least, that it would be interesting to try to characterize why the performance changes so drastically between JRuby 1.7 and 9k.

@headius
Copy link
Member

headius commented Sep 26, 2016

Link with #4112.

@headius
Copy link
Member

headius commented May 15, 2018

Ok, this is still slower for me on 9.2, but the difference is closer:

1.7.27 jit: 38s
1.7.27 int: 51s
9.2.0.0 jit: 41s
9.2.0.0 int: 93s

Note this is just the warmup run. I'm going to bump to 9.2.1 to go along with other perf work post 9.2.

@headius headius modified the milestones: JRuby 9.2.0.0, JRuby 9.2.1.0 May 15, 2018
@enebo
Copy link
Member

enebo commented Sep 19, 2018

I think we finally caught up in upcoming 9.2.1.0:

Detected JRuby version 1.7.x, setting compatVersion 1.9.
Setting compileMode to: 'JIT'
JRUBY VERSION: 1.7.20.1
COMPILE MODE: jit
                           user     system      total        real
warmup (100 runs):     0.105000   0.000000   0.105000 (  0.105000)
middle (10000 runs):   0.772000   0.000000   0.772000 (  0.772000)
tail (100 runs):       0.005000   0.000000   0.005000 (  0.005000)
Detected JRuby version 1.7.x, setting compatVersion 1.9.
Setting compileMode to: 'OFF'
JRUBY VERSION: 1.7.20.1
COMPILE MODE: off
                           user     system      total        real
warmup (100 runs):     0.086000   0.000000   0.086000 (  0.086000)
middle (10000 runs):   0.913000   0.000000   0.913000 (  0.912000)
tail (100 runs):       0.010000   0.000000   0.010000 (  0.010000)
JRUBY VERSION: 9.2.1.0-SNAPSHOT
COMPILE MODE: jit
                           user     system      total        real
warmup (100 runs):     0.460000   0.010000   0.470000 (  0.089430)
middle (10000 runs):   2.990000   0.010000   3.000000 (  0.648920)
tail (100 runs):       0.020000   0.000000   0.020000 (  0.004793)
Detected JRuby version 9k, skipping compatVersion.
Setting compileMode to: 'OFF'
JRUBY VERSION: 9.2.1.0-SNAPSHOT
COMPILE MODE: off
                           user     system      total        real
warmup (100 runs):     0.380000   0.000000   0.380000 (  0.067444)
middle (10000 runs):   2.770000   0.020000   2.790000 (  0.628573)
tail (100 runs):       0.030000   0.000000   0.030000 (  0.006038)

@enebo
Copy link
Member

enebo commented Sep 19, 2018

I don't understand though why my runs are so short...

@enebo
Copy link
Member

enebo commented Sep 19, 2018

Oh but I should say we are markedly faster with these short runs than 9.2 and 9.1 so something definitely got better. Will try and lengthen this out and see how it does with a longer execution.

@enebo
Copy link
Member

enebo commented Sep 19, 2018

HAHAHA...I just realized I ran timeparse bench and not the json one. So good news the time parse work we did for 9.2.1.0 shows us much better...now to actually re-bench with the proper benchmark.

@enebo
Copy link
Member

enebo commented Sep 19, 2018

Detected JRuby version 1.7.x, setting compatVersion 1.9.
Setting compileMode to: 'JIT'
Using JSON gem: 'json/pure'
JRUBY VERSION: 1.7.20.1
JSON VERSION: json/pure
COMPILE MODE: jit
              user     system      total        real
warmup (100 runs): 39.246000   0.000000  39.246000 ( 39.247000)
middle (1000 runs):379.237000   0.000000 379.237000 (379.237000)
tail (100 runs): 37.718000   0.000000  37.718000 ( 37.718000)

Detected JRuby version 1.7.x, setting compatVersion 1.9.
Setting compileMode to: 'OFF'
Using JSON gem: 'json/pure'
JRUBY VERSION: 1.7.20.1
JSON VERSION: json/pure
COMPILE MODE: off
              user     system      total        real
warmup (100 runs): 51.216000   0.000000  51.216000 ( 51.216000)
middle (1000 runs):506.285000   0.000000 506.285000 (506.285000)
tail (100 runs): 51.710000   0.000000  51.710000 ( 51.710000)

Detected JRuby version 9k, skipping compatVersion.
Setting compileMode to: 'JIT'
Using JSON gem: 'json/pure'
JRUBY VERSION: 9.2.1.0-SNAPSHOT
JSON VERSION: json/pure
COMPILE MODE: jit
              user     system      total        real
warmup (100 runs): 42.770000   0.500000  43.270000 ( 34.794050)
middle (1000 runs):348.780000   1.210000 349.990000 (335.423335)
tail (100 runs): 34.920000   0.140000  35.060000 ( 33.398251)

Detected JRuby version 9k, skipping compatVersion.
Setting compileMode to: 'OFF'
Using JSON gem: 'json/pure'
JRUBY VERSION: 9.2.1.0-SNAPSHOT
JSON VERSION: json/pure
COMPILE MODE: off
              user     system      total        real
warmup (100 runs): 92.020000   0.460000  92.480000 ( 85.222364)
middle (1000 runs):898.370000   2.400000 900.770000 (862.650188)
tail (100 runs): 87.450000   0.080000  87.530000 ( 85.964957)

JIT has passed 1.7.x now but interp is still like 40% slower. Ratio has not changed much since this was reported. This should stay open as json/pure probably a good test case for something expensive in the interp.

@headius
Copy link
Member

headius commented Aug 9, 2019

Updated numbers for JRuby 9.2.8 versus JRuby 1.7.27 on my machine.

Detected JRuby version 9k, skipping compatVersion.
Setting compileMode to: 'JIT'
Using JSON gem: 'json/pure'
JRUBY VERSION: 9.2.8.0-SNAPSHOT
JSON VERSION: json/pure
COMPILE MODE: jit
              user     system      total        real
warmup (100 runs): 39.540000   1.000000  40.540000 ( 32.464168)
middle (1000 runs):317.680000   2.780000 320.460000 (308.937475)
tail (100 runs): 30.900000   0.110000  31.010000 ( 30.490583)

Detected JRuby version 9k, skipping compatVersion.
Setting compileMode to: 'OFF'
Using JSON gem: 'json/pure'
JRUBY VERSION: 9.2.8.0-SNAPSHOT
JSON VERSION: json/pure
COMPILE MODE: off
              user     system      total        real
warmup (100 runs): 90.490000   1.350000  91.840000 ( 85.663210)
middle (1000 runs):845.530000   3.840000 849.370000 (839.079444)
tail (100 runs): 85.550000   0.330000  85.880000 ( 85.484035)
Detected JRuby version 1.7.x, setting compatVersion 1.9.
Setting compileMode to: 'JIT'
Using JSON gem: 'json/pure'
JRUBY VERSION: 1.7.27
JSON VERSION: json/pure
COMPILE MODE: jit
              user     system      total        real
warmup (100 runs): 37.474000   0.000000  37.474000 ( 37.473000)
middle (1000 runs):364.468000   0.000000 364.468000 (364.468000)
tail (100 runs): 35.984000   0.000000  35.984000 ( 35.985000)

Detected JRuby version 1.7.x, setting compatVersion 1.9.
Setting compileMode to: 'OFF'
Using JSON gem: 'json/pure'
JRUBY VERSION: 1.7.27
JSON VERSION: json/pure
COMPILE MODE: off
              user     system      total        real
warmup (100 runs): 51.976000   0.000000  51.976000 ( 51.975000)
middle (1000 runs):519.210000   0.000000 519.210000 (519.210000)
tail (100 runs): 49.629000   0.000000  49.629000 ( 49.629000)

The numbers for json/pure are perhaps slightly improved from what @enebo found them in September, with JIT comfortably faster (30s vs 36s for tail) and interpreter still around 1.75x the time of the interpreter (86s vs 50s).

I'm going to take a bit of time and see if there's a reason why the interpreter is so much slower.

@headius
Copy link
Member

headius commented Aug 9, 2019

A quick allocation profile shows one possible avenue: the interpreted mode allocates significantly more Object[] than the jitted mode. Most of these appear to be coming from the interpreter's temp variables, which must be allocated anew for each call to any interpreted method that has local or temporary variables (which basically means all of them, since it's a register machine).

@enebo Maybe it's worth doing another experiment with using a large per-thread Object[] and carving off sections of it rather than allocating every time? I don't see anything obvious in the sampled profile; the overhead is mostly within joni in both cases.

@headius
Copy link
Member

headius commented Aug 9, 2019

Frustratingly, another attempt at pooling Object[] does not appear to have helped. I tried with and without value-clearing (we need to clear to avoid anchoring references beyond the life of a method call) and it still always ends up roughly the same level of perf.

Here's the patch for what it's worth: https://gist.github.com/a7eed3fc8f9870bc2c0ff6b41e892c15

@headius
Copy link
Member

headius commented Aug 10, 2019

I made a few more small attempts to improve interpreter performance, but nothing really made a difference:

  • Inlining more of the bodies we outlined, or making sure outlined bodies can inline (using JITWatch to detect failed inlines).
  • Tweaking the JUMP switch to be a == comparison rather than a lookupswitch (this actually slowed things down measurably because of the static get of the JUMP enum value).
  • Outlining more elements from the outlined bodies.

FWIW I also measured TruffleRuby against json/pure, to get a complete picture of how we all do. I am not sure if there's a way to disable TruffleRuby JIT, so this is their JIT performance with the native image:

JSON VERSION: json/pure
              user     system      total        real
warmup (100 runs):187.997878   7.928058 195.925936 (144.851443)
middle (1000 runs):1115.383371  37.316330 1152.699701 (1136.046028)
tail (100 runs):102.493481   3.187866 105.681347 (106.068449)

@kares kares modified the milestones: JRuby 9.2.8.0, JRuby 9.2.9.0 Aug 19, 2019
@enebo enebo removed this from the JRuby 9.2.9.0 milestone Oct 22, 2019
@enebo
Copy link
Member

enebo commented Oct 22, 2019

De-targetting. I do have an idea on how to improve cases and branching/jumping in the interpreter (load/stores to temp array can be elided in single 1:1 use:def cases to a local variable) and this will have some effect...but...the work to implement that is going to be substantial (all calls/lookups must be in same local variable scope as all branches). There are bigger fish to fry so we will leave this open and it may be a funday friday acitivity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants