Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpObjectEncoder scalability issue due to instanceof checks #12708

Closed
franz1981 opened this issue Aug 17, 2022 · 0 comments
Closed

HttpObjectEncoder scalability issue due to instanceof checks #12708

franz1981 opened this issue Aug 17, 2022 · 0 comments
Assignees
Labels
benchmark Affect the JMH benchmarks important improvement

Comments

@franz1981
Copy link
Contributor

franz1981 commented Aug 17, 2022

Due to https://bugs.openjdk.org/browse/JDK-8180450 the chain of instanceof checks on HttpObjectEncoder competes to update Klass::secondary_super_cache field causing a scalability issue (very relevant with many cores and NUMA topologies).

Detection of such issue isn't easy for 3 reasons:

  1. performance hit won't happen in any Java code (ie no BCI - byte code index)
  2. updates on Klass::secondary_super_cache field cause false sharing over other fields sitting on the same cache line used to perform other parts of the instanceof logic (Klass::secondary_supers's length read to perform MacroAssembler::check_klass_subtype_slow_path) causing other instructions to receive a performance penalty while not directly causing the issue itself
  3. benchmarks (and micro-benchmarks) tends to use just a single http msg type making the JIT able to NOT perform any instanceof, but placing an uncommon trap to guard against new types used (ie by making the issue to not happen)

The 1. and 2. makes profilers very confused (see http://psy-lob-saw.blogspot.com/2018/07/how-inlined-code-confusing-profiles.html) if other Java methods are inlined near (forward/backward - the direction depends by the profiler, really) the guilty emitted JIT checks, getting all the blame. Only assembly analysis using tools like https://builds.shipilev.net/perfasm/ and perf c2c makes evident which instructions to blame.
The last reason means that validation benchmarks should consider polluting the JIT profiling data at the instanceof call-sites (see https://wiki.openjdk.org/display/HotSpot/MethodData) in order to have a synthetic way to provoke/detect it.

Sadly Netty 4.1 cannot rewrite the existing http types (that are public) in order to NOT use the type information to perform state changes on the encoder, but Netty 5 can consider a different approach to solve this performance issue.

In an ideal world where there's a single implementor for each interface type (implementing just one too), the issue shouldn't happen.

To better understand (one flavour of) this issue, an example below.
Let's consider DefaultFullHttpRequest; it implements FullHttpRequest and (transitively) ReferenceCounted too.
In a code like this:

    void write(Object o) {
        try {
            if (o instanceof FullHttpMessage) {
                foo(o);
            }
            // ... other logic
        } finally {
            if (o instanceof ReferenceCounted) { 
                ((ReferenceCounted) o).release();
            }
       } 
    }

if the both instanceof check observes more then 1 concrete types eg DefaultFullHttpRequest and DefaultHttpRequest
then the checks will make use of the mechanism explained in https://bugs.openjdk.org/browse/JDK-8180450.

For example, if a single thread write(o) and o is a DefaultFullHttpRequest, the checks invalidate the mentioned DefaultFullHttpRequest Klass field (first check set it to FullHttpMessage, second to ReferenceCounted), that's just inefficient; but if many threads will do it, they would compete vs the same Klass field, invalidating each others and causing false sharing for surrounding Klass fields eg length of the Klass array used to perform the slow path check.

In the existing code base this is the simplest form of Klass::secondary_super_cache field invalidation, but the logic of the encoder can make that happen too.

@franz1981 franz1981 added improvement benchmark Affect the JMH benchmarks important labels Aug 17, 2022
@franz1981 franz1981 self-assigned this Aug 17, 2022
franz1981 added a commit to franz1981/netty that referenced this issue Aug 19, 2022
…etty#12708)

Motivation:

Current http encoder logic cause contention over the klass field used by the JIT to speed up instanceof checks vs interfaces,
preventing scaling with multi-core machines.
See https://bugs.openjdk.org/browse/JDK-8180450 for more info.

Modifications:

Code duplication to reduce the arity of the morphism on the instanceof checks on the http encoder.
Removed using encoder inherited methods to take control of the arity of the
call-site  morphism while releasing ref counted http msg types.

Result:

Scalable HTTP encoding
franz1981 added a commit to franz1981/netty that referenced this issue Aug 22, 2022
…etty#12708)

Motivation:

Current http encoder logic cause contention over the klass field used by the JIT to speed up instanceof checks vs interfaces,
preventing scaling with multi-core machines.
See https://bugs.openjdk.org/browse/JDK-8180450 for more info.

Modifications:

Code duplication to reduce the arity of the morphism on the instanceof checks on the http encoder.
Removed using encoder inherited methods to take control of the arity of the
call-site  morphism while releasing ref counted http msg types.

Result:

Scalable HTTP encoding
franz1981 added a commit to franz1981/netty that referenced this issue Aug 25, 2022
…etty#12708)

Motivation:

Current http encoder logic cause contention over the klass field used by the JIT to speed up instanceof checks vs interfaces,
preventing scaling with multi-core machines.
See https://bugs.openjdk.org/browse/JDK-8180450 for more info.

Modifications:

Code duplication to reduce the arity of the morphism on the instanceof checks on the http encoder.
Removed using encoder inherited methods to take control of the arity of the
call-site  morphism while releasing ref counted http msg types.

Result:

Scalable HTTP encoding
franz1981 added a commit to franz1981/vert.x that referenced this issue Oct 27, 2022
vietj pushed a commit to eclipse-vertx/vert.x that referenced this issue Nov 7, 2022
vietj pushed a commit to eclipse-vertx/vert.x that referenced this issue Mar 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
benchmark Affect the JMH benchmarks important improvement
Projects
None yet
Development

No branches or pull requests

1 participant