-
-
Notifications
You must be signed in to change notification settings - Fork 15.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HttpObjectEncoder scalability issue due to instanceof checks #12708
Labels
Comments
franz1981
added a commit
to franz1981/netty
that referenced
this issue
Aug 19, 2022
…etty#12708) Motivation: Current http encoder logic cause contention over the klass field used by the JIT to speed up instanceof checks vs interfaces, preventing scaling with multi-core machines. See https://bugs.openjdk.org/browse/JDK-8180450 for more info. Modifications: Code duplication to reduce the arity of the morphism on the instanceof checks on the http encoder. Removed using encoder inherited methods to take control of the arity of the call-site morphism while releasing ref counted http msg types. Result: Scalable HTTP encoding
franz1981
added a commit
to franz1981/netty
that referenced
this issue
Aug 22, 2022
…etty#12708) Motivation: Current http encoder logic cause contention over the klass field used by the JIT to speed up instanceof checks vs interfaces, preventing scaling with multi-core machines. See https://bugs.openjdk.org/browse/JDK-8180450 for more info. Modifications: Code duplication to reduce the arity of the morphism on the instanceof checks on the http encoder. Removed using encoder inherited methods to take control of the arity of the call-site morphism while releasing ref counted http msg types. Result: Scalable HTTP encoding
franz1981
added a commit
to franz1981/netty
that referenced
this issue
Aug 25, 2022
…etty#12708) Motivation: Current http encoder logic cause contention over the klass field used by the JIT to speed up instanceof checks vs interfaces, preventing scaling with multi-core machines. See https://bugs.openjdk.org/browse/JDK-8180450 for more info. Modifications: Code duplication to reduce the arity of the morphism on the instanceof checks on the http encoder. Removed using encoder inherited methods to take control of the arity of the call-site morphism while releasing ref counted http msg types. Result: Scalable HTTP encoding
This was referenced Sep 15, 2022
franz1981
added a commit
to franz1981/vert.x
that referenced
this issue
Oct 27, 2022
vietj
pushed a commit
to eclipse-vertx/vert.x
that referenced
this issue
Nov 7, 2022
franz1981
added a commit
to franz1981/vert.x
that referenced
this issue
Feb 23, 2023
franz1981
added a commit
to franz1981/vert.x
that referenced
this issue
Feb 28, 2023
vietj
pushed a commit
to eclipse-vertx/vert.x
that referenced
this issue
Mar 1, 2023
5 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Due to https://bugs.openjdk.org/browse/JDK-8180450 the chain of instanceof checks on
HttpObjectEncoder
competes to updateKlass::secondary_super_cache field
causing a scalability issue (very relevant with many cores and NUMA topologies).Detection of such issue isn't easy for 3 reasons:
Klass::secondary_super_cache field
cause false sharing over other fields sitting on the same cache line used to perform other parts of theinstanceof
logic (Klass::secondary_supers's length read to perform MacroAssembler::check_klass_subtype_slow_path) causing other instructions to receive a performance penalty while not directly causing the issue itselfinstanceof
, but placing an uncommon trap to guard against new types used (ie by making the issue to not happen)The 1. and 2. makes profilers very confused (see http://psy-lob-saw.blogspot.com/2018/07/how-inlined-code-confusing-profiles.html) if other Java methods are inlined near (forward/backward - the direction depends by the profiler, really) the guilty emitted JIT checks, getting all the blame. Only assembly analysis using tools like https://builds.shipilev.net/perfasm/ and perf c2c makes evident which instructions to blame.
The last reason means that validation benchmarks should consider polluting the JIT profiling data at the
instanceof
call-sites (see https://wiki.openjdk.org/display/HotSpot/MethodData) in order to have a synthetic way to provoke/detect it.Sadly Netty 4.1 cannot rewrite the existing http types (that are public) in order to NOT use the type information to perform state changes on the encoder, but Netty 5 can consider a different approach to solve this performance issue.
In an ideal world where there's a single implementor for each interface type (implementing just one too), the issue shouldn't happen.
To better understand (one flavour of) this issue, an example below.
Let's consider
DefaultFullHttpRequest
; it implementsFullHttpRequest
and (transitively)ReferenceCounted
too.In a code like this:
if the both instanceof check observes more then 1 concrete types eg
DefaultFullHttpRequest
andDefaultHttpRequest
then the checks will make use of the mechanism explained in https://bugs.openjdk.org/browse/JDK-8180450.
For example, if a single thread
write(o)
ando
is aDefaultFullHttpRequest
, the checks invalidate the mentionedDefaultFullHttpRequest
Klass field (first check set it toFullHttpMessage
, second toReferenceCounted
), that's just inefficient; but if many threads will do it, they would compete vs the same Klass field, invalidating each others and causing false sharing for surrounding Klass fields eg length of the Klass array used to perform the slow path check.In the existing code base this is the simplest form of
Klass::secondary_super_cache field
invalidation, but the logic of the encoder can make that happen too.The text was updated successfully, but these errors were encountered: