Fix scalability issue due to checkcast on context's invoke operations #12806
Conversation
Motivation: ChannelDuplexHandler can implement both ChannelOutboundHandler and ChannelInboundHandler, causing a scalability issue due to checkcast, see https://bugs.openjdk.org/browse/JDK-8180450. Modifications: Peel off the invoke methods, turning the checkcast vs interfaces into an instanceof vs ChannelDuplexHandler, preventing the scalability issue. Sadly, if users manually implement both ChannelOutboundHandler and ChannelInboundHandler without extending ChannelDuplexHandler, the fix won't be enough. Result: Scalable duplex channel handler operations.
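The shape of the fix can be sketched as follows. This is a minimal, self-contained illustration with hypothetical stand-in types, not the real Netty code (which lives in AbstractChannelHandlerContext and uses the actual Netty interfaces):

```java
// Stand-ins mirroring the Netty types involved; names are illustrative only.
interface InboundHandler { String channelRead(String msg); }
interface OutboundHandler { String write(String msg); }

// A concrete class implementing both interfaces, like ChannelDuplexHandler.
class DuplexHandler implements InboundHandler, OutboundHandler {
    public String channelRead(String msg) { return "in:" + msg; }
    public String write(String msg) { return "out:" + msg; }
}

class PeeledInvoke {
    // Before: a checkcast against the *interface* on every call. Interfaces
    // are "secondary supers" in HotSpot, so this goes through the per-class
    // secondary_super_cache, which is shared mutable state and can cause
    // cache-line bouncing under concurrency (JDK-8180450).
    static String invokeSlow(Object handler, String msg) {
        return ((InboundHandler) handler).channelRead(msg);
    }

    // After: peel off the frequent concrete type first. instanceof against a
    // concrete class compiles to a cheap klass-pointer comparison and never
    // touches the shared secondary_super_cache.
    static String invokeFast(Object handler, String msg) {
        if (handler instanceof DuplexHandler) {
            return ((DuplexHandler) handler).channelRead(msg);
        }
        return ((InboundHandler) handler).channelRead(msg);
    }
}
```

Both paths produce the same result; only the type-check machinery the JIT emits differs, which is why the fix is behavior-neutral.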
In theory this is already working @normanmaurer @chrisvest: 4.5 M req/sec -> 5.8 M req/sec. And the flamegraphs look so much better too, and the same for the outbound side. In short, no weird costs (that were costing as much as HTTP decoding/encoding) while firing/traversing the channel pipeline |
FYI @doom369 |
FYI given that the head/tail elements of the pipeline are concrete classes with a high probability of being observed, it seems that adding checks vs them won't help that much, because the JIT is quite capable of guarding for the "frequent" cases without making us change any line of code; but I'm open to changing this and adding explicit equality check(s) too (not using any instanceof), if it makes sense and doesn't rely on internal JDK counters. |
I agree |
Agree. However, from the other side - you never know when and if a certain JIT optimization will kick in. That's, I would say, the main problem with the JIT: you can't predict it and thus you can't rely on it. So the only real option is to profile under the specific conditions that fit your use case. Also, let's not forget that the JIT is not free either, especially at server start. |
Re the head/tail instance checks: I have tried doing it manually, using different probabilities to match what the JIT decides from the bytecode MethodData, and it wasn't bringing any visible improvement. |
Restarted the build because of #12809 |
@normanmaurer Running benchmarks again on this version (shouldn't be a problem TBH, but better be safe...) |
numbers on the CI are strange - I'm investigating if it's due this last commit |
@franz1981 Ping me when it's ready for review again. |
@franz1981 is it also true that we don't have this issue in netty 5 anymore, as there is only ChannelHandler? |
Yep @normanmaurer it shouldn't be a problem anymore but better have a proper reproducer and verify it... |
Yep
|
In the meantime, while we're stabilizing the perf CI with a proper system test, I've created a microbenchmark to help reproduce this;
Now:
while on 4.1:
Please don't compare the same test across versions, because right now we benefit from bimorphic inlining of the duplex invoke calls (for this bench), while on 4.1 we don't. Below are a few metrics. Now, parallel:
And single threaded:
Most numbers are nearly 4 times higher, as expected from the benchmark parallelism, without outliers; while on 4.1:
And single-threaded:
It presents a very different picture, with 4X more "hits" on shared lines, a sign of some form of heavy contention. The load at 0x00007f1065356eee (mov 0x18(%rsi),%r10) is indeed carrying a huge cost (although reported one instruction later due to skidding).
C2, level 4, io.netty.microbench.channel.DefaultChannelPipelineDuplexHandlerBenchmark$2::channelReadComplete, version 877 (483 bytes)
0.11% ↘ 0x00007f1065356edc: mov 0x8(%r11),%r10d
0x00007f1065356ee0: movabs $0x0,%rsi
0.01% 0x00007f1065356eea: lea (%rsi,%r10,8),%rsi
0.03% 0x00007f1065356eee: mov 0x18(%rsi),%r10
12.80% 0x00007f1065356ef2: movabs $0x1001331b8,%rax ; {metadata('io/netty/channel/ChannelInboundHandler')}
0.01% 0x00007f1065356efc: cmp %rax,%r10
╭ 0x00007f1065356eff: jne 0x00007f1065356f24 ;*checkcast
│ ; - io.netty.channel.AbstractChannelHandlerContext::invokeChannelReadComplete@11 (line 410)
│ ; - io.netty.channel.AbstractChannelHandlerContext::invokeChannelReadComplete@15 (line 397)
│ ; - io.netty.channel.AbstractChannelHandlerContext::fireChannelReadComplete@6 (line 390)
│ ; - io.netty.microbench.channel.DefaultChannelPipelineDuplexHandlerBenchmark$2::channelReadComplete@1 (line 50)
0.38% │ ↗ 0x00007f1065356f01: mov %r11,%rsi
0.05% │ │ 0x00007f1065356f04: mov 0x8(%rsp),%rdx
0.38% │ │ 0x00007f1065356f09: movabs $0xffffffffffffffff,%rax
│ │ 0x00007f1065356f13: callq 0x00007f1065046020 ; OopMap{[0]=Oop [8]=Oop off=440}
│ │ ;*invokeinterface channelReadComplete
│ │ ; - io.netty.channel.AbstractChannelHandlerContext::invokeChannelReadComplete@15 (line 410)
│ │ ; - io.netty.channel.AbstractChannelHandlerContext::invokeChannelReadComplete@15 (line 397)
│ │ ; - io.netty.channel.AbstractChannelHandlerContext::fireChannelReadComplete@6 (line 390)
│ │ ; - io.netty.microbench.channel.DefaultChannelPipelineDuplexHandlerBenchmark$2::channelReadComplete@1 (line 50)
│ │ ; {virtual_call}
│ │ 0x00007f1065356f18: add $0x30,%rsp
0.03% │ │ 0x00007f1065356f1c: pop %rbp
0.14% │ │ 0x00007f1065356f1d: test %eax,0x17afc0dd(%rip) # 0x00007f107ce53000
│ │ ; {poll_return}
0.01% │ │ 0x00007f1065356f23: retq
0.10% ↘ │ 0x00007f1065356f24: push %rax
0.02% │ 0x00007f1065356f25: mov %rax,%rax
│ 0x00007f1065356f28: mov 0x20(%rsi),%rdi
0.34% │ 0x00007f1065356f2c: mov (%rdi),%ecx
0.35% │ 0x00007f1065356f2e: add $0x8,%rdi
0.01% │ 0x00007f1065356f32: test %rax,%rax
0.37% │ 0x00007f1065356f35: repnz scas %es:(%rdi),%rax
2.73% │ 0x00007f1065356f38: pop %rax
0.72% ╭│ 0x00007f1065356f39: jne 0x00007f1065356f43
││ 0x00007f1065356f3f: mov %rax,0x18(%rsi)
0.19%  ↘╰  0x00007f1065356f43: je 0x00007f1065356f01
And similarly here:
C2, level 4, io.netty.channel.AbstractChannelHandlerContext::invokeFlush, version 870 (245 bytes)
↘ 0x00007f1065359963: mov 0x8(%r10),%r11d
0.02% 0x00007f1065359967: movabs $0x0,%rsi
0x00007f1065359971: lea (%rsi,%r11,8),%rsi
0.09% 0x00007f1065359975: mov 0x18(%rsi),%r11
12.50% 0x00007f1065359979: cmp %rax,%r11
0x00007f106535997c: jne 0x00007f1065359cbd ;*checkcast
; - io.netty.channel.AbstractChannelHandlerContext::invokeFlush0@4 (line 750)
; - io.netty.channel.AbstractChannelHandlerContext::invokeFlush@8 (line 742)
; - io.netty.channel.AbstractChannelHandlerContext::flush@22 (line 728)
; - io.netty.microbench.channel.DefaultChannelPipelineDuplexHandlerBenchmark$3::flush@1 (line 63)
; - io.netty.channel.AbstractChannelHandlerContext::invokeFlush0@-1 (line 750)
; - io.netty.channel.AbstractChannelHandlerContext::invokeFlush@8 (line 742)
0.44% 0x00007f1065359982: mov 0x8(%r10),%ecx
0.55% 0x00007f1065359986: cmp $0x2002cac3,%ecx ; {metadata('io/netty/microbench/channel/DefaultChannelPipelineDuplexHandlerBenchmark$3')}
0x00007f106535998c: je 0x00007f1065359a81
0.15% 0x00007f1065359992: cmp $0x2002678f,%ecx ; {metadata('io/netty/channel/DefaultChannelPipeline$HeadContext')}
0x00007f1065359998: jne 0x00007f106535a07d ;*invokeinterface flush
; - io.netty.channel.AbstractChannelHandlerContext::invokeFlush0@8 (line 750)
; - io.netty.channel.AbstractChannelHandlerContext::invokeFlush@8 (line 742)
; - io.netty.channel.AbstractChannelHandlerContext::flush@22 (line 728)
; - io.netty.microbench.channel.DefaultChannelPipelineDuplexHandlerBenchmark$3::flush@1 (line 63)
; - io.netty.channel.AbstractChannelHandlerContext::invokeFlush0@-1 (line 750)
              ; - io.netty.channel.AbstractChannelHandlerContext::invokeFlush@8 (line 742)
The same costs are not present in the new version (nor the same instructions because, as said earlier, no slow path is required at all). |
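The contention the perfasm output points at comes from HotSpot's single-entry secondary_super_cache being rewritten on interface type-check misses. A faithful reproducer needs a polluted type profile (several observed classes), otherwise C2 just speculates on the exact type; the JMH benchmark added in this PR does that properly. The core shape, as a rough self-contained sketch with hypothetical types, is:

```java
// Several classes implementing the same two interfaces, so the type profile
// at the instanceof sites stays polymorphic.
interface In {}
interface Out {}
class BothA implements In, Out {}
class BothB implements In, Out {}
class BothC implements In, Out {}

class SuperCacheThrash {
    static volatile boolean sink;

    // Checking objects alternately against In and Out makes each miss rewrite
    // Klass::_secondary_super_cache, a single shared field per class; with
    // multiple threads the containing cache line bounces between cores
    // (JDK-8180450), so the cost shows up as coherence traffic, not CPU work.
    static void spin(Object[] handlers, boolean inbound) {
        for (int i = 0; i < 5_000_000; i++) {
            Object h = handlers[i % handlers.length];
            sink = inbound ? (h instanceof In) : (h instanceof Out);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Object[] hs = { new BothA(), new BothB(), new BothC() };
        Thread t1 = new Thread(() -> spin(hs, true));
        Thread t2 = new Thread(() -> spin(hs, false));
        long t0 = System.nanoTime();
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println("elapsed ms: " + (System.nanoTime() - t0) / 1_000_000);
    }
}
```

Note this is only a sketch of the mechanism; measuring it credibly requires a harness like JMH with a perfasm or perf c2c profiler, as done in the PR.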
@chrisvest I don't think the failures are related to this, I've just added a microbench :( |
@franz1981 once happy please merge away :) |
Ran micros to check whether the new changes introduce some slowdown compared to the original code; it seems not (that's a bit weird TBH). Still investigating why |
I've modified the benchmark to add the non-duplex use case too, to evaluate the impact of the changes in a pipeline without duplex handlers. With this PR:
while on 4.1:
|
@normanmaurer @chrisvest I'm happy I've parked this till now: by accident (adding the …) I've noticed something odd. 1 thread:
4 threads:
And this is not using any duplex handler. |
Argh :( found it, @doom369 was right:
final class HeadContext extends AbstractChannelHandlerContext
        implements ChannelOutboundHandler, ChannelInboundHandler {
and indeed my fix isn't able to cover this yet, see my own comment in this same PR.
Usually the JIT is capable of peeling off the call-sites, but sometimes not, as in this benchmark :( |
New results:
to be compared vs #12806 (comment). note: |
@normanmaurer I see that, despite this fix solving most of the problems, it doesn't solve all of them, see ... HttpClientCodecWrapper implements ChannelInboundHandler, ChannelOutboundHandler. These are handlers (and I hope the list is complete) that won't be fixed by this PR and that can have some adverse scalability effect that nobody has noticed yet; some of these are quite widely used too. We then have an additional way to solve this:
Just accept that these specific handlers are not good and fix them with some hack. I'm open to suggestions here @normanmaurer |
@franz1981 I think we should merge this one first and then investigate the others. |
I'm still running tests with type pollution on this PR, but feel free to merge without hurrying the release, so I can still add other fixes |
@normanmaurer I would like to create an issue to track all this effort; it's useful for JDK issues as well :P |
@franz1981 thanks a lot for the amazing work! |
…tions (#13741) Motivation: ChannelDuplexHandler can implement both ChannelOutboundHandler and ChannelInboundHandler, causing a scalability issue due to checkcast (https://bugs.openjdk.org/browse/JDK-8180450). Not only that: there are different classes, e.g. Http2ConnectionHandler, which implement them transitively, by using one of the 2 existing adapters (ChannelInboundHandlerAdapter, ChannelOutboundHandlerAdapter). The existing change at #12806 fixed only the duplex cases, but others like the above were still affected. Modifications: Replace the duplex type checks with broader inbound adapter ones, given that duplex is still based on it. Add outbound adapter type checks in addition to the duplex ones. Result: More scalable adapter-based channel handler operations.
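The broadened dispatch of the follow-up can be sketched like this: check against the adapter base class instead of only ChannelDuplexHandler, so any handler built on the adapters takes the concrete-class fast path. These are hypothetical stand-in types, not the real Netty classes:

```java
// Stand-ins: Inbound ~ ChannelInboundHandler,
// InboundAdapter ~ ChannelInboundHandlerAdapter,
// DuplexAdapter ~ ChannelDuplexHandler (which extends the inbound adapter).
interface Inbound { String read(String m); }

class InboundAdapter implements Inbound {
    public String read(String m) { return "adapter:" + m; }
}

class DuplexAdapter extends InboundAdapter {}

final class Dispatch {
    static String invokeRead(Object handler, String m) {
        // Concrete-class instanceof: a cheap klass-pointer compare that also
        // covers every subclass (duplex included), with no
        // secondary_super_cache traffic.
        if (handler instanceof InboundAdapter) {
            return ((InboundAdapter) handler).read(m);
        }
        // Fallback: interface checkcast for handlers not based on the adapter.
        return ((Inbound) handler).read(m);
    }
}
```

Checking the adapter rather than the duplex class is strictly broader, since DuplexAdapter (like ChannelDuplexHandler) is itself a subclass of the adapter, so the earlier duplex fast path is subsumed.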