
High CPU usage of method handle invocations in Jetty 10 #6328

Open
SerCeMan opened this issue May 27, 2021 · 21 comments

@SerCeMan
Contributor

Jetty version

10.0.3

Java version/vendor (use: java -version)

openjdk version "13.0.4" 2020-07-14
OpenJDK Runtime Environment Zulu13.33+25-CA (build 13.0.4+8-MTS)
OpenJDK 64-Bit Server VM Zulu13.33+25-CA (build 13.0.4+8-MTS, mixed mode, sharing)

OS type/version

Ubuntu 18.04.5 LTS

Description

Hi, Jetty maintainers!

We've recently attempted a migration from Jetty 9 to Jetty 10 and noticed a regression related to WebSockets. According to our metrics, there appears to be a memory leak, which I'm still investigating; I hope to provide more information soon. However, it also seems that Jetty 10 spends a large amount of CPU resolving lambda forms to perform method handle invocations. On the flame graphs we saw a large number of lambda forms: every tiny green tower on the flame graph is a separate instance of LambdaForm. My assumption, which I'm currently investigating and hope to provide more data on soon, is that the large number of lambda forms filled up the Java heap.

CPU flame graph:
Screen Shot 2021-05-27 at 2 23 30 pm

Allocation flame graph:
Screen Shot 2021-05-27 at 2 24 17 pm

I was wondering if you've observed this behaviour before or might know what it could be caused by. I'm also currently investigating the issue and will provide more info once I have it. Thanks!

@sbordet
Contributor

sbordet commented May 31, 2021

@SerCeMan we do see the tiny green towers in a quick load test that we have written to try to reproduce this issue.

We would like to know:

  • Are you using annotated WebSocket endpoints, or do you implement WebSocketListener?
  • I'm assuming your messages all have different sizes, or is your flamegraph the result of some load test with fixed size messages?
  • While we do see the tiny green towers, they account for less than 1% of CPU time. What is your figure of CPU time?

From our point of view, we create one MethodHandle to call your WebSocket endpoint -- nothing fancy.
However, at runtime invoking this MethodHandle apparently creates a different lambda form for every invocation (though we are not sure about this), which causes the many tiny green towers and possibly fills up the Metaspace with lambda forms.
If that is the case, it seems like a problem in the implementation of MethodHandles, and as such an OpenJDK bug.

Let us know the result of your findings. Meanwhile we will investigate as well.

@lachlan-roberts
Contributor

@SerCeMan @sbordet I have done some testing and understand a bit more why this is happening.

For each WebSocket endpoint we get MethodHandles for the relevant methods (onOpen, onMessage, etc.). Then for each new WebSocket connection we bind these MethodHandles to things like the endpoint and session instances. This creates new MethodHandles for each connection, which show up as different method calls on the flamegraph. I don't think this is a performance issue, just an issue with how it is displayed on the flamegraph.
See https://github.com/lachlan-roberts/MethodHandlesExample/blob/master/src/test/java/test/MethodHandleExample.java
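The per-connection binding described above can be sketched as follows. This is a minimal, hypothetical example: `Endpoint` and its method names are stand-ins, not Jetty's actual classes.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class BindingSketch {
    // Stand-in for a user WebSocket endpoint (illustrative, not Jetty's API).
    public static class Endpoint {
        public String onMessage(String msg) { return "got:" + msg; }
    }

    // Look up the target method once per endpoint class.
    static final MethodHandle ON_MESSAGE;
    static {
        try {
            ON_MESSAGE = MethodHandles.lookup().findVirtual(
                    Endpoint.class, "onMessage",
                    MethodType.methodType(String.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Each connection binds the shared handle to its own endpoint instance,
    // producing a distinct MethodHandle object per connection. Profilers can
    // render each bound handle's lambda form as a separate frame, which is
    // what the many tiny towers on the flamegraph correspond to.
    public static MethodHandle bindForConnection(Endpoint endpoint) {
        return ON_MESSAGE.bindTo(endpoint);
    }

    public static String invoke(MethodHandle bound, String msg) {
        try {
            return (String) bound.invoke(msg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```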

However, I did not see much time spent in the Invokers.checkCustomized() branch.
@SerCeMan what is the signature of the onMessage() method you are using for your WebSocket endpoint?

@sbordet
Contributor

sbordet commented Jun 2, 2021

Then for each new WebSocket connection we bind these MethodHandles to things like the endpoint and session instances.

Do you need to do this? I am referring to the binding, because if we have 1M connections we would have 1M different MethodHandles, while I'm assuming that if you don't bind, we would only have 1 MethodHandle, no?

@lachlan-roberts
Contributor

Do you need to do this? I am referring to the binding, because if we have 1M connections we would have 1M different MethodHandles, while I'm assuming that if you don't bind, we would only have 1 MethodHandle, no?

@sbordet We don't need to bind the MethodHandle; we can invoke the original MethodHandle, providing all the arguments every time. We would just need to remember the endpoint and session everywhere we invoke the MethodHandle.

In the project I linked there is a benchmark comparing the bind to only using 1 MethodHandle and providing all arguments each time. Interestingly the benchmark result showed the case where we use methodHandle.bindTo(endpoint) as about 10 times faster. So we might not want to change the code to use only 1 MethodHandle if it is going to impact performance.
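The unbound alternative can be sketched like this: one shared handle for all connections, with the receiver passed explicitly on every call. Again, the names are hypothetical, not Jetty's implementation.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class UnboundSketch {
    // Hypothetical endpoint class, standing in for a user WebSocket listener.
    public static class Endpoint {
        public String onMessage(String msg) { return "msg=" + msg; }
    }

    // A single shared handle for all connections; it is never bound.
    static final MethodHandle ON_MESSAGE;
    static {
        try {
            ON_MESSAGE = MethodHandles.lookup().findVirtual(
                    Endpoint.class, "onMessage",
                    MethodType.methodType(String.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // The receiver is passed explicitly on every invocation, so the framework
    // must keep the endpoint (and session) alongside each connection instead
    // of baking them into a per-connection bound handle.
    public static String dispatch(Endpoint endpoint, String msg) {
        try {
            return (String) ON_MESSAGE.invokeExact(endpoint, msg);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```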

@gregw
Contributor

gregw commented Jun 3, 2021

Perhaps we should raise an issue on whatever software is producing the flame graph. Binding a method handle is a normal thing to do and, as @lachlan-roberts's benchmark shows, it is the right thing to do.
The tools should be smart enough to know that bound MethodHandles are the same class, differing only by data.

@gregw
Contributor

gregw commented Jun 3, 2021

@lachlan-roberts can you paste the benchmark report into a comment on this issue.

@SerCeMan
Contributor Author

SerCeMan commented Jun 3, 2021

Hey, folks! Sorry for the delayed response. I'll try to prepare answers later today. I'm still trying to reproduce the issue in the test environment - no luck yet, but I suspect that it might be related to Shenandoah GC that we're using. I'm still working on a test case that can reproduce it.

@lachlan-roberts
Contributor

Benchmark Results:

Benchmark                        (STRATEGY)   Mode  Cnt     Score      Error  Units
MethodHandlesBenchmark.test    BOUND_INVOKE  thrpt    3  8055.479 ± 3022.922  ops/s
MethodHandlesBenchmark.test  UNBOUND_INVOKE  thrpt    3   667.597 ± 3597.695  ops/s

@SerCeMan
Contributor Author

SerCeMan commented Jun 7, 2021

Sorry for the delay. To answer the questions above,

Are you using annotated WebSocket endpoints, or you implement WebSocketListener?

We implement WebSocketListener.

I'm assuming your messages all have different sizes, or is your flamegraph the result of some load test with fixed size messages?

Yes, messages have a wide range of sizes, from a few bytes up to 5 MB.

While we do see the tiny green towers, they account for less than 1% of CPU time. What is your figure of CPU time?

Because most of the towers are different lambda forms, their CPU consumption varies. However, the part that was unexpected was the fact that Invokers.checkCustomized is responsible for 30% of the total CPU spent in message processing.
Screen Shot 2021-06-07 at 6 42 33 pm

I'll get back to you once I'm able to reproduce the issue in some test environment.

@SerCeMan
Contributor Author

SerCeMan commented Jun 7, 2021

Hey, @lachlan-roberts! Regarding the benchmarks, please correct me if I'm wrong, but it seems possible to store the MethodHandle in a final variable, which makes the results equal within the error margin: lachlan-roberts/MethodHandlesExample#1.

Benchmark                        (STRATEGY)   Mode  Cnt     Score      Error  Units
MethodHandlesBenchmark.test    BOUND_INVOKE  thrpt    3  5909.462 ± 2128.965  ops/s
MethodHandlesBenchmark.test  UNBOUND_INVOKE  thrpt    3  6540.706 ± 2714.425  ops/s
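The finding above hinges on the handle being a constant: the JVM treats a `static final` MethodHandle as a compile-time constant and can inline straight through `invokeExact`, while a handle loaded from a mutable field goes through the generic invoker path, where `Invokers.checkCustomized` tends to show up in profiles. A minimal sketch of the two field shapes (illustrative names, not the benchmark's actual code):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class FinalHandleSketch {
    public static int square(int x) { return x * x; }

    // static final: the JIT can treat this handle as a constant and inline
    // through to the target method.
    static final MethodHandle CONSTANT_HANDLE;
    // non-final: each invocation goes through the generic invoker path,
    // which is where Invokers.checkCustomized can appear on flamegraphs.
    static MethodHandle mutableHandle;

    static {
        try {
            MethodHandle h = MethodHandles.lookup().findStatic(
                    FinalHandleSketch.class, "square",
                    MethodType.methodType(int.class, int.class));
            CONSTANT_HANDLE = h;
            mutableHandle = h;
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static int viaConstant(int x) {
        try {
            return (int) CONSTANT_HANDLE.invokeExact(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static int viaMutable(int x) {
        try {
            return (int) mutableHandle.invokeExact(x);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Both paths are functionally identical; only the JIT's ability to constant-fold the handle differs, which is consistent with the two benchmark variants converging once the handle is final.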

@lachlan-roberts
Contributor

I can confirm that making the MethodHandle a final variable equalizes the performance difference that I was seeing in the benchmarks.

@joakime
Contributor

joakime commented Nov 2, 2021

@lachlan-roberts is this still reproducible on current Jetty 10.0.x and/or 11.0.x HEAD?

@SerCeMan
Contributor Author

SerCeMan commented Nov 2, 2021

I haven't been able to reproduce it in a synthetic environment. I stumbled across #6696, and now that 10.0.7 is out, I'm planning to try the upgrade again. Because I can't reproduce it synthetically, I can close the issue and re-open it with additional information if it manifests again.

@lachlan-roberts
Contributor

The separate spikes on the flamegraph are reproducible, but I think it is really just an interaction between the profiler and MethodHandles and is unlikely to be causing any performance degradation.

I am still planning to do a PR to use only final unbound MethodHandles to see if it is any better, but I haven't gotten around to doing it yet. So I would leave this issue open for this reason.

There have been a number of PRs to improve performance in Jetty 10 since 10.0.3, so if you update it may be that you no longer experience this performance regression. For example PR #6635 was designed to reduce allocation of buffers for whole message aggregation and also reduce the amount of data copies.

@SerCeMan
Contributor Author

SerCeMan commented Mar 8, 2022

Hey, @lachlan-roberts and the team! Would you accept a PR that replaces a large number of method handles with a single one considering that the benchmarks above show no negative performance impact? After attempting to upgrade to 10.0.8, we still see a large amount of CPU time spent resolving method handles.

Screen Shot 2022-03-08 at 12 52 09 pm

@lachlan-roberts
Contributor

@SerCeMan I think this could be difficult to implement, even more so if you are not already familiar with the Jetty WebSocket implementation. I will not have time to attempt this for a few weeks.

Can you attach the full flamegraph file instead of just the screenshot? Also if you have some reproducer code which can reproduce this checkCustomized branch, it would be good to see that as well.

lachlan-roberts added a commit that referenced this issue Jun 3, 2022
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
lachlan-roberts added a commit that referenced this issue Jun 6, 2022
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
lachlan-roberts added a commit that referenced this issue Jun 6, 2022
… algorithm

Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
lachlan-roberts added a commit that referenced this issue Jun 8, 2022
Signed-off-by: Lachlan Roberts <lachlan@webtide.com>
@github-actions

github-actions bot commented Mar 9, 2023

This issue has been automatically marked as stale because it has been a
full year without activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@github-actions github-actions bot added the Stale For auto-closed stale issues and pull requests label Mar 9, 2023
@sbordet sbordet removed the Stale For auto-closed stale issues and pull requests label Mar 13, 2023
@gregw
Contributor

gregw commented Jul 5, 2023

@SerCeMan What is the current status of this issue for you? Are you still seeing high CPU? Our PR to address this has significant performance impact, so it was never merged.

@SerCeMan
Contributor Author

SerCeMan commented Jul 5, 2023

Thanks, @gregw! Apologies for not providing an update sooner. The CPU usage linked to resolving method handles (the yellow part of the flame graph) seemed to be caused by a specific combination of GC settings (Shenandoah GC) and the JVM version we were using at the time (13). A series of JDK upgrades resolved it, though I'm unsure which version exactly; it was likely the transition to 17.

There is still an issue with not being able to use async-profiler effectively, but that's more of a nice-to-have observability feature, considering that it is theoretically possible to use async-profiler with some extra pre/post stack processing.

@gregw
Contributor

gregw commented Jul 5, 2023

@lachlan-roberts So it seems this is not needed so much now.
However, it is still non-optimal that the AsyncProfiler makes many towers instead of one. Did you get any joy raising this with the flamegraph folks? Is there some post-processing that can be applied to merge those peaks?

@lachlan-roberts
Contributor

@gregw we did open an issue with them at some stage but they said it was the expected behaviour with method handles and flagged it as not an issue.
