New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Investigate suboptimal score on TechEmpower benchmarks #29421
Comments
Addendum 1: There aren't exactly many benchmarks based on Tomcat (the default for SB apps), but comparing the JSON serialization with e.g. a vanilla undertow reveals that there seems to be a quite a lot of framework overhead. And it's not Jackson (because that is used in both |
@dreis2211 Same, but as I am myself working on some benchmarks to compare Boot 3 JVM and native apps, we can maybe use this opportunity to compare:
Let's maybe start by focusing on the https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/spring. Do you have the bandwidth to create a branch with a Spring Boot |
I will have some time next week or earliest this weekend to do some preliminary tests, I guess. |
Fun fact, apparently the |
So, I had a bit of time tonight: Local specs: NOTE: The results should be taken with a grain of salt. After all, local benchmarks are not really isolated and might be influenced by things running on my machine. Spring-Boot 2.6.9 (Undertow): Spring-Boot 3.0.0-SNAPSHOT (Undertow): Spring-Boot 3.0.0-SNAPSHOT (Undertow) (with Spring-Boot 3.0.0-SNAPSHOT (Tomcat) (with Spring-Boot 2.3.1 (Tomcat) (Round 20): I'm sort of focussing on the Plaintext & JSON ones here as the others are somewhat volatile and due to the database involvement include some more flakyness not under any frameworks control. What I found was that there seems to be a substantial overhead if the application needs to determine the producible media types. E.g. the changes that moved from 2.3.1 & 2.6.9 include the Lines 385 to 395 in b72ee5f
Funny enough I came across this method already when playing around with the type pollution agent from @franz1981 / RedHat, which yields the following output:
If true, this would maybe indicate a scaling/performance problem caused by https://bugs.openjdk.org/browse/JDK-8180450 . Maybe it's not that, but just the fact, that it iterates over 8 message converters for each request to get their producible media types. I haven't really had the time tonight to look deeper into this. In terms of Tomcat vs Undertow, there seems to be a huge difference for plain text responses (e.g. Tomcat only reaching half of what Undertow achieves). Cheers, |
Interesting, better to focus initially on JSON and plaintext indeed. It would be interesting to do for example a new round of JSON test with only I can also provide a branch that removes this |
I think E.g. if I force the content-type to be preselected (a.k.a skipping even more logic than just the producible media types) via
it yields the following numbers: Spring-Boot 2.6.9 (Undertow): Remember - the status quo of that was: |
@dreis2211 Pretty good results on skipping |
is it worth testing with jetty ? |
@GeorgeSalu Generally, I wouldn't worry too much about Jetty or Undertow. Tomcat is usually the one that falls behind a bit. But I've compared it just for you - with latest snapshots, bullseye and enforced content-type (throughout all benchmark modes and thus having a better weighted score overall): From personal experience, the margin between them is not as huge as one might read from this results. I'm in fact a little bit surprised about the numbers generally. I have done some memory improvements that don't explain these results, though. I need to take a deeper look what changed in the snapshots in the meanwhile. What has changed is that my Macbook runs on Ventura now. Maybe there have been major improvements to the host OS that generally boost the numbers. Don't forget there is a bit of flakyness involved doing these on a local setup. Anyhow: Back to your question. Undertow usually wins, also in internal loadtests that we do. There is also a dated benchmark comparison over at https://www.baeldung.com/spring-boot-servlet-containers that hasn't drastically changed in my opinion over recent years. Cheers, |
As you may have seen, more benchmarks on #29438 did not achieve to reproduce in a reliable way the gains I initially saw, I think it would maybe more visible with realistic workloads involving different converters. But this kind of No promise yet, but https://bugs.openjdk.org/browse/JDK-8180450 may be backported to Java 17, so at least with this version we should be able to avoid this painful @dreis2211 Except this |
I've contributed some low hanging fruits already in #29428 & #29412 As noted earlier that the media-type/content-type dance is quite a big chunk - regardless of the I think with the update to 3.x plus bullseye in the framework benchmarks, we should be able to achieve already quite some uplift. I think I will contribute this in the next days. And while being at it, probably also setting the content-type to avoid the costly computation. |
Great, thanks a lot for your various contributions. Could be great if TechEmpower benchmark could be updated with Spring Boot 3.0.x, explicit content type and compare with the previous figures. Do you plan to do it? @rstoyanchev I am wondering if you could have a look on if you see some reasonable way to skip some content type resolution part for more common use cases, if we should drop a note in our reference documentation to advise configuring explicitly the content type for better performances, etc. |
Yeah, I will likely do that somewhen this week. Fill do another comparison then as well on my local machine. But this week is busy, so might be something for the weekend. |
No rush, thanks for your help on this, much appreciated. |
There you go: TechEmpower/FrameworkBenchmarks#7749 Compared to previous results, I updated my Mac, Docker etc.. The results were already better & worse before, but I think the head-to-head comparison from this evening is already a good step into the right direction |
We evaluate conditions in That said, the None of this would impact the benchmark which now sets the response content-type directly, but there might be some new hotspots to be optimized in its present form, perhaps in |
Thanks for creating this follow-up issue @rstoyanchev. @dreis2211 I close this issue as I think we have reached a reasonable state, feel free to open more focused one if you identify some interesting optimizations. I will on my side make sure that a fix for https://bugs.openjdk.org/browse/JDK-8180450 is available in Java 17 with our partner Bellsoft. |
Same here - we improved a few things along the road. Great stuff. Thanks 🙏 |
Maybe check the flux versions too, do they use the same infrastructure for media type? |
Functional web APIs for both Spring MVC and WebFlux skip the infrastructure for media type by design, so that's maybe this that should be tried. I would expected annotation-based WebFlux controllers to use a similar kind of media type infrastructure. |
It will be great to add benchmark with virtual thread based executor, by the way. |
More an FYI. Round 22 results have been published. The screenshot below shows the composite scores ( with fullstack + java filter applied). Quarkus ranks at But more important than the rank the plaintext & json benchmark shows that Quarkus has still a throughput ~4 times "better" than Spring. Nonetheless still some good news. The absolute results are better than last time apparently, despite the same machine. So the few things we've done here seem to have had at least a positive impact. |
From what I've seen in a plain text use case (not techempower benchmark, but can be the same), Which does unnecessary buffer allocation (8kb) inside an Maybe it can be avoided, so instead of: Writer writer = new OutputStreamWriter(out, charset);
writer.write(in);
writer.flush(); write to the stream directly. out.write(in.getBytes(charset));
out.flush(); Here's approximate difference in time
|
I plan to give another look to the benchmark. |
Hi 👋
I'm not a particular huge fan of the TechEmpower benchmarks, because they barely reflect real use-cases, but (novice) programmers who maybe look for some resources that might help them with their questions of what framework to choose for their next projects might look into these benchmarks and make their decision (partly) based on these numbers.
I've compared Quarkus and Spring(-Boot) here to compare two popular frameworks that have similar goals and deliver similar functionality. There is also a comparison over at Baeldung https://www.baeldung.com/spring-boot-vs-quarkus that compares these two.
Imho, this particular comparison isn't exactly great news for the Spring ecosystem. While there are some benchmarks on par with Quarkus, the majority is "won" by Quarkus - and by a greater margin one would expect. I've recently overheard a conversation why one should use Spring if Quarkus delivers everything as well, but with better performance. Now, we all know that there is more to this story and application A might be better suited for Quarkus while application B might be better suited for Spring - for whatever reasons. There is no silver bullet, after all. But this conversation kept me thinking. Obviously, I'm biased towards Spring and I love the ecosystem, but I'm wondering if we can do something about the performance gap (in these benchmarks).
I know for example that recent changes to the Quarkus benchmarks have been done by people involved with Quarkus. The Spring benchmarks are somewhat driven infrequently by community members (I think I have done something on these ages ago myself as well).
I'm also happy to help with this and at least update the Spring(-Boot) one to latest versions but I wonder if you have comparable machines at hand that you could run these benchmarks against and find out why these numbers are so comparably "bad". (I do have the suspicion that the numbers are actually much closer in reality). I'm also happy to help with the investigations, if you want me to 😉
Cheers,
Christoph
P.S.: The point of this should be not to overtake Quarkus or establish a Spring vs Quarkus mentality, but to simply improve the Spring numbers. Quarkus is just a good comparison data point imho.
P.S.S: I'm reporting this here, because the majority of logic relevant to the benchmarks is inside Spring-Framework and not in Spring-Boot.
The text was updated successfully, but these errors were encountered: