Investigate suboptimal score on TechEmpower benchmarks #29421

dreis2211 · 2022-11-02T11:58:15Z

Hi 👋

I'm not a particular huge fan of the TechEmpower benchmarks, because they barely reflect real use-cases, but (novice) programmers who maybe look for some resources that might help them with their questions of what framework to choose for their next projects might look into these benchmarks and make their decision (partly) based on these numbers.

I've compared Quarkus and Spring(-Boot) here to compare two popular frameworks that have similar goals and deliver similar functionality. There is also a comparison over at Baeldung https://www.baeldung.com/spring-boot-vs-quarkus that compares these two.

Imho, this particular comparison isn't exactly great news for the Spring ecosystem. While there are some benchmarks on par with Quarkus, the majority is "won" by Quarkus - and by a greater margin one would expect. I've recently overheard a conversation why one should use Spring if Quarkus delivers everything as well, but with better performance. Now, we all know that there is more to this story and application A might be better suited for Quarkus while application B might be better suited for Spring - for whatever reasons. There is no silver bullet, after all. But this conversation kept me thinking. Obviously, I'm biased towards Spring and I love the ecosystem, but I'm wondering if we can do something about the performance gap (in these benchmarks).

I know for example that recent changes to the Quarkus benchmarks have been done by people involved with Quarkus. The Spring benchmarks are somewhat driven infrequently by community members (I think I have done something on these ages ago myself as well).

I'm also happy to help with this and at least update the Spring(-Boot) one to latest versions but I wonder if you have comparable machines at hand that you could run these benchmarks against and find out why these numbers are so comparably "bad". (I do have the suspicion that the numbers are actually much closer in reality). I'm also happy to help with the investigations, if you want me to 😉

Cheers,
Christoph

P.S.: The point of this should be not to overtake Quarkus or establish a Spring vs Quarkus mentality, but to simply improve the Spring numbers. Quarkus is just a good comparison data point imho.

P.S.S: I'm reporting this here, because the majority of logic relevant to the benchmarks is inside Spring-Framework and not in Spring-Boot.

dreis2211 · 2022-11-02T12:16:24Z

Addendum 1:

There aren't exactly many benchmarks based on Tomcat (the default for SB apps), but comparing the JSON serialization with e.g. a vanilla undertow reveals that there seems to be a quite a lot of framework overhead. And it's not Jackson (because that is used in both undertow and spring (haven't checked Quarkus))

sdeleuze · 2022-11-03T15:55:59Z

I'm not a particular huge fan of the TechEmpower benchmarks, because they barely reflect real use-cases

@dreis2211 Same, but as I am myself working on some benchmarks to compare Boot 3 JVM and native apps, we can maybe use this opportunity to compare:

How Spring Framework performs in a default Spring Boot 3 application versus Spring Boot 2 and see what is the evolution if any
Compare Undertow versus Tomcat on recent versions
Check if we can find low hanging fruits to optimize

Let's maybe start by focusing on the https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/spring. Do you have the bandwidth to create a branch with a Spring Boot 3.0.0-SNAPSHOT (we are close to GA so should be pretty stable) version of it, upgrading to Java 17, Jakarta EE, etc. and compare the result you have locally with the Spring Boot 2.6.9 version used currently? And to compare Undertow versus Tomcat on both?

dreis2211 · 2022-11-03T16:02:46Z

I will have some time next week or earliest this weekend to do some preliminary tests, I guess.

dreis2211 · 2022-11-03T17:00:38Z

Fun fact, apparently the spring benchmarks already run with Undertow and Round 21 doesn't include these changes yet.

dreis2211 · 2022-11-03T19:42:13Z

So, I had a bit of time tonight:

Local specs:
Model: MacBookPro:
Processor: 8-Core Intel Core i9 2,3 GHz
Memory: 32 GB

NOTE: The results should be taken with a grain of salt. After all, local benchmarks are not really isolated and might be influenced by things running on my machine.

Spring-Boot 2.6.9 (Undertow):

Spring-Boot 2.6.9 (Tomcat):

Spring-Boot 3.0.0-SNAPSHOT (Undertow):

Spring-Boot 3.0.0-SNAPSHOT (Undertow) (with produces on GetMapping):

Spring-Boot 3.0.0-SNAPSHOT (Tomcat) (with produces on GetMapping):

Spring-Boot 2.3.1 (Tomcat) (Round 20):

Quarkus:

I'm sort of focussing on the Plaintext & JSON ones here as the others are somewhat volatile and due to the database involvement include some more flakyness not under any frameworks control. What I found was that there seems to be a substantial overhead if the application needs to determine the producible media types. E.g. the changes that moved from 2.3.1 & 2.6.9 include the produces field on the GetMapping annotations for plaintext. As one can see the different runs for 3.0.0 also differ largely for JSON. And there the only change is again the manual produces on the endpoint. Maybe the following plays a role here:

spring-framework/spring-webmvc/src/main/java/org/springframework/web/servlet/mvc/method/annotation/AbstractMessageConverterMethodProcessor.java

Lines 385 to 395 in b72ee5f

    
           for (HttpMessageConverter<?> converter : this.messageConverters) { 
        
           	if (converter instanceof GenericHttpMessageConverter && targetType != null) { 
        
           		if (((GenericHttpMessageConverter<?>) converter).canWrite(targetType, valueClass, null)) { 
        
           			result.addAll(converter.getSupportedMediaTypes(valueClass)); 
        
           		} 
        
           	} 
        
           	else if (converter.canWrite(valueClass, null)) { 
        
           		result.addAll(converter.getSupportedMediaTypes(valueClass)); 
        
           	} 
        
           } 
        
           return (result.isEmpty() ? Collections.singletonList(MediaType.ALL) : new ArrayList<>(result));

.

Funny enough I came across this method already when playing around with the type pollution agent from @franz1981 / RedHat, which yields the following output:

--------------------------
Type Pollution Statistics:
--------------------------
1:      org.springframework.http.converter.json.MappingJackson2HttpMessageConverter
Count:  58366
Types:
        org.springframework.http.converter.HttpMessageConverter
        org.springframework.http.converter.GenericHttpMessageConverter
Traces:
        org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducibleMediaTypes(AbstractMessageConverterMethodProcessor.java:385)
                class: org.springframework.http.converter.HttpMessageConverter
                count: 29208
        org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducibleMediaTypes(AbstractMessageConverterMethodProcessor.java:386)
                class: org.springframework.http.converter.GenericHttpMessageConverter
                count: 29115
        org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducible
MediaTypes(AbstractMessageConverterMethodProcessor.java:387)
                class: org.springframework.http.converter.GenericHttpMessageConverter
                count: 43

If true, this would maybe indicate a scaling/performance problem caused by https://bugs.openjdk.org/browse/JDK-8180450 . Maybe it's not that, but just the fact, that it iterates over 8 message converters for each request to get their producible media types. I haven't really had the time tonight to look deeper into this.

In terms of Tomcat vs Undertow, there seems to be a huge difference for plain text responses (e.g. Tomcat only reaching half of what Undertow achieves).

Cheers,
Christoph

sdeleuze · 2022-11-03T22:09:50Z

Interesting, better to focus initially on JSON and plaintext indeed.

It would be interesting to do for example a new round of JSON test with only MappingJackson2HttpMessageConverter and remove the other default converters.

I can also provide a branch that removes this instanceof check, I will try to create that tomorrow and share it with you.

dreis2211 · 2022-11-03T22:35:43Z

I think AbstractMessageConverterMethodProcessor.writeWithMessageConverters is indeed something to look at.

E.g. if I force the content-type to be preselected (a.k.a skipping even more logic than just the producible media types) via

	@GetMapping(value = "/plaintext")
	String plaintext(HttpServletResponse response) {
		response.setContentType("text/plain");
		return "Hello, World!";
	}

	@GetMapping(value = "/json")
	Message json(HttpServletResponse response) {
		response.setContentType("application/json");
		return new Message("Hello, World!");
	}

it yields the following numbers:

Spring-Boot 2.6.9 (Undertow):

Remember - the status quo of that was:

dreis2211 · 2022-11-05T11:32:24Z

Another finding: the current tests run inside a Debian Buster image. With Debian Bullseye I can reach a substantially better score on both JSON and Plaintext with 3.0.0 and the forced Content-Type:

sdeleuze · 2022-11-06T13:48:45Z

@dreis2211 Pretty good results on skipping instanceof GenericHttpMessageConverter checks, see #29438. I was able to confirm the gains on 2 different benchmarks, but if you can check on your side as well, please share the results in the PR.

GeorgeSalu · 2022-11-11T16:29:20Z

is it worth testing with jetty ?

dreis2211 · 2022-11-13T10:44:33Z

@GeorgeSalu Generally, I wouldn't worry too much about Jetty or Undertow. Tomcat is usually the one that falls behind a bit.

But I've compared it just for you - with latest snapshots, bullseye and enforced content-type (throughout all benchmark modes and thus having a better weighted score overall):

Undertow

Jetty

From personal experience, the margin between them is not as huge as one might read from this results. I'm in fact a little bit surprised about the numbers generally. I have done some memory improvements that don't explain these results, though. I need to take a deeper look what changed in the snapshots in the meanwhile. What has changed is that my Macbook runs on Ventura now. Maybe there have been major improvements to the host OS that generally boost the numbers. Don't forget there is a bit of flakyness involved doing these on a local setup.

Anyhow: Back to your question. Undertow usually wins, also in internal loadtests that we do. There is also a dated benchmark comparison over at https://www.baeldung.com/spring-boot-servlet-containers that hasn't drastically changed in my opinion over recent years.

Cheers,
Christoph

dreis2211 · 2022-11-13T12:00:48Z

I've run a new round of benchmarks to have comparable results between Spring-Boot 3.0.0 RC1 and RC2

RC1

RC2

There is an upgrade from Undertow 2.2.x to 2.3.x - I wouldn't be surprised if this brings some improvements. Next to several tiny improvements in Spring itself.

sdeleuze · 2022-11-28T10:51:59Z

As you may have seen, more benchmarks on #29438 did not achieve to reproduce in a reliable way the gains I initially saw, I think it would maybe more visible with realistic workloads involving different converters. But this kind of instanceof checks are used in various places in the Framework, and it would be better to fix the root issue rather than artificially write bad code to workaround it in specific places.

No promise yet, but https://bugs.openjdk.org/browse/JDK-8180450 may be backported to Java 17, so at least with this version we should be able to avoid this painful instanceof issue on both Spring Boot 2.7.x and 3.x lines. We will push for that short and try to make that happen asap.

@dreis2211 Except this instanceof issue, have you identify other low hanging fruits we could optimize to increase the throughput//latency?

dreis2211 · 2022-11-28T11:25:16Z

I've contributed some low hanging fruits already in #29428 & #29412

As noted earlier that the media-type/content-type dance is quite a big chunk - regardless of the instanceof and secondary supers cache problematic - but in my opinion not really low-hanging unfortunately.

I think with the update to 3.x plus bullseye in the framework benchmarks, we should be able to achieve already quite some uplift. I think I will contribute this in the next days. And while being at it, probably also setting the content-type to avoid the costly computation.

sdeleuze · 2022-11-28T14:13:02Z

Great, thanks a lot for your various contributions. Could be great if TechEmpower benchmark could be updated with Spring Boot 3.0.x, explicit content type and compare with the previous figures. Do you plan to do it?

@rstoyanchev I am wondering if you could have a look on if you see some reasonable way to skip some content type resolution part for more common use cases, if we should drop a note in our reference documentation to advise configuring explicitly the content type for better performances, etc.

dreis2211 · 2022-11-28T14:30:37Z

Yeah, I will likely do that somewhen this week. Fill do another comparison then as well on my local machine. But this week is busy, so might be something for the weekend.

sdeleuze · 2022-11-28T14:40:59Z

No rush, thanks for your help on this, much appreciated.

dreis2211 · 2022-11-28T20:00:43Z

There you go: TechEmpower/FrameworkBenchmarks#7749

2.6.9 (Before)

3.0.0 (After)

Compared to previous results, I updated my Mac, Docker etc.. The results were already better & worse before, but I think the head-to-head comparison from this evening is already a good step into the right direction

rstoyanchev · 2022-11-30T15:09:44Z

We evaluate conditions in RequestMappingInfo#getMatchingCondition in a specific order and return as soon as one returns false. Patterns has always been at the bottom with String path matching, but with parsed PathPattern in place, I wonder if we should move the PathPatternsRequestCondition near the top, right after the HTTP method check. We could compare some more involved pattern vs consumes or produces.

That said, the ProducesRequestCondition does cache the accepted media types in a request attribute, so at least that should be done only once per request withing RequestMappingHandlerMapping. The same is not true for ConsumesRequestCondition so there is an optimization to be made there.

None of this would impact the benchmark which now sets the response content-type directly, but there might be some new hotspots to be optimized in its present form, perhaps in AbstractMessageConverterMethodProcessor that was discussed earlier.

sdeleuze · 2022-12-09T14:24:38Z

Thanks for creating this follow-up issue @rstoyanchev.

@dreis2211 I close this issue as I think we have reached a reasonable state, feel free to open more focused one if you identify some interesting optimizations. I will on my side make sure that a fix for https://bugs.openjdk.org/browse/JDK-8180450 is available in Java 17 with our partner Bellsoft.

dreis2211 · 2022-12-09T14:26:08Z

Same here - we improved a few things along the road. Great stuff. Thanks 🙏

cristianorvf · 2023-02-24T13:51:38Z

Maybe check the flux versions too, do they use the same infrastructure for media type?

sdeleuze · 2023-02-24T14:24:12Z

Functional web APIs for both Spring MVC and WebFlux skip the infrastructure for media type by design, so that's maybe this that should be tried. I would expected annotation-based WebFlux controllers to use a similar kind of media type infrastructure.

stillya · 2023-10-08T17:37:50Z

It will be great to add benchmark with virtual thread based executor, by the way.

dreis2211 · 2023-11-14T15:21:28Z

More an FYI. Round 22 results have been published. The screenshot below shows the composite scores ( with fullstack + java filter applied).

Quarkus ranks at #38
Spring ranks at #88

But more important than the rank the plaintext & json benchmark shows that Quarkus has still a throughput ~4 times "better" than Spring.

Nonetheless still some good news. The absolute results are better than last time apparently, despite the same machine. So the few things we've done here seem to have had at least a positive impact.

7fantasy7 · 2023-11-15T03:03:18Z

From what I've seen in a plain text use case (not techempower benchmark, but can be the same),
it uses StreamUtils#copy(String in, Charset charset, OutputStream out)
through the StringHttpMessageConverter

Which does unnecessary buffer allocation (8kb) inside an OutputStreamWriter consturctor.

Maybe it can be avoided, so instead of:

Writer writer = new OutputStreamWriter(out, charset);
writer.write(in);
writer.flush();

write to the stream directly.

out.write(in.getBytes(charset));
out.flush();

Here's approximate difference in time

Benchmark                        (stringLength)  Mode  Cnt    Score     Error  Units
StreamUtilsBenchmark.copy                    10  avgt   10    4.769 ±   0.110  ns/op
StreamUtilsBenchmark.copy                   100  avgt   10   10.368 ±   0.569  ns/op
StreamUtilsBenchmark.copy                  1000  avgt   10   42.393 ±   0.905  ns/op
StreamUtilsBenchmark.copyOld                 10  avgt   10  637.235 ±  51.005  ns/op
StreamUtilsBenchmark.copyOld                100  avgt   10  602.219 ± 188.381  ns/op
StreamUtilsBenchmark.copyOld               1000  avgt   10  898.469 ± 160.039  ns/op

sdeleuze · 2023-11-21T13:19:46Z

I plan to give another look to the benchmark.

spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Nov 2, 2022

sdeleuze self-assigned this Nov 3, 2022

dreis2211 mentioned this issue Nov 4, 2022

Avoid String allocations in MediaType.checkParameters #29428

Closed

sdeleuze added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Nov 28, 2022

rstoyanchev mentioned this issue Dec 9, 2022

ConsumesRequestCondition should cache the parsed Content-Type #29669

Closed

sdeleuze closed this as completed Dec 9, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate suboptimal score on TechEmpower benchmarks #29421

Investigate suboptimal score on TechEmpower benchmarks #29421

dreis2211 commented Nov 2, 2022 •

edited

dreis2211 commented Nov 2, 2022

sdeleuze commented Nov 3, 2022 •

edited

dreis2211 commented Nov 3, 2022

dreis2211 commented Nov 3, 2022 •

edited

dreis2211 commented Nov 3, 2022 •

edited

sdeleuze commented Nov 3, 2022 •

edited

dreis2211 commented Nov 3, 2022 •

edited

dreis2211 commented Nov 5, 2022

sdeleuze commented Nov 6, 2022

GeorgeSalu commented Nov 11, 2022

dreis2211 commented Nov 13, 2022 •

edited

dreis2211 commented Nov 13, 2022

sdeleuze commented Nov 28, 2022

dreis2211 commented Nov 28, 2022 •

edited

sdeleuze commented Nov 28, 2022

dreis2211 commented Nov 28, 2022

sdeleuze commented Nov 28, 2022

dreis2211 commented Nov 28, 2022 •

edited

rstoyanchev commented Nov 30, 2022

sdeleuze commented Dec 9, 2022

dreis2211 commented Dec 9, 2022

cristianorvf commented Feb 24, 2023

sdeleuze commented Feb 24, 2023

stillya commented Oct 8, 2023

dreis2211 commented Nov 14, 2023 •

edited

7fantasy7 commented Nov 15, 2023 •

edited

sdeleuze commented Nov 21, 2023 •

edited

Investigate suboptimal score on TechEmpower benchmarks #29421

Investigate suboptimal score on TechEmpower benchmarks #29421

Comments

dreis2211 commented Nov 2, 2022 • edited

dreis2211 commented Nov 2, 2022

sdeleuze commented Nov 3, 2022 • edited

dreis2211 commented Nov 3, 2022

dreis2211 commented Nov 3, 2022 • edited

dreis2211 commented Nov 3, 2022 • edited

sdeleuze commented Nov 3, 2022 • edited

dreis2211 commented Nov 3, 2022 • edited

dreis2211 commented Nov 5, 2022

sdeleuze commented Nov 6, 2022

GeorgeSalu commented Nov 11, 2022

dreis2211 commented Nov 13, 2022 • edited

dreis2211 commented Nov 13, 2022

sdeleuze commented Nov 28, 2022

dreis2211 commented Nov 28, 2022 • edited

sdeleuze commented Nov 28, 2022

dreis2211 commented Nov 28, 2022

sdeleuze commented Nov 28, 2022

dreis2211 commented Nov 28, 2022 • edited

rstoyanchev commented Nov 30, 2022

sdeleuze commented Dec 9, 2022

dreis2211 commented Dec 9, 2022

cristianorvf commented Feb 24, 2023

sdeleuze commented Feb 24, 2023

stillya commented Oct 8, 2023

dreis2211 commented Nov 14, 2023 • edited

7fantasy7 commented Nov 15, 2023 • edited

sdeleuze commented Nov 21, 2023 • edited

dreis2211 commented Nov 2, 2022 •

edited

sdeleuze commented Nov 3, 2022 •

edited

dreis2211 commented Nov 3, 2022 •

edited

dreis2211 commented Nov 3, 2022 •

edited

sdeleuze commented Nov 3, 2022 •

edited

dreis2211 commented Nov 3, 2022 •

edited

dreis2211 commented Nov 13, 2022 •

edited

dreis2211 commented Nov 28, 2022 •

edited

dreis2211 commented Nov 28, 2022 •

edited

dreis2211 commented Nov 14, 2023 •

edited

7fantasy7 commented Nov 15, 2023 •

edited

sdeleuze commented Nov 21, 2023 •

edited