Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate suboptimal score on TechEmpower benchmarks #29421

Closed
dreis2211 opened this issue Nov 2, 2022 · 27 comments
Closed

Investigate suboptimal score on TechEmpower benchmarks #29421

dreis2211 opened this issue Nov 2, 2022 · 27 comments
Assignees
Labels
type: enhancement A general enhancement

Comments

@dreis2211
Copy link
Contributor

dreis2211 commented Nov 2, 2022

Hi 👋

I'm not a particular huge fan of the TechEmpower benchmarks, because they barely reflect real use-cases, but (novice) programmers who maybe look for some resources that might help them with their questions of what framework to choose for their next projects might look into these benchmarks and make their decision (partly) based on these numbers.

I've compared Quarkus and Spring(-Boot) here to compare two popular frameworks that have similar goals and deliver similar functionality. There is also a comparison over at Baeldung https://www.baeldung.com/spring-boot-vs-quarkus that compares these two.

image

Imho, this particular comparison isn't exactly great news for the Spring ecosystem. While there are some benchmarks on par with Quarkus, the majority is "won" by Quarkus - and by a greater margin one would expect. I've recently overheard a conversation why one should use Spring if Quarkus delivers everything as well, but with better performance. Now, we all know that there is more to this story and application A might be better suited for Quarkus while application B might be better suited for Spring - for whatever reasons. There is no silver bullet, after all. But this conversation kept me thinking. Obviously, I'm biased towards Spring and I love the ecosystem, but I'm wondering if we can do something about the performance gap (in these benchmarks).

I know for example that recent changes to the Quarkus benchmarks have been done by people involved with Quarkus. The Spring benchmarks are somewhat driven infrequently by community members (I think I have done something on these ages ago myself as well).

I'm also happy to help with this and at least update the Spring(-Boot) one to latest versions but I wonder if you have comparable machines at hand that you could run these benchmarks against and find out why these numbers are so comparably "bad". (I do have the suspicion that the numbers are actually much closer in reality). I'm also happy to help with the investigations, if you want me to 😉

Cheers,
Christoph

P.S.: The point of this should be not to overtake Quarkus or establish a Spring vs Quarkus mentality, but to simply improve the Spring numbers. Quarkus is just a good comparison data point imho.

P.S.S: I'm reporting this here, because the majority of logic relevant to the benchmarks is inside Spring-Framework and not in Spring-Boot.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged or decided on label Nov 2, 2022
@dreis2211
Copy link
Contributor Author

Addendum 1:

There aren't exactly many benchmarks based on Tomcat (the default for SB apps), but comparing the JSON serialization with e.g. a vanilla undertow reveals that there seems to be a quite a lot of framework overhead. And it's not Jackson (because that is used in both undertow and spring (haven't checked Quarkus))

image

@sdeleuze
Copy link
Contributor

sdeleuze commented Nov 3, 2022

I'm not a particular huge fan of the TechEmpower benchmarks, because they barely reflect real use-cases

@dreis2211 Same, but as I am myself working on some benchmarks to compare Boot 3 JVM and native apps, we can maybe use this opportunity to compare:

  • How Spring Framework performs in a default Spring Boot 3 application versus Spring Boot 2 and see what is the evolution if any
  • Compare Undertow versus Tomcat on recent versions
  • Check if we can find low hanging fruits to optimize

Let's maybe start by focusing on the https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/spring. Do you have the bandwidth to create a branch with a Spring Boot 3.0.0-SNAPSHOT (we are close to GA so should be pretty stable) version of it, upgrading to Java 17, Jakarta EE, etc. and compare the result you have locally with the Spring Boot 2.6.9 version used currently? And to compare Undertow versus Tomcat on both?

@sdeleuze sdeleuze self-assigned this Nov 3, 2022
@dreis2211
Copy link
Contributor Author

I will have some time next week or earliest this weekend to do some preliminary tests, I guess.

@dreis2211
Copy link
Contributor Author

dreis2211 commented Nov 3, 2022

Fun fact, apparently the spring benchmarks already run with Undertow and Round 21 doesn't include these changes yet.

@dreis2211
Copy link
Contributor Author

dreis2211 commented Nov 3, 2022

So, I had a bit of time tonight:

Local specs:
Model: MacBookPro:
Processor: 8-Core Intel Core i9 2,3 GHz
Memory: 32 GB

NOTE: The results should be taken with a grain of salt. After all, local benchmarks are not really isolated and might be influenced by things running on my machine.

Spring-Boot 2.6.9 (Undertow):

image

Spring-Boot 2.6.9 (Tomcat):
image

Spring-Boot 3.0.0-SNAPSHOT (Undertow):

image

Spring-Boot 3.0.0-SNAPSHOT (Undertow) (with produces on GetMapping):
image

Spring-Boot 3.0.0-SNAPSHOT (Tomcat) (with produces on GetMapping):
image

Spring-Boot 2.3.1 (Tomcat) (Round 20):
image

Quarkus:
image

I'm sort of focussing on the Plaintext & JSON ones here as the others are somewhat volatile and due to the database involvement include some more flakyness not under any frameworks control. What I found was that there seems to be a substantial overhead if the application needs to determine the producible media types. E.g. the changes that moved from 2.3.1 & 2.6.9 include the produces field on the GetMapping annotations for plaintext. As one can see the different runs for 3.0.0 also differ largely for JSON. And there the only change is again the manual produces on the endpoint. Maybe the following plays a role here:

for (HttpMessageConverter<?> converter : this.messageConverters) {
if (converter instanceof GenericHttpMessageConverter && targetType != null) {
if (((GenericHttpMessageConverter<?>) converter).canWrite(targetType, valueClass, null)) {
result.addAll(converter.getSupportedMediaTypes(valueClass));
}
}
else if (converter.canWrite(valueClass, null)) {
result.addAll(converter.getSupportedMediaTypes(valueClass));
}
}
return (result.isEmpty() ? Collections.singletonList(MediaType.ALL) : new ArrayList<>(result));
.

Funny enough I came across this method already when playing around with the type pollution agent from @franz1981 / RedHat, which yields the following output:

--------------------------
Type Pollution Statistics:
--------------------------
1:      org.springframework.http.converter.json.MappingJackson2HttpMessageConverter
Count:  58366
Types:
        org.springframework.http.converter.HttpMessageConverter
        org.springframework.http.converter.GenericHttpMessageConverter
Traces:
        org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducibleMediaTypes(AbstractMessageConverterMethodProcessor.java:385)
                class: org.springframework.http.converter.HttpMessageConverter
                count: 29208
        org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducibleMediaTypes(AbstractMessageConverterMethodProcessor.java:386)
                class: org.springframework.http.converter.GenericHttpMessageConverter
                count: 29115
        org.springframework.web.servlet.mvc.method.annotation.AbstractMessageConverterMethodProcessor.getProducible
MediaTypes(AbstractMessageConverterMethodProcessor.java:387)
                class: org.springframework.http.converter.GenericHttpMessageConverter
                count: 43

If true, this would maybe indicate a scaling/performance problem caused by https://bugs.openjdk.org/browse/JDK-8180450 . Maybe it's not that, but just the fact, that it iterates over 8 message converters for each request to get their producible media types. I haven't really had the time tonight to look deeper into this.

In terms of Tomcat vs Undertow, there seems to be a huge difference for plain text responses (e.g. Tomcat only reaching half of what Undertow achieves).

Cheers,
Christoph

@sdeleuze
Copy link
Contributor

sdeleuze commented Nov 3, 2022

Interesting, better to focus initially on JSON and plaintext indeed.

It would be interesting to do for example a new round of JSON test with only MappingJackson2HttpMessageConverter and remove the other default converters.

I can also provide a branch that removes this instanceof check, I will try to create that tomorrow and share it with you.

@dreis2211
Copy link
Contributor Author

dreis2211 commented Nov 3, 2022

I think AbstractMessageConverterMethodProcessor.writeWithMessageConverters is indeed something to look at.

E.g. if I force the content-type to be preselected (a.k.a skipping even more logic than just the producible media types) via

	@GetMapping(value = "/plaintext")
	String plaintext(HttpServletResponse response) {
		response.setContentType("text/plain");
		return "Hello, World!";
	}

	@GetMapping(value = "/json")
	Message json(HttpServletResponse response) {
		response.setContentType("application/json");
		return new Message("Hello, World!");
	}

it yields the following numbers:

Spring-Boot 2.6.9 (Undertow):

image

Remember - the status quo of that was:

image

@dreis2211
Copy link
Contributor Author

Another finding: the current tests run inside a Debian Buster image. With Debian Bullseye I can reach a substantially better score on both JSON and Plaintext with 3.0.0 and the forced Content-Type:
image

@sdeleuze
Copy link
Contributor

sdeleuze commented Nov 6, 2022

@dreis2211 Pretty good results on skipping instanceof GenericHttpMessageConverter checks, see #29438. I was able to confirm the gains on 2 different benchmarks, but if you can check on your side as well, please share the results in the PR.

@GeorgeSalu
Copy link

is it worth testing with jetty ?

@dreis2211
Copy link
Contributor Author

dreis2211 commented Nov 13, 2022

@GeorgeSalu Generally, I wouldn't worry too much about Jetty or Undertow. Tomcat is usually the one that falls behind a bit.

But I've compared it just for you - with latest snapshots, bullseye and enforced content-type (throughout all benchmark modes and thus having a better weighted score overall):

Undertow
image

Jetty
image

From personal experience, the margin between them is not as huge as one might read from this results. I'm in fact a little bit surprised about the numbers generally. I have done some memory improvements that don't explain these results, though. I need to take a deeper look what changed in the snapshots in the meanwhile. What has changed is that my Macbook runs on Ventura now. Maybe there have been major improvements to the host OS that generally boost the numbers. Don't forget there is a bit of flakyness involved doing these on a local setup.

Anyhow: Back to your question. Undertow usually wins, also in internal loadtests that we do. There is also a dated benchmark comparison over at https://www.baeldung.com/spring-boot-servlet-containers that hasn't drastically changed in my opinion over recent years.

Cheers,
Christoph

@dreis2211
Copy link
Contributor Author

I've run a new round of benchmarks to have comparable results between Spring-Boot 3.0.0 RC1 and RC2

RC1
image

RC2
image

There is an upgrade from Undertow 2.2.x to 2.3.x - I wouldn't be surprised if this brings some improvements. Next to several tiny improvements in Spring itself.

@sdeleuze
Copy link
Contributor

As you may have seen, more benchmarks on #29438 did not achieve to reproduce in a reliable way the gains I initially saw, I think it would maybe more visible with realistic workloads involving different converters. But this kind of instanceof checks are used in various places in the Framework, and it would be better to fix the root issue rather than artificially write bad code to workaround it in specific places.

No promise yet, but https://bugs.openjdk.org/browse/JDK-8180450 may be backported to Java 17, so at least with this version we should be able to avoid this painful instanceof issue on both Spring Boot 2.7.x and 3.x lines. We will push for that short and try to make that happen asap.

@dreis2211 Except this instanceof issue, have you identify other low hanging fruits we could optimize to increase the throughput//latency?

@dreis2211
Copy link
Contributor Author

dreis2211 commented Nov 28, 2022

I've contributed some low hanging fruits already in #29428 & #29412

As noted earlier that the media-type/content-type dance is quite a big chunk - regardless of the instanceof and secondary supers cache problematic - but in my opinion not really low-hanging unfortunately.
image

I think with the update to 3.x plus bullseye in the framework benchmarks, we should be able to achieve already quite some uplift. I think I will contribute this in the next days. And while being at it, probably also setting the content-type to avoid the costly computation.

@sdeleuze
Copy link
Contributor

Great, thanks a lot for your various contributions. Could be great if TechEmpower benchmark could be updated with Spring Boot 3.0.x, explicit content type and compare with the previous figures. Do you plan to do it?

@rstoyanchev I am wondering if you could have a look on if you see some reasonable way to skip some content type resolution part for more common use cases, if we should drop a note in our reference documentation to advise configuring explicitly the content type for better performances, etc.

@dreis2211
Copy link
Contributor Author

Yeah, I will likely do that somewhen this week. Fill do another comparison then as well on my local machine. But this week is busy, so might be something for the weekend.

@sdeleuze
Copy link
Contributor

No rush, thanks for your help on this, much appreciated.

@sdeleuze sdeleuze added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged or decided on labels Nov 28, 2022
@dreis2211
Copy link
Contributor Author

dreis2211 commented Nov 28, 2022

There you go: TechEmpower/FrameworkBenchmarks#7749

2.6.9 (Before)
image

3.0.0 (After)
image

Compared to previous results, I updated my Mac, Docker etc.. The results were already better & worse before, but I think the head-to-head comparison from this evening is already a good step into the right direction

@rstoyanchev
Copy link
Contributor

We evaluate conditions in RequestMappingInfo#getMatchingCondition in a specific order and return as soon as one returns false. Patterns has always been at the bottom with String path matching, but with parsed PathPattern in place, I wonder if we should move the PathPatternsRequestCondition near the top, right after the HTTP method check. We could compare some more involved pattern vs consumes or produces.

That said, the ProducesRequestCondition does cache the accepted media types in a request attribute, so at least that should be done only once per request withing RequestMappingHandlerMapping. The same is not true for ConsumesRequestCondition so there is an optimization to be made there.

None of this would impact the benchmark which now sets the response content-type directly, but there might be some new hotspots to be optimized in its present form, perhaps in AbstractMessageConverterMethodProcessor that was discussed earlier.

@sdeleuze
Copy link
Contributor

sdeleuze commented Dec 9, 2022

Thanks for creating this follow-up issue @rstoyanchev.

@dreis2211 I close this issue as I think we have reached a reasonable state, feel free to open more focused one if you identify some interesting optimizations. I will on my side make sure that a fix for https://bugs.openjdk.org/browse/JDK-8180450 is available in Java 17 with our partner Bellsoft.

@sdeleuze sdeleuze closed this as completed Dec 9, 2022
@dreis2211
Copy link
Contributor Author

Same here - we improved a few things along the road. Great stuff. Thanks 🙏

@cristianorvf
Copy link

Maybe check the flux versions too, do they use the same infrastructure for media type?

@sdeleuze
Copy link
Contributor

Functional web APIs for both Spring MVC and WebFlux skip the infrastructure for media type by design, so that's maybe this that should be tried. I would expected annotation-based WebFlux controllers to use a similar kind of media type infrastructure.

@stillya
Copy link

stillya commented Oct 8, 2023

It will be great to add benchmark with virtual thread based executor, by the way.

@dreis2211
Copy link
Contributor Author

dreis2211 commented Nov 14, 2023

More an FYI. Round 22 results have been published. The screenshot below shows the composite scores ( with fullstack + java filter applied).

Quarkus ranks at #38
Spring ranks at #88

But more important than the rank the plaintext & json benchmark shows that Quarkus has still a throughput ~4 times "better" than Spring.

Nonetheless still some good news. The absolute results are better than last time apparently, despite the same machine. So the few things we've done here seem to have had at least a positive impact.

image

@7fantasy7
Copy link

7fantasy7 commented Nov 15, 2023

From what I've seen in a plain text use case (not techempower benchmark, but can be the same),
it uses StreamUtils#copy(String in, Charset charset, OutputStream out)
through the StringHttpMessageConverter

Which does unnecessary buffer allocation (8kb) inside an OutputStreamWriter consturctor.

Maybe it can be avoided, so instead of:

Writer writer = new OutputStreamWriter(out, charset);
writer.write(in);
writer.flush();

write to the stream directly.

out.write(in.getBytes(charset));
out.flush();
Here's approximate difference in time
Benchmark                        (stringLength)  Mode  Cnt    Score     Error  Units
StreamUtilsBenchmark.copy                    10  avgt   10    4.769 ±   0.110  ns/op
StreamUtilsBenchmark.copy                   100  avgt   10   10.368 ±   0.569  ns/op
StreamUtilsBenchmark.copy                  1000  avgt   10   42.393 ±   0.905  ns/op
StreamUtilsBenchmark.copyOld                 10  avgt   10  637.235 ±  51.005  ns/op
StreamUtilsBenchmark.copyOld                100  avgt   10  602.219 ± 188.381  ns/op
StreamUtilsBenchmark.copyOld               1000  avgt   10  898.469 ± 160.039  ns/op

@sdeleuze
Copy link
Contributor

sdeleuze commented Nov 21, 2023

I plan to give another look to the benchmark.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A general enhancement
Projects
None yet
Development

No branches or pull requests

8 participants