Improve HTTP/2 performance #34473
More data from our benchmark.
The most significant finding is that HTTP/2 requires around 18% more CPU than HTTP 1.1 to do the same job.
NOTES:
I will try to profile the code with JFR to check whether there is any obvious bottleneck in the HTTP/2 implementation. Not sure if you have any benchmark aligned with my results.
JFR is complaining about a very high number of exceptions per second (around 8000/sec). Most of them come from Vert.x/Quarkus. This is related to the code that closes a stream (once per request/response pair). JFR suggests avoiding exceptions for this, because they are more expensive. What do you think? Together with the stream-closing metric, which Quarkus tags as REST and CLIENT_ERR, this could also be improved in case it is the main bottleneck.
Thanks a lot for the analysis! We'll definitely need input from @vietj here.
I also have the JFR file, if needed, to get more info.
I discussed this issue with @vietj. Julien will look at how we can avoid the extra flush in this case.
Thanks. My main concern is NOT correctness, it is performance. As commented in our previous benchmark, HTTP/2 needs 18% more CPU than HTTP 1.1 in our use case, and I was looking for differences that could explain that. I am only guessing, because I do NOT know the implementation, but I started a preliminary analysis and found the extra message (at least Quarkus HTTP 1.1 behaves differently). What probably has a major impact is the closing of HTTP/2 streams being handled as an exceptional case (JFR complains about a massive number of exceptions), and this is usually less performant (e.g. exceptions usually capture stack traces, which can be expensive at high volumes). Maybe I am totally wrong, but I wanted to share.
These exceptions do not have stack traces, so they should be fine.
Thanks a lot, Clement. Not sure if there is any way I could help, just let me know. Maybe I could benchmark a pre-release build or something similar.
Hi @chevaris, regarding the exceptions concern: JFR doesn't know that the raised exception isn't populating the stack trace...
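To illustrate the point, here is a minimal sketch (an assumption about the pattern, not the actual Vert.x code) of the "stackless exception" idiom: passing `writableStackTrace=false` to the `Throwable` constructor skips `fillInStackTrace()`, so creating and throwing the exception is cheap, while JFR still records each throw, which is why the exception *rate* looks alarming in recordings.

```java
// Hypothetical name: a reusable exception that skips stack-trace capture.
class StreamClosedException extends RuntimeException {
    StreamClosedException(String message) {
        // args: message, cause, enableSuppression=false, writableStackTrace=false
        super(message, null, false, false);
    }
}

public class StacklessDemo {
    public static void main(String[] args) {
        RuntimeException cheap = new StreamClosedException("stream closed");
        // No stack trace was captured at construction time:
        System.out.println(cheap.getStackTrace().length); // prints 0
    }
}
```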
Thanks @franz1981 for the clarification. I saw that Vert.x 4.4.5 is already released and includes eclipse-vertx/vert.x#4775, which I assume should improve HTTP/2 performance. Once there is a Quarkus version using Vert.x 4.4.5 I can repeat the benchmark. Is that OK?
Yes, we are working on the integration of Vert.x 4.4.5 at this very moment. Expect something on Monday.
Thanks a lot for the improvement. Currently I do NOT have access to the HW, but as soon as I can get it I will benchmark the microservice as I did before and provide data comparing HTTP 1.1 and HTTP/2.
@chevaris I am eagerly awaiting some fresh benchmarks now that 3.4.3 was released today. I found this issue today after benchmarking HTTP/1 vs HTTP/2 in our Quarkus app (2.16.6.Final). It took us by surprise that HTTP/2 resulted in lower throughput overall. 🤞 that 3.4.3 improves things.
Any news, @cjbooms?
I don't have a public test harness to share, but below are our internal results with Quarkus v2/v3 and HTTP 1.1 vs HTTP/2.
The clear winner is v2 with HTTP/2. Not sure why, but HTTP/2 appears to have degraded in v3... Notes:
Thanks @cjbooms |
Yes, but it will be a while. We won't be picking this topic up again until after Cyber Week.
Sorry for taking so long to answer back. My benchmark shows different results, aligned with what this issue reports: when using HTTP/2, the Quarkus server uses a significant amount of extra CPU compared with HTTP 1.1 (approx. 15-17% more) and latencies are worse. I am intrigued by your results and by why my application diverges so much when sending traffic over HTTP 1.1 vs HTTP/2 (server not restarted, JVM properly warmed up).

Setup: Quarkus 3.6.3, OpenJDK 17.0.9. Benchmark running for 3 minutes for each config (several warmup rounds). HTTP/2: 10 connections / max 100 streams per connection. HTTP 1.1: 100 connections.

Which benchmark tool are you using? Can you elaborate on the kind of operations, latencies, etc.? I have tried other configs in terms of number of connections, streams per connection, etc., and HTTP 1.1 always outperforms the HTTP/2 implementation in my benchmark. At least in my recent experience, the Vert.x HTTP/2 stack is less efficient than HTTP 1.1. I have been using the vertx-http-proxy module (https://vertx.io/docs/vertx-http-proxy/java/) lately, and when the proxy's HTTP client uses HTTP/2 the results are also significantly worse than with HTTP 1.1 (in this case latencies degrade heavily compared with HTTP 1.1).

Thanks, Evaristo
It's difficult to compare the two protocols this way. You can try increasing the number of streams on both, but beware: by definition this is prone to queuing effects, because the streams will always be served from the same connection on the same I/O thread. Vert.x and Netty, without any specific configuration, round-robin assign physical connections among the available I/O threads, while streams are served from the same physical connection. And beware (i.e. I haven't really checked) what the Quarkus configuration is for the number of I/O threads that can serve HTTP/2; that's why I suggest avoiding any quirk related to it. Other suggestions: also verify whether all the configured connections are actually being used, and how much.
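The consequence of the round-robin assignment described above can be made concrete with a back-of-envelope calculation (the event-loop count is a hypothetical value for illustration): since all streams of a connection stay on that connection's I/O thread, 10 fat HTTP/2 connections can keep at most 10 event loops busy, no matter how many streams they carry.

```java
public class ConnectionSpread {
    public static void main(String[] args) {
        int eventLoops = 16;      // assumed event-loop pool size (illustrative)
        int connections = 10;     // HTTP/2 physical connections in the benchmark
        int streamsPerConn = 100; // max concurrent streams per connection

        // Total in-flight requests the client can sustain:
        int totalConcurrency = connections * streamsPerConn;
        // But each connection is pinned to one I/O thread, so at most
        // min(connections, eventLoops) threads can ever be busy:
        int busyLoops = Math.min(connections, eventLoops);

        System.out.println("concurrency=" + totalConcurrency
                + ", event loops usable=" + busyLoops + "/" + eventLoops);
        // prints: concurrency=1000, event loops usable=10/16
    }
}
```

With 100 HTTP 1.1 connections, by contrast, all 16 loops would receive work, which is one reason the comparison at equal concurrency is not apples-to-apples.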
This trivial Vert.x app (taken from https://vertx.io/docs/vertx-web-proxy/java/) uses more CPU and has worse latencies at any number of TPS with HTTP/2 than with HTTP 1.1 (at least with Hyperfoil):

```java
HttpServer backendServer = vertx.createHttpServer();
Router backendRouter = Router.router(vertx);
backendRouter.route(HttpMethod.GET, "/foo").handler(rc -> {
  rc.response().end("I'm the target resource!");
});
backendServer.requestHandler(backendRouter).listen(7070);
```
Please try constraining the number of Vert.x cores to one, and use the same number of physical connections for both protocols. And please provide the Hyperfoil YAML to replicate the test, so we can be sure we perform the same test as you. It would be highly appreciated if you could collect profiling data using async-profiler, possibly with the -t option. Adding @vietj in case he has something to share.
I still do NOT understand the benchmark referred to in this ticket as showing that HTTP/2 performance is better than HTTP 1.1. I do NOT doubt that in your benchmark HTTP/2 is better, BUT it is NOT clear to me what you are testing (operations, connections, latencies, etc.) and from what angle you draw that conclusion. Could you clarify the use case, the units in the table, etc.? I would guess you are running around 300 requests per second with latencies around 50-100 ms. Am I right? Could you share your benchmark files and the approximate size of the responses? I assume you are using really huge documents, or a very small amount of HW for the benchmark, to get the figures in the table.

My use case is very simple: two microservices communicating over HTTP REST APIs. A very simple request/response protocol (not like a browser with CSS, images, JavaScript, etc.). Requests are POSTs with very small JSON bodies and responses are JSON around 4 KB.

Regarding your suggestions: why use the same number of connections?
I do NOT think that HTTP 1.1 with pipelining is the right choice for sending unrelated requests, due to the ordering required by HTTP 1.1 pipelining (it could make sense for browsers, but most browsers actually use a pool of HTTP 1.1 connections). Anyhow, I tested it and the results are also better than HTTP/2. The benchmark I am running shows that, to communicate with a Quarkus microservice, it is more efficient to use a big enough pool of HTTP 1.1 connections than a smaller pool of fat HTTP/2 connections. I tried multiple combinations of HTTP/2 streams and numbers of connections without any success (to rule out that TCP flow control could be involved). Here, more efficient means fewer replicas of the microservice are needed to handle the same amount of load (and on top of that, latencies are better). I already reported the results of the profiling I did (with my limited capability, and considering that I do NOT know the code), and I reported 3 things:
This is the Hyperfoil file I used for HTTP/2: name: chevaConstantRate
For HTTP 1.1 I replaced it with sharedConnections: 100. I also tried HTTP 1.1 (with pipelining). Summary of the results: Quarkus version 3.6.3, OpenJDK 17.0.9; benchmark running for 3 minutes for each config (several warmup rounds). HTTP/2: 10 connections / max 100 streams per connection. HTTP 1.1: 100 connections. HTTP 1.1 with pipelining: 10 connections / pipelining limit 100 per connection. Summary:
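The full benchmark file was not reproduced above. For readers unfamiliar with Hyperfoil, a hypothetical file of roughly this shape could drive such a constant-rate test (field names are from my reading of the Hyperfoil docs; the rate, endpoint, and payload are placeholders, NOT the author's actual values):

```yaml
# Hypothetical sketch only, not the author's original file.
name: chevaConstantRate
http:
  host: http://localhost:8080
  sharedConnections: 10          # 100 for the HTTP 1.1 run
phases:
- main:
    constantRate:
      usersPerSec: 1000          # placeholder: real rate not shared
      duration: 3m
      scenario:
      - request:
        - httpRequest:
            POST: /hello
            body: '{"name": "juan"}'
            headers:
              content-type: application/json
```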
Support for HTTP 1.1 pipelining in Hyperfoil is sadly broken (I have yet to fix it, given that I am a project committer), hence I suggest ignoring its results. Related:
The results from @cjbooms seem to agree that they see degraded performance in Quarkus v3, to the point that HTTP 1.1 performance (rps) is better than HTTP/2 (300 vs 240), which doesn't disagree with your numbers: HTTP/2 isn't faster in v3. I agree anyway that the use cases could be very different and not comparable.
Because of the way Netty handles parallelism/concurrency with streams vs physical connections, and the way head-of-line blocking can bite the streams when a single response isn't sent in one go, causing the others to be queued up. The more physical connections, the more real concurrency exists, unless Netty can chunk responses, allowing them to be interleaved. The reason I asked about the number of cores, and to constrain it to one, was to rule out whether HTTP/2 physical connections were being correctly assigned to different physical cores, granting some parallelism. HTTP 1.1 by default always does this, while with HTTP/2 I am not sure (meaning: I don't know). Returning to the topic: I hope to have a look at your reproducer this week before Christmas and report any findings.
Please @cjbooms, could you create a new issue reporting just the comment about the HTTP/2 performance degradation compared to v2? I would like to keep these issues separate, to avoid getting confused while looking at both.
"Support of pipelining for Hyperfoil in http 1.1 is sadly broken (I have to fix it yet, given that I am a project committer)." I saw you in some tickets. It is a great tool!!!
Tested with Quarkus 3.32 and 3.6.3 (both show the same behaviour and a heavy amount of exceptions).
Thanks, and happy you have used it!
I should check if the changes are in (likely) and what they were meant to solve.
For a better comparison between HTTP/1 and H2, I think you should lower the maximum number of concurrent streams per connection, especially if you are using a small number of H2 connections (10); instead, increase the number of H2 connections and decrease the max number of concurrent streams, e.g. you could try 100 H2 connections with a max of 10 concurrent streams. A small number of connections (compared to the number of cores) will put more load on some cores than others; using more connections with a small max stream count tends to spread the load better. Of course, this is a recommendation for a benchmark.
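In Vert.x client terms, that recommendation maps to two pool options. A configuration sketch follows (the setter names are real Vert.x 4 `HttpClientOptions` methods; the values just encode the suggested 100 x 10 shape, and `BenchClientConfig` is a hypothetical holder, not part of any project mentioned here):

```java
import io.vertx.core.http.HttpClientOptions;
import io.vertx.core.http.HttpVersion;

public class BenchClientConfig {
    // Shape the H2 client pool as suggested: many connections,
    // few concurrent streams each, instead of 10 x 100.
    static HttpClientOptions suggested() {
        return new HttpClientOptions()
            .setProtocolVersion(HttpVersion.HTTP_2)
            .setHttp2ClearTextUpgrade(false)  // h2c with prior knowledge
            .setHttp2MaxPoolSize(100)         // physical connections
            .setHttp2MultiplexingLimit(10);   // concurrent streams per connection
    }
}
```

This keeps total concurrency at 1000 while letting the connections spread across all available I/O threads.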
@chevaris
And it has shown a few low-hanging fruits, reported at #37835, and others, more complex, reported at eclipse-vertx/vert.x#5047. The vertx 4.x branch currently already contains eclipse-vertx/vert.x@841d6fb and eclipse-vertx/vert.x@4845904, which address some of the evident costs. Another low-hanging fruit (although still not obvious to fix) is the pseudo-header lookup and validation cost in Netty's HTTP/2, which seems to happen too much and too often in the HPACK decode paths. The 3 last-standing differences are:
Which makes clear that the whole "stream" concept in HTTP/2 doesn't come for free and has its costs, especially for cases as simple as these, but clearly some of the overhead can be removed. Generally speaking, the best we could improve directly within Vert.x (and hence, by consequence, for Quarkus) has already been done for what was detected as a problem. If you have the chance to compile vertx 4.x and run the experiment you usually run, you can verify that things are getting into better shape; it will take rolling a new release before the changes become visible to Quarkus, but it's a matter of time.
@chevaris @cjbooms update on this: I have further progressed in "fixing" the performance differences between HTTP 1.1 and 2 and found many other small/big changes, sent directly to Netty, e.g.
Some are already merged and others are in the process of being reviewed. Additionally, others are related to a deficiency in scaling, e.g. netty/netty#13741. My take on HTTP/2 is that, under realistic and correct usage, it is a great protocol to reduce the required physical connections and improve network usage (thanks to HPACK caching/encoding), but in cases where:
It adds an inherent cost of managing the streams (including distributing their traffic fairly, coalescing writes, and creating them in the hot path), which makes HTTP 1.1 just faster at its peak performance. This has been a surprising fact to me, but it is what it is. Just adding this, but take it with a grain of salt: the overall improvement in peak CPU saving has been around 35-40% after applying all fixes to Quarkus.
Really, thanks a lot for the very detailed work on this and the support! I think you made a very good summary and, as you commented, in some cases stream handling can be less performant than using extra connections. I actually got better results by decreasing the number of streams and using more connections, as suggested here. It is really very good to see all the improvements coming, making the Vert.x/Quarkus HTTP/2 stack even better (it is already great compared with other options). The more I use it, the more I like it.
Description
BACKGROUND
I have implemented a Quarkus-based microservice that is targeted at replacing a Spring Boot implementation.
The microservice receives POST (JSON) requests and answers with JSON.
LIMITATION WITH HTTP/2
We have observed that latencies when using HTTP/2 are worse than when using HTTP 1.1 (approx. 0.5 ms more per request). CPU usage is also higher (between 5-10%). Obviously with HTTP/2 the number of connections needed to sustain the same throughput is much lower (multiplexing).
This is NOT happening in the Spring (Jetty) implementation, in which HTTP/2 latencies are approximately the same as with Spring Boot HTTP 1.1.
GOAL OF THIS TICKET
The purpose of this ticket is to check why HTTP/2 latencies are worse (at least in microservices with long-lived connections) compared with HTTP 1.1, and to provide a fix.
INITIAL ANALYSIS (IN CASE IT COULD HELP)
After some analysis, we have found a difference in Quarkus HTTP/2 compared with both Quarkus HTTP 1.1 and Spring Jetty HTTP/2 that could explain the performance drop (worse latency).
We have captured packets for each implementation (images attached). This is the result: Quarkus HTTP/2 uses one extra message compared with the other implementations. Any reason for that? At least in this use case, I do NOT see the need to avoid sending the headers and the response data in the same packet. I can understand that streaming use cases could be different.
1.- Quarkus HTTP/2
Client --------- HTTP2/JSON POST HEADERS + DATA ------------------> Quarkus server
Quarkus -------- HTTP2 HEADERS (200 OK) --------------------------> Client
Client -----------------------------------------------------------> ACK
Quarkus -------- HTTP2/JSON DATA (END STREAM) --------------------> Client
Client -----------------------------------------------------------> ACK
2.- Quarkus HTTP/1.1
Client --------- HTTP/1.1 JSON POST HEADERS + DATA ---------------> Quarkus server
Quarkus -------- HTTP/1.1 HEADERS (200 OK) + DATA ----------------> Client
Client -----------------------------------------------------------> ACK
3.- Spring (Jetty) HTTP/2
Client --------- HTTP2/JSON POST HEADERS + DATA ------------------> Spring server
Spring server -- HTTP2 HEADERS (200 OK) + DATA (END STREAM) ------> Client
Client -----------------------------------------------------------> ACK
Reproducer to check network packets
code-with-quarkus.zip
Send traffic with curl, wrk, or Hyperfoil. Capture with Wireshark.
curl -v --http2 -d '{"name": "juan"}' -H "Content-Type: application/json" -X POST http://localhost:8080/hello
Implementation ideas
No response