
Save slicing HTTP 2 headers & data #13783

Open
franz1981 wants to merge 2 commits into base: 4.1 from 4.1_no_slice

Conversation

franz1981
Contributor

Motivation:

DefaultHttp2FrameWriter always creates aggregate promises and slices of the headers and data: both can be avoided, also reducing the number of pipeline traversals, whenever the cost of creating a sliced buffer exceeds the cost of copying the data to be written.

Modifications:

Small headers and data can be copied directly into a single buffer, without any need to create aggregate promises.

Result:

Faster writes of small data frames.
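
For illustration only, a minimal sketch of the fast path being described (class, method and constant names here are hypothetical, not the PR's actual code):

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPromise;

final class SmallFrameWriteSketch {
    // Hypothetical cutoff: below this size, copying beats slicing plus promise aggregation.
    private static final int SMALL_WRITE_CUTOFF = 128;

    // Writes frameHeader + data + zero padding as one buffer with one promise.
    // Ownership of both input buffers is taken over here (they are released eagerly).
    static void writeSmallFrame(ChannelHandlerContext ctx, ByteBuf frameHeader,
                                ByteBuf data, int padding, ChannelPromise promise) {
        assert data.readableBytes() <= SMALL_WRITE_CUTOFF;
        ByteBuf out = ctx.alloc().buffer(
                frameHeader.readableBytes() + data.readableBytes() + padding);
        out.writeBytes(frameHeader);
        out.writeBytes(data);
        if (padding > 0) {
            out.writeZero(padding); // cheap for small padding, see netty/netty#13693
        }
        frameHeader.release();
        data.release();
        // Single pipeline traversal: no slices, no SimpleChannelPromiseAggregator.
        ctx.write(out, promise);
    }
}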

@franz1981
Contributor Author

franz1981 commented Jan 16, 2024

@normanmaurer I'm not quite sure I can reuse the promise, so please take a look in case I'm assuming something wrong...

The cutoff value was decided by summing the bytes required for a new sliced buffer (around 40 bytes), the outbound buffer's entries (64 or 64 * 2 bytes, depending on whether we take the headers or the data path) and the promise aggregator (64 bytes), which adds up to more than 128 bytes.
Choosing 128 as the cutoff value is conservative but still captures many existing cases.
@idelpivnitskiy are you aware whether this is useful beyond microbenchmarks, in terms of cutoff values?

From what I can see, encoded response headers below 128 bytes are quite common; for data it depends, but it is still a plausible scenario, although more typical in benchmarks IMO.
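
Spelled out, the estimate above adds up as follows (figures taken from this comment; the constant name is made up):

final class CutoffEstimateSketch {
    // Per-write heap cost of the slicing path, using the estimates above:
    //   retained slice of the payload   ~40 bytes
    //   ChannelOutboundBuffer entries    64 bytes (or 2 * 64 when header and data
    //                                    become two separate entries)
    //   SimpleChannelPromiseAggregator  ~64 bytes
    // => roughly 168 to 232 bytes per write, always above 128 bytes, so copying
    //    payloads up to 128 bytes can never cost more than slicing them.
    static final int COPY_CUTOFF = 128; // hypothetical name for the PR's cutoff
}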

@franz1981
Contributor Author

This is just a preliminary example of the improvement, given that it doesn't yet account for saving the aggregate promise (which translates into fewer megamorphic call sites around promise handling):

before:

Benchmark                                (padding)  (payloadSize)  (pooled)  Mode  Cnt    Score   Error  Units
Http2FrameWriterDataBenchmark.newWriter          0             64      true  avgt   10  112.406 ± 1.096  ns/op
Http2FrameWriterDataBenchmark.newWriter          0             64     false  avgt   10  110.992 ± 0.120  ns/op
Http2FrameWriterDataBenchmark.newWriter          0           1024      true  avgt   10  102.480 ± 1.899  ns/op
Http2FrameWriterDataBenchmark.newWriter          0           1024     false  avgt   10  112.736 ± 5.971  ns/op

now:


Benchmark                                (padding)  (payloadSize)  (pooled)  Mode  Cnt    Score   Error  Units
Http2FrameWriterDataBenchmark.newWriter          0             64      true  avgt   10   91.973 ± 0.343  ns/op
Http2FrameWriterDataBenchmark.newWriter          0             64     false  avgt   10  100.445 ± 0.238  ns/op
Http2FrameWriterDataBenchmark.newWriter          0           1024      true  avgt   10   94.347 ± 0.157  ns/op
Http2FrameWriterDataBenchmark.newWriter          0           1024     false  avgt   10   98.489 ± 0.196  ns/op

which is already a ~10-20% improvement just on headers/data writing.

@franz1981
Contributor Author

franz1981 commented Jan 16, 2024

An additional benefit (specific to Vert.x, but it can really happen regardless) is that data written by users may use heap-based ByteBufs; embedding their content earlier into the direct buffer to be sent saves performing that copy later, see

(screenshot of the heap-buffer copy code path)

This code path simply disappears with this PR, given that no heap buffer is passed to the transport.

I've added 003d1e29350c47fe50d40456312b25d5e91720da to ignore the cutoff limit if the data is held in an array-backed buffer, so that we perform the copy earlier, saving both the creation of the aggregate promise and the additional pipeline traversal.
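
A hedged sketch of that check (names invented for illustration): heap, array-backed payloads are copied eagerly regardless of size, so no heap ByteBuf ever reaches the transport:

import io.netty.buffer.ByteBuf;

final class EarlyCopySketch {
    private static final int COPY_CUTOFF = 128; // hypothetical cutoff, see above

    // Decide whether to embed the payload into the frame-header buffer right away.
    static boolean shouldCopyEagerly(ByteBuf data) {
        // Always copy heap (array-backed) payloads: the transport would have to copy
        // them into a direct buffer anyway, so doing it here also saves the aggregate
        // promise and the extra pipeline traversal.
        return data.hasArray() || data.readableBytes() <= COPY_CUTOFF;
    }
}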

@franz1981
Contributor Author

franz1981 commented Jan 16, 2024

I think I have a leak: when the data buffer is no longer sent across the transport, it won't be released after the flush. I should release it right after copying it.
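
A minimal sketch of the fix being described (not the PR's exact code): release the payload as soon as its bytes have been embedded, instead of expecting the transport to release it after the flush:

import io.netty.buffer.ByteBuf;

final class ReleaseAfterCopySketch {
    // Embed data into out and release it immediately: the copied buffer never reaches
    // the transport, so nothing downstream will release it for us after the flush.
    static void embed(ByteBuf out, ByteBuf data) {
        try {
            out.writeBytes(data);
        } finally {
            data.release();
        }
    }
}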

@franz1981
Contributor Author

@ejona86 @carl-mastrangelo do you have any chance (or any pointers to let me try it myself) to verify/validate some of the changes I've been sending recently, performance-wise? I'm thinking about gRPC over HTTP/2, and having something different from our Quarkus/Vert.x stack would help a lot.

@franz1981
Contributor Author

@bryce-anderson this is using a very similar optimization, although I couldn't push it as far as HTTP/1.1 and make it produce a single buffer overall (we don't even presize the header here :/)

@carl-mastrangelo
Member

Alas, I have moved on from the gRPC team, and don't have access to sample http2 traffic.

@franz1981
Contributor Author

Thanks @carl-mastrangelo for answering; if there are any Google users interested in gRPC Netty, please reach out to me here.

@normanmaurer
Member

@franz1981 Please let me know once this is ready again for review

@franz1981
Contributor Author

Yep, @normanmaurer this should be ready to go as well

franz1981 force-pushed the 4.1_no_slice branch 2 times, most recently from 66cca39 to 157a4a4 on January 31, 2024 16:43
    ctx.write(buf, promise);
} catch (Throwable t) {
    promise.setFailure(t);
    PlatformDependent.throwException(t);
Member

why re-throw? feels wrong

Contributor Author

Good point; I blindly copied this one from the original code IIRC, but as you said, it doesn't look right 👍

Contributor Author

franz1981 commented Feb 1, 2024

I don't feel comfortable changing it yet; I see that throwing the exception allows the caller to correctly release the expected "leaky" data. I would leave it as it is, unless you want me to address the original code's behaviour now (which is fine, I just need to know your opinion here).
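
For context, a hedged sketch of the control flow being discussed (the caller shape is invented for illustration): failing the promise and then re-throwing lets the caller's catch block release the buffers it still owns:

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPromise;
import io.netty.util.internal.PlatformDependent;

final class RethrowSketch {
    static void writeAndRethrow(ChannelHandlerContext ctx, ByteBuf buf, ChannelPromise promise) {
        try {
            ctx.write(buf, promise);
        } catch (Throwable t) {
            promise.setFailure(t);
            // Re-throwing propagates the failure to the caller, whose own catch block
            // can still release buffers it has retained but not handed off yet.
            PlatformDependent.throwException(t);
        }
    }

    static void caller(ChannelHandlerContext ctx, ByteBuf frameHeader, ByteBuf data,
                       ChannelPromise promise) {
        try {
            writeAndRethrow(ctx, frameHeader, promise);
            // ... data would be written next ...
        } catch (Throwable t) {
            // Reachable only because writeAndRethrow re-threw: data was never handed
            // to the pipeline, so it has to be released here to avoid a leak.
            data.release();
        }
    }
}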

Contributor

bryce-anderson left a comment

Just a few comments that crossed my mind; I haven't fully looked through this yet.
The PromiseAggregator instances got me wondering: could this same goal be achieved using a CompositeByteBuf? Perhaps that is just passing the buck, so to speak, but if the transport implementations handle those buffer types intelligently, we can avoid the pipeline traversals easily.
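
For reference, a hedged sketch of the alternative being suggested (not code from this PR): wrap header and data in a CompositeByteBuf so the pipeline sees a single message and a single promise, leaving it to the transport to gather the components:

import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelPromise;

final class CompositeWriteSketch {
    static void writeComposite(ChannelHandlerContext ctx, ByteBuf frameHeader,
                               ByteBuf data, ChannelPromise promise) {
        // One message and one promise, no copy: the components keep their own memory
        // and the transport turns the write into a gathering (writev-like) syscall.
        CompositeByteBuf frame = ctx.alloc().compositeBuffer(2)
                .addComponent(true, frameHeader)
                .addComponent(true, data);
        ctx.write(frame, promise);
    }
}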

if (paddingBytes > 0) {
    // this is quite fast assuming small padding
    // see https://github.com/netty/netty/pull/13693
    buf.writeZero(paddingBytes);
Contributor

Interesting. Can we use that down in writeHeadersPartsInternal as well?

Contributor Author

Yep; given that it is a very tiny implementation detail, I chose to leverage it only on the hottest path, i.e. single buffer, no promise aggregation.

@franz1981
Contributor Author

@bryce-anderson

could this same goal be achieved using a CompositeByteBuf

That would help with traversals, saving aggregate promises and saving entries on the outbound buffer, which is something I would explore regardless, but I still believe it ends up using writev on the transport, which is a shame to me (@normanmaurer am I right?) when the components are that small.
To solve/improve this we could think (unless it already exists, but I don't think so) about an opt-in mechanism on the transport to avoid using writev with many small buffers (whether they come from composite or other buffer types), because it is not as efficient as it could be.

What this PR delivers is another, additional improvement (without introducing new buffer types): it immediately releases the buffer which contains the data after embedding its content, making it available to be reused again and saving the additional heap cost of allocating the slice, amortizing it into the buffer which embeds it.

@franz1981
Contributor Author

franz1981 commented Feb 5, 2024

PTAL @bryce-anderson @normanmaurer

Related to the suggestion to use CompositeByteBuf: I would like to discuss it in a separate issue, given that it is something I have wanted to do for a long time (together with some smart aggregation of small composite buffers), but it requires some attention given how epoll works, i.e. the iovec maximum prevents sending a direct composite buffer with "too many" components in one go.
Furthermore, at the moment I don't think composite buffers work well with tiny buffers: every transport translates composite buffers into writev-like syscalls, which I very much doubt are efficient with buffers that small.
Moreover, specifically with epoll, a year ago I replaced write with send syscalls because the former was slightly more costly, and I think the same applies to writev compared to send.

@franz1981
Contributor Author

@normanmaurer I see that the original code was throwing exceptions when not expected AND not correctly releasing multiple retained header buffers; I could try to fix the former, but the latter would really require a separate PR, given that it seems a much more annoying and complex piece of technical debt, which hasn't bitten us yet.

@franz1981
Contributor Author

mmm @normanmaurer what happened with 33f5c56? :P

@normanmaurer
Member

@franz1981 what do you mean?

@franz1981
Contributor Author

@normanmaurer I've received a notification about a "Merge branch '4.1' into 4.1_no_slice" change, but IDK what it is :P

@normanmaurer
Member

@franz1981 ah this was just to bring it up to date with current 4.1

@franz1981
Contributor Author

franz1981 commented Feb 16, 2024

@normanmaurer is there anything more you want me to take a look at here before another review round?

As said, I am not very confident the original code was correctly handling leaks here, but it is really a ton of code and we should address it separately (e.g. all the retained header slices around, the throw you noticed in an earlier comment) - ideally first, if we consider them critical.

What I could do is try to address them already for the fast paths I've introduced, but that would create the weird situation where the optimized (although common enough, I suppose) new code is more correct than what has already existed for some time...

@normanmaurer
Member

@franz1981 just hold off a bit... I will try to find time to review this one first

@franz1981
Contributor Author

@bryce-anderson if you want to give it another shot, I have tried implementing your suggestion to further reduce the number of allocations

@franz1981
Contributor Author

Any other concerns, @bryce-anderson or @normanmaurer?
I can try reducing the amount of changes if it helps

@franz1981
Contributor Author

@normanmaurer @chrisvest any news on this, folks?
Not urgent, but the performance effects were rather positive...

@normanmaurer
Member

@ejona86 @idelpivnitskiy @bryce-anderson can you check again?
