Save slicing HTTP 2 headers & data #13783
base: 4.1
Conversation
@normanmaurer I'm not quite sure I can reuse the promise, so please take a look in case I'm assuming anything wrong. The cutoff value was chosen by summing the bytes required for a new sliced buffer (around 40 bytes), the outbound buffer's entries (64 or 64 * 2 bytes, depending on whether we take the headers or data path), and the promise aggregator (64 bytes), which adds up to more than 128 bytes. From what I can see, encoded response headers below 128 bytes are quite common; for data it depends, but it is still a plausible scenario, although more typical in benchmarks IMO.
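For illustration, the cutoff arithmetic above can be sketched as follows. This is a back-of-the-envelope model with hypothetical names and the approximate sizes quoted in this comment, not actual Netty code:

```java
// Toy model of the cutoff reasoning: copying the payload into the frame
// buffer only pays off when the payload is smaller than the bookkeeping
// that the slicing path would otherwise allocate.
final class CopyCutoff {
    static final int SLICED_BUF_BYTES = 40;     // retained slice wrapper (approx.)
    static final int OUTBOUND_ENTRY_BYTES = 64; // one outbound-buffer entry (approx.)
    static final int AGGREGATOR_BYTES = 64;     // promise aggregator (approx.)

    /** Overhead of the slicing path: slice + entries + aggregate promise. */
    static int slicingOverhead(boolean dataPath) {
        // the data path queues two entries (header + data), the headers path one
        int entries = (dataPath ? 2 : 1) * OUTBOUND_ENTRY_BYTES;
        return SLICED_BUF_BYTES + entries + AGGREGATOR_BYTES;
    }

    /** Copy when the payload costs less than the slicing bookkeeping. */
    static boolean shouldCopy(int payloadBytes, boolean dataPath) {
        return payloadBytes < slicingOverhead(dataPath);
    }
}
```

Both paths sum to more than 128 bytes (168 for headers, 232 for data), which matches the "> 128 bytes" cutoff mentioned above.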
Force-pushed from cb324a0 to bcfaf83
codec-http2/src/main/java/io/netty/handler/codec/http2/DefaultHttp2FrameWriter.java
This is just a preliminary example of the improvement, given that it doesn't yet account for saving the aggregate promise (which translates into reduced call-site polymorphism around promise handling): before:
now:
which is already a ~10-20% improvement just on headers/data writing.
An additional benefit (specific to Vert.x, but it can really happen regardless) is that when the data written by users uses heap buffers, this code path just disappears with this PR, given that no heap buffer is passed to the transport. I've added 003d1e29350c47fe50d40456312b25d5e91720da to ignore the cutoff limits if the data is held in an array-backed buffer, so we perform the copy earlier, saving the creation of the aggregate promise and the additional pipeline traversal.
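The early-copy condition from 003d1e2 could look roughly like this. It's a simplified sketch that uses java.nio.ByteBuffer's hasArray() in place of Netty's ByteBuf API, with an illustrative cutoff constant:

```java
import java.nio.ByteBuffer;

// Sketch: heap (array-backed) payloads get copied regardless of size,
// because the transport would have to copy them into direct memory anyway;
// copying here also skips the aggregate promise and one pipeline traversal.
final class HeapCopyFirst {
    static final int CUTOFF = 128; // illustrative; see the cutoff discussion above

    static boolean shouldCopy(ByteBuffer payload) {
        return payload.hasArray() || payload.remaining() < CUTOFF;
    }
}
```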
Force-pushed from 3faca11 to 003d1e2
I think I have a leak: when the data buffer is no longer sent across the transport, it won't be released after the flush. I should release it right after copying it.
@ejona86 @carl-mastrangelo do you have any chance (or any pointers so I can try it myself) to verify/validate, performance-wise, some of the changes I've been sending recently? I'm thinking about gRPC over HTTP/2; having something different from our Quarkus/Vert.x stack would help a lot.
Force-pushed from 83232f4 to 4438bed
@bryce-anderson this is using a very similar optimization, although I couldn't push it as far as HTTP/1.1 and make it produce a single buffer overall (we don't even presize the header here :/)
Alas, I have moved on from the gRPC team, and don't have access to sample HTTP/2 traffic.
Thanks @carl-mastrangelo for answering; if there are any Google users interested in gRPC on Netty, please reach out to me here.
@franz1981 Please let me know once this is ready again for review |
Yep, @normanmaurer, this should be ready to go as well.
Force-pushed from 66cca39 to 157a4a4
    ctx.write(buf, promise);
} catch (Throwable t) {
    promise.setFailure(t);
    PlatformDependent.throwException(t);
why re-throw? feels wrong
Good point; IIRC I blindly copied this one from the original code, but as you said, it doesn't look right 👍
I don't feel comfortable changing it yet: I see that throwing the exception allows the caller to correctly release the expected "leaky" data. I would leave it as it is, unless you want me to address the original code's behaviour now (which is fine, I just need to know your opinion here).
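The pattern being debated here can be sketched with java.util.concurrent.CompletableFuture standing in for Netty's ChannelPromise (an assumption for illustration; the real types differ): failing the promise notifies listeners, while rethrowing lets the caller's cleanup run, e.g. releasing retained buffers.

```java
import java.util.concurrent.CompletableFuture;

final class FailThenRethrow {
    /** Fail the promise so listeners observe the error, then rethrow so
     *  the caller's catch/finally can release any retained buffers. */
    static void write(CompletableFuture<Void> promise, Runnable writeOp) {
        try {
            writeOp.run();
            promise.complete(null);
        } catch (Throwable t) {
            promise.completeExceptionally(t); // notify listeners of the failure
            throw t; // precise rethrow: only unchecked exceptions possible here
        }
    }
}
```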
Just a few comments that crossed my mind; I haven't fully looked through this yet. The PromiseAggregator instances got me wondering: could the same goal be achieved using a CompositeByteBuf? Perhaps that is just passing the buck, so to speak, but if the transport implementations handle those buffer types intelligently, we can avoid the pipeline traversals easily.
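The CompositeByteBuf idea can be illustrated with a toy component list (Netty's real CompositeByteBuf is far more capable): parts are referenced rather than copied, and the transport drains them later in a single gathering pass, conceptually a writev.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for CompositeByteBuf: components are referenced, not
// copied, until the transport drains them in one pass (think writev).
final class ToyComposite {
    private final List<byte[]> components = new ArrayList<>();

    ToyComposite add(byte[] part) { // zero-copy add: just keep the reference
        components.add(part);
        return this;
    }

    byte[] drain() { // single gathering pass at flush time
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : components) {
            out.write(c, 0, c.length);
        }
        return out.toByteArray();
    }
}
```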
if (paddingBytes > 0) {
    // this is quite fast assuming small padding
    // see https://github.com/netty/netty/pull/13693
    buf.writeZero(paddingBytes);
Interesting. Can we use that down in writeHeadersPartsInternal as well?
Yep; given that it's a very tiny implementation detail, I chose to leverage it only on the hottest path, i.e. the single-buffer case with no promise aggregation.
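For reference, the padding trick under discussion, sketched against a plain byte[] instead of Netty's ByteBuf.writeZero: for small paddings, zero-filling a region in place is cheap compared to allocating a separate zeroed buffer.

```java
import java.util.Arrays;

final class Padding {
    /** Append paddingBytes zeros at writerIndex and return the new writer
     *  index; mirrors the idea behind ByteBuf.writeZero for small paddings. */
    static int writeZero(byte[] buf, int writerIndex, int paddingBytes) {
        Arrays.fill(buf, writerIndex, writerIndex + paddingBytes, (byte) 0);
        return writerIndex + paddingBytes;
    }
}
```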
That will help with traversal, saving aggregate promises and saving entries on the outbound buffer, which is something I would explore regardless; but I still believe it relies on writev on the transport, which seems a shame to me (@normanmaurer am I right?) when the components are so small. What this PR delivers is a different, additional improvement (without introducing new buffer types): it immediately releases the buffer that contains the data by embedding its content, making it available for reuse again, and it saves the inherent heap cost of allocating the slice by amortizing it in the buffer that embeds it.
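The immediate-release point can be sketched with a toy ref-counted holder (a stand-in for Netty's reference-counted ByteBuf; names are illustrative): copying the payload into the outgoing buffer lets the source be released, and thus reused, right away rather than being retained until the flush completes.

```java
final class CopyAndRelease {
    /** Minimal ref-counted holder standing in for a Netty ByteBuf. */
    static final class RefBuf {
        final byte[] data;
        int refCnt = 1;
        RefBuf(byte[] data) { this.data = data; }
        void release() { refCnt--; }
    }

    /** Copy the payload into the outgoing frame buffer and release the
     *  source immediately, instead of slicing it (which would retain it
     *  until the write is flushed). */
    static byte[] copyThenRelease(byte[] frameHeader, RefBuf payload) {
        byte[] out = new byte[frameHeader.length + payload.data.length];
        System.arraycopy(frameHeader, 0, out, 0, frameHeader.length);
        System.arraycopy(payload.data, 0, out, frameHeader.length, payload.data.length);
        payload.release(); // source buffer is reusable right away
        return out;
    }
}
```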
Force-pushed from 8de883e to aab1682
PTAL @bryce-anderson @normanmaurer. Regarding the suggestion to use …
Force-pushed from aab1682 to 1db28a0
@normanmaurer I see that the original code was throwing exceptions when not expected AND was not correctly releasing multiple retained header buffers; I could try to fix the former, but the latter would require a separate PR really, given that it seems a much more annoying and complex piece of technical debt, which hasn't yet bitten us.
mmm @normanmaurer what happened with 33f5c56? :P
@franz1981 what do you mean?
@normanmaurer I've received a notification of …
@franz1981 ah, this was just to bring it up to date with the current 4.1
@normanmaurer is there anything more you want me to look at here before another review round? As said, I am not very confident the original code was correctly handling leaks, but it is really a ton of code and we should address it separately (e.g. all the retained header slices around, the throw you noticed in an earlier comment), ideally first, if we consider them critical. What I could do is address them already for the fast paths I've introduced, but that would create the weird situation where the optimized (although common enough, I suppose) new code is more correct than what has existed for some time already...
@franz1981 just hold off a bit... I will try to find time to review this one first
@bryce-anderson if you want to give it another shot, I have tried implementing your suggestion to further reduce the number of allocations
Any other concerns @bryce-anderson or @normanmaurer?
@normanmaurer @chrisvest any news on this, folks?
@ejona86 @idelpivnitskiy @bryce-anderson can you check again?
Motivation:
DefaultHttp2FrameWriter always creates aggregate promises and slices out of headers and data: both can be avoided, reducing the number of pipeline traversals, when the cost of creating a sliced buffer exceeds the cost of copying the data to be written.
Modifications:
Small headers and data can be copied directly into a single buffer, without creating aggregate promises.
Result:
Faster writes of small data