Reduce HTTP 1.1 Full msg pipeline traversals #13785
Conversation
This is similar to #13783, but I'm still working on the thresholds and on simplifying the code. Please let me know what you think about the idea behind it @normanmaurer @idelpivnitskiy. My concerns about this change relate to the heap buffer "unbounded" copy strategy:
The latter is more related to the motivation behind this change:
In short, the cutoff value could be slightly bigger (1 KB, or a proportion of the estimated header size?) for heap buffers, in order to avoid the first problem while still making sure heap buffers don't gain too much of an advantage over direct ones.
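To make the proposal concrete, here is a minimal sketch of the cutoff idea; the class, method names, and threshold values are illustrative only, not Netty's actual code:

```java
// Illustrative sketch of the cutoff discussed above: a small cutoff for direct
// buffers, and a "slightly bigger" one for heap buffers, which would be copied
// into a direct buffer later on anyway. All names and values are hypothetical.
final class EmbedHeuristic {
    static final int DIRECT_CUTOFF = 256;  // keep copies of direct data small
    static final int HEAP_CUTOFF = 1024;   // the ~1 KB value proposed for heap buffers

    static boolean shouldEmbed(int contentLength, boolean isHeapBuffer) {
        int cutoff = isHeapBuffer ? HEAP_CUTOFF : DIRECT_CUTOFF;
        return contentLength <= cutoff;
    }
}
```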
Force-pushed from 24204eb to d98344b.
@normanmaurer @chrisvest I've collected a few data points from a simple hello-world benchmark where the stack uses byte[]-backed (heap) buffers. On my local box, I've created a Quarkus instance (which uses Vert.x and Netty under the hood) with 4 I/O threads (=== event loops) and a very simple endpoint, like https://github.com/franz1981/quarkus-profiling-workshop/blob/master/src/main/java/profiling/workshop/greeting/GreetingResource.java but with a small addition. The load is generated with `h2load http://localhost:8080/hello -c 10 -m 10 -D 60s -t 2 --h1`, to make sure it has enough CPU to push the server to its limit. Before this change I got:
while after the change:
which is an improvement of ~17% (!!). The flamegraphs and profiling data tell a similar story, showing that both the encoding and the promise-success handling on response writes have halved their relative costs. I can make the CUTOFF value a static final, and if you agree, this is good to go.
@franz1981 this is amazing! I like it
A few questions but in general I think it's a sound strategy.
Three review threads on codec-http/src/main/java/io/netty/handler/codec/http/HttpObjectEncoder.java (all outdated, resolved).
Force-pushed from d98344b to 43f0b1f.
@bryce-anderson PTAL. I've limited the copy size to 12.5% of the estimated header size, which means that even if the estimation contains tons of headers (> 2 KB, which seems unlikely), we can still afford to embed > 256 bytes of content while limiting the cost of a wrong estimation. I would prefer to raise the value for heap buffers, because they benefit a lot more from embedding the content, but I cannot see how to do it without losing space in case of a wrong estimation... I can provide another commit to raise the limit to ~1 KB (or 512 bytes?) just for heap buffers if you agree, wdyt @normanmaurer? The point is that requests and responses act very differently in common scenarios: "usually" requests tend to have more headers and less content, while responses tend to have the opposite, unless it's a POST/PUT request.
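The 12.5% rule above is just a shift; a hypothetical helper (names are not Netty's) makes the arithmetic explicit:

```java
// Illustrative arithmetic for the "12.5% of the estimated header size" rule:
// the limit grows with the estimation, so a > 2 KB estimation still allows
// embedding > 256 bytes, while bounding the space wasted by a bad estimate.
final class CopyLimit {
    static int maxEmbeddedContentBytes(int estimatedHeaderSize) {
        // 12.5% == size / 8, computed with an unsigned shift
        return estimatedHeaderSize >>> 3;
    }
}
```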
> I can provide another commit to raise the limit to ~1K (or 512 bytes?) just for heap buffers if you agree, wdyt @normanmaurer ?
@franz1981, I'll defer to you and @normanmaurer.
Another review thread on codec-http/src/main/java/io/netty/handler/codec/http/HttpObjectEncoder.java (outdated, resolved).
Motivation: HttpObjectEncoder can already embed the data into the header buffer, enabling MessageToMessageEncoder to avoid allocating promise combiners and reducing the number of pipeline traversals. Sadly, this happens only if the header-size estimation leaves enough room in the header buffer. Modifications: Enhance the ability to embed the full msg content into the header buffer when it falls under specific thresholds, or when it is a heap buffer, which would likely be copied into a direct one later on anyway; embedding just anticipates that copy. Result: Faster writes of small or heap-based data
Force-pushed from 611964d to 23fb903.
With 23fb903 I've also fixed a benchmark that had been broken for a long time, and I've compared before/after. Before:
after:
which correctly shows that:
In short, this benchmark shows that it is beneficial to avoid creating the combiner, which we kind of knew already, but it's nice to have something that makes it evident, even though it is very artificial.
PTAL @bryce-anderson @normanmaurer @He-Pin, this could be good to go
I've got an idea to help "expert" users: what if I create a protected method that users could override to decide on their own whether to account for the content or not? I could also expose a method to know whether the operation actually succeeded; otherwise there is no way to get feedback... Alternatively, I can add a new system property to make it configurable, e.g. by exposing the current threshold values: this last option could be enough, although it applies to both heap and direct buffers, while users who reach for such a low-level tunable would probably prefer more control over it. Any suggestions?
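Just to illustrate the shape such a hook could take, here is a minimal sketch; this is NOT Netty's actual API, and the class, method, and defaults are all hypothetical:

```java
// Hypothetical "expert user" extension point: a protected method subclasses
// could override to decide whether the content gets accounted for (embedded).
// Names and default thresholds are invented for this sketch.
class TunableEncoder {
    protected boolean shouldEmbedContent(int contentBytes, boolean heapBuffer) {
        // default policy mirrors the small/heap thresholds discussed in this PR
        return contentBytes <= (heapBuffer ? 1024 : 256);
    }
}
```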
I would just not do it for now... Let's keep things simple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@franz1981, I really like how this shook out: simple but very effective.
@franz1981 thanks a lot, this is great! Can you do a PR against main as well?
Yep @normanmaurer, this could be done easily... for the other changes I made, I was more confident newer JDK versions would be fine and didn't want to pollute the code with cheap tricks, but this one is different...
Reduce HTTP 1.1 Full msg pipeline traversals
Motivation:
HttpObjectEncoder can already embed the data into the header buffer, enabling MessageToMessageEncoder to avoid allocating promise combiners and reducing the number of pipeline traversals. Sadly, this happens only if the header-size estimation leaves enough room in the header buffer.
Modifications:
Enhance the ability to embed the full msg content into the header buffer when it falls under specific thresholds, or when it is a heap buffer, which would likely be copied into a direct one later on anyway; embedding just anticipates that copy.
Result:
Faster writes of small or heap-based data
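The mechanism described above can be sketched with plain NIO buffers (no Netty types; the names and the threshold are illustrative, not the PR's actual code): when the content fits under the cutoff, it is copied into the header buffer so a single write traverses the pipeline instead of two.

```java
import java.nio.ByteBuffer;

// Conceptual sketch of coalescing header and content into one buffer write.
final class CoalescingSketch {
    static final int CUTOFF = 256; // illustrative threshold

    // Returns one buffer when coalescing applies, otherwise two separate ones.
    static ByteBuffer[] encode(byte[] headers, byte[] content) {
        if (content.length <= CUTOFF) {
            ByteBuffer combined = ByteBuffer.allocate(headers.length + content.length);
            combined.put(headers).put(content);
            combined.flip();
            return new ByteBuffer[] { combined }; // single pipeline traversal
        }
        // content too large to copy: header and content are written separately
        return new ByteBuffer[] { ByteBuffer.wrap(headers), ByteBuffer.wrap(content) };
    }
}
```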