Faster query string decoder #13473

franz1981 · 2023-07-07T12:26:06Z

Motivation:

decodeParams is not accounting for common decoding scenarios
making it highly unpredictable.

Modifications:

Arrange control flow to isolated predictable from unpredictable
checks to improve CPU usage (stalled frontend cycles)

Result:

Faster query decoding

franz1981 · 2023-07-07T12:26:38Z

@vietj too

franz1981 · 2023-07-07T12:30:39Z

These are the numbers with these changes:

Benchmark                                                Mode  Cnt   Score   Error   Units
QueryStringDecoderBenchmark.mixedDecoding               thrpt   40   3.278 ± 0.043  ops/us
QueryStringDecoderBenchmark.noDecoding                  thrpt   40  12.922 ± 0.041  ops/us
QueryStringDecoderBenchmark.nonStandardDecoding         thrpt   40   4.339 ± 0.014  ops/us
QueryStringDecoderBenchmark.onlyDecoding                thrpt   40   3.804 ± 0.015  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameter  thrpt   40  23.210 ± 0.404  ops/us

compared to before:

Benchmark                                                Mode  Cnt   Score   Error   Units
QueryStringDecoderBenchmark.mixedDecoding               thrpt   40   2.641 ± 0.170  ops/us
QueryStringDecoderBenchmark.noDecoding                  thrpt   40  11.413 ± 0.112  ops/us
QueryStringDecoderBenchmark.nonStandardDecoding         thrpt   40   3.219 ± 0.058  ops/us
QueryStringDecoderBenchmark.onlyDecoding                thrpt   40   3.239 ± 0.114  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameter  thrpt   40  22.001 ± 0.404  ops/us

This is with

VM version: JDK 17.0.7, Java HotSpot(TM) 64-Bit Server VM, 17.0.7+8-LTS-224

and

AMD Ryzen™ 9 7950X × 32@4.5 GHz

given that the most costy operation there is the component decoding, the improvements are ~5-30%: not bad, but in some next PR I can improve the rest too.

franz1981 · 2023-07-07T13:47:24Z

@viet I've added a new inteface (bad name, to improve), which could save users which have to perform
single shot decoding, to collect (and likely copy too) the decoded parameters in the provided linked map (IIRC Vertx prefer a MultiMap).

wdyt?

@chrisvest I'm opened to ideas to provide a decent API here: impl wise I've preferred to do some "dirty trick"
with parameters/return types to avoid copying the core algorithm twice.

Perf-wise these is what we get with the new API:

Benchmark                                                     Mode  Cnt   Score   Error   Units
QueryStringDecoderBenchmark.mixedDecodingNoMap               thrpt   40   3.722 ± 0.070  ops/us
QueryStringDecoderBenchmark.noDecodingNoMap                  thrpt   40  20.763 ± 0.040  ops/us
QueryStringDecoderBenchmark.nonStandardDecodingNoMap         thrpt   40   5.231 ± 0.017  ops/us
QueryStringDecoderBenchmark.onlyDecodingNoMap                thrpt   40   4.157 ± 0.019  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameterNoMap  thrpt   40  38.208 ± 0.855  ops/us

Which remove the map allocation from the equation (that's unfair: users will likely have their own bespoke map there!)

franz1981 · 2023-07-07T13:51:08Z

@chrisvest when the API and the PR is stable I'll write/modify the existing tests to use it.

franz1981 · 2023-07-07T17:26:08Z

I Will perform some assembly analysis because some of the numbers are really weird and I suspect there is something going on inlining side.

codec-http/src/main/java/io/netty/handler/codec/http/QueryStringDecoder.java

franz1981 · 2023-07-08T08:55:26Z

The same results on my Intel machine (Intel® Core™ i7-9850H × 12@4.4 Ghz instead are (on JDK 17):

This pr:

Benchmark                                                Mode  Cnt   Score   Error   Units
QueryStringDecoderBenchmark.mixedDecoding               thrpt   20   1.517 ± 0.036  ops/us
QueryStringDecoderBenchmark.noDecoding                  thrpt   20   7.212 ± 0.095  ops/us
QueryStringDecoderBenchmark.nonStandardDecoding         thrpt   20   2.055 ± 0.020  ops/us
QueryStringDecoderBenchmark.onlyDecoding                thrpt   20   1.918 ± 0.031  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameter  thrpt   20  13.406 ± 0.336  ops/us

4.1:

Benchmark                                                Mode  Cnt   Score   Error   Units
QueryStringDecoderBenchmark.mixedDecoding               thrpt   20   1.400 ± 0.018  ops/us
QueryStringDecoderBenchmark.noDecoding                  thrpt   20   6.143 ± 0.054  ops/us
QueryStringDecoderBenchmark.nonStandardDecoding         thrpt   20   1.677 ± 0.027  ops/us
QueryStringDecoderBenchmark.onlyDecoding                thrpt   20   1.837 ± 0.023  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameter  thrpt   20  11.653 ± 0.138  ops/us

that are similar.
With JDK 8 instead, the cost is more biased around decoding name/value parameter and the importance of parsing is lessen (meaning that their contribution on the performance improvement is slower).
I will perform some more deeper analysis here, although the existing improvement is already decent IMO

franz1981 · 2023-07-17T15:27:44Z

@chrisvest I didn't overloaded the static methods to perform parameter decoding yet: do you think it to be a nice addition?
I've supposed it wasn't a widely adopted feature, really, but useful, so didn't wanted to make it "heavy" by adding many different methods.

Test-wise, I've added an additional assertParameters sanity check for all existing tests which perform some form of parameter validation, but it is using the existing instance method only.
I can add calling the other static ones, which perform some null checks,let me know your opinion here

franz1981 · 2023-07-18T14:48:28Z

I have another improvement coming from the decodeComponent part, hold on!

vietj · 2023-07-20T12:41:09Z

It would be great if we could have a static way of populating existing data structures, like

  public interface Populator<T> {

    void setQueryParam(String name, String value, T map);

  }

  public class MapPopulator implements Populator<Map<String, String>> {
    @Override
    public void setQueryParam(String name, String value, Map<String, String> map) {
      map.put(name, value);
    }
  }

franz1981 · 2023-07-20T13:40:59Z

thanks @vietj for the feedback, it's a great idea indeed!

franz1981 · 2023-07-28T15:40:28Z

After reading the different versions of the assembly produced by the benchmark I believe is it worthy to analyze what happen if the input sequence makes the JIT to decide for a different layout and order of the evaluated conditions (for the decoder state machine).

Hence, I've added a parameter to "confuse" JIT, and these are the numbers...
This PR:

Benchmark                                                       (confuseJIT)   Mode  Cnt   Score   Error   Units
QueryStringDecoderBenchmark.mixedDecodingNoMap                         false  thrpt   40   3.531 ± 0.081  ops/us
QueryStringDecoderBenchmark.mixedDecodingNoMap                          true  thrpt   40   3.393 ± 0.043  ops/us
QueryStringDecoderBenchmark.noDecodingNoMap                            false  thrpt   40  21.158 ± 0.311  ops/us
QueryStringDecoderBenchmark.noDecodingNoMap                             true  thrpt   40  18.403 ± 0.460  ops/us
QueryStringDecoderBenchmark.nonStandardDecodingNoMap                   false  thrpt   40   5.163 ± 0.302  ops/us
QueryStringDecoderBenchmark.nonStandardDecodingNoMap                    true  thrpt   40   4.104 ± 0.101  ops/us
QueryStringDecoderBenchmark.onlyDecodingNoMap                          false  thrpt   40   4.192 ± 0.007  ops/us
QueryStringDecoderBenchmark.onlyDecodingNoMap                           true  thrpt   40   3.775 ± 0.068  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameterNoMap            false  thrpt   40  37.600 ± 0.190  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameterNoMap             true  thrpt   40  30.313 ± 0.167  ops/us
QueryStringDecoderBenchmark.singleResourceThreeParametersNoMap         false  thrpt   40  14.004 ± 0.106  ops/us
QueryStringDecoderBenchmark.singleResourceThreeParametersNoMap          true  thrpt   40  12.960 ± 0.265  ops/us

4.1 but changed to work with the *NoMap benchmarks:

Benchmark                                                       (confuseJIT)   Mode  Cnt   Score   Error   Units
QueryStringDecoderBenchmark.mixedDecodingNoMap                         false  thrpt   40   3.242 ± 0.096  ops/us
QueryStringDecoderBenchmark.mixedDecodingNoMap                          true  thrpt   40   2.543 ± 0.108  ops/us
QueryStringDecoderBenchmark.noDecodingNoMap                            false  thrpt   40  15.523 ± 0.135  ops/us
QueryStringDecoderBenchmark.noDecodingNoMap                             true  thrpt   40  17.761 ± 0.951  ops/us
QueryStringDecoderBenchmark.nonStandardDecodingNoMap                   false  thrpt   40   4.318 ± 0.063  ops/us
QueryStringDecoderBenchmark.nonStandardDecodingNoMap                    true  thrpt   40   3.530 ± 0.070  ops/us
QueryStringDecoderBenchmark.onlyDecodingNoMap                          false  thrpt   40   3.219 ± 0.312  ops/us
QueryStringDecoderBenchmark.onlyDecodingNoMap                           true  thrpt   40   2.810 ± 0.090  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameterNoMap            false  thrpt   40  38.327 ± 0.335  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameterNoMap             true  thrpt   40  26.263 ± 1.522  ops/us
QueryStringDecoderBenchmark.singleResourceThreeParametersNoMap         false  thrpt   40   9.730 ± 0.045  ops/us
QueryStringDecoderBenchmark.singleResourceThreeParametersNoMap          true  thrpt   40  10.295 ± 0.674  ops/us

In short: the improvement is almost there, in many/most of the cases.
What I've noticed still by inspecting the assembly is that the main decoding method requires some spilling to happen
which could be saved, but the code is already very complex as it is, so I won't go down that rabbit hole.

I'll now impl the changes suggested by @vietj and we're ready to go.

franz1981 · 2023-08-11T21:14:10Z

303b2b2f2c9dacafb1044e0dd7a25e34a0fa0e3b has delivered another interesting speedup in almost all cases but nonStandardDecodingNoMap without confusingJIT

Benchmark                                                       (confuseJIT)   Mode  Cnt   Score   Error   Units
QueryStringDecoderBenchmark.mixedDecodingNoMap                         false  thrpt   40   3.408 ± 0.116  ops/us
QueryStringDecoderBenchmark.mixedDecodingNoMap                          true  thrpt   40   3.685 ± 0.114  ops/us
QueryStringDecoderBenchmark.noDecodingNoMap                            false  thrpt   40  21.339 ± 0.236  ops/us
QueryStringDecoderBenchmark.noDecodingNoMap                             true  thrpt   40  21.085 ± 0.678  ops/us
QueryStringDecoderBenchmark.nonStandardDecodingNoMap                   false  thrpt   40   4.856 ± 0.178  ops/us
QueryStringDecoderBenchmark.nonStandardDecodingNoMap                    true  thrpt   40   4.667 ± 0.041  ops/us
QueryStringDecoderBenchmark.onlyDecodingNoMap                          false  thrpt   40   4.084 ± 0.065  ops/us
QueryStringDecoderBenchmark.onlyDecodingNoMap                           true  thrpt   40   3.959 ± 0.141  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameterNoMap            false  thrpt   40  38.688 ± 0.342  ops/us
QueryStringDecoderBenchmark.singleResourceOneParameterNoMap             true  thrpt   40  35.615 ± 0.295  ops/us
QueryStringDecoderBenchmark.singleResourceThreeParametersNoMap         false  thrpt   40  14.442 ± 0.163  ops/us
QueryStringDecoderBenchmark.singleResourceThreeParametersNoMap          true  thrpt   40  13.737 ± 0.512  ops/us

this seems worthy especially in case JIT branch profiling is polluted

franz1981 · 2023-08-21T06:50:09Z

@vietj @chrisvest the only downside of unifying the map/callback (with context) approaches is that it always allocate an empty linked map in case no parameters are found, while on 4.1 it just return an empty (singleton) one.
Any suggestion is appreciated to restore the previous behaviour and/or decide that is not worthy...

vietj · 2023-08-21T14:00:42Z

@franz1981 can you lazy create the collector and add a new method to create the collector on the parameter collector ?

interface ParameterCollector<C> {
  C newCollector();
  ...
}

Otherwise use similar to Map.compute(K key, BiFunction<? super K, ? super V, ? extends V> remappingFunction) where you pass null first time and the collector is responsible to create the instance and return it to subsquently use it

normanmaurer · 2023-08-22T09:20:13Z

codec-http/src/main/java/io/netty/handler/codec/http/QueryStringDecoder.java

@@ -191,16 +187,61 @@ public String path() {
        return path;
    }

+    @UnstableApi


I think you can remove this annotation as its package-private anyway

normanmaurer · 2023-08-22T09:21:14Z

codec-http/src/main/java/io/netty/handler/codec/http/QueryStringDecoder.java

+    /**
+     * Decodes {@link #uri()} parameters, reporting them in the provided {@code collector}'s map.
+     */
+    public void decodeParameters(Map<String, List<String>> parameters) {


I think renaming the param makes things more clear

Suggested change

public void decodeParameters(Map<String, List<String>> parameters) {

public void decodeParameters(Map<String, List<String>> out) {

normanmaurer · 2023-08-22T09:22:07Z

codec-http/src/main/java/io/netty/handler/codec/http/QueryStringDecoder.java

+     * and just store/report/filter them assuming random ordering, instead.
+     */
+    public interface ParameterCollector<C> {
+        void accept(String name, String value, C accumulator);


Suggested change

void accept(String name, String value, C accumulator);

void collect(String name, String value, C accumulator);

normanmaurer · 2023-08-22T09:22:43Z

codec-http/src/main/java/io/netty/handler/codec/http/QueryStringDecoder.java

+    }
+
+    private static int getFirstEscaped(String s, int from, int toExcluded, boolean isPath) {
+        int cutOff = !isPath? '+' : '%';


Let's make this easier to read

Suggested change

int cutOff = !isPath? '+' : '%';

int cutOff = isPath? '%' : '+' ;

normanmaurer · 2023-08-22T09:24:08Z

codec-http/src/main/java/io/netty/handler/codec/http/QueryStringDecoder.java

+                public void accept(String name, String value, Map<String, List<String>> accumulator) {
+                    List<String> values = accumulator.get(name);
+                    if (values == null) {
+                        values = new ArrayList<String>(1);


is 1 a good default ? Maybe use 4 so we dont need to expand directly

I've no idea - I would expect that query params usually have a single value for each param, most of the time, rather then more, but I have not much exp in it

normanmaurer · 2023-08-22T09:24:41Z

codec-http/src/main/java/io/netty/handler/codec/http/QueryStringDecoder.java

+        return -1;
+    }
+
+    private static int skipIf(String s, int from, int to, int ch) {


Suggested change

private static int skipIf(String s, int from, int to, int ch) {

private static int skipIf(CharSequence s, int from, int to, int ch) {

normanmaurer · 2023-08-22T09:24:49Z

codec-http/src/main/java/io/netty/handler/codec/http/QueryStringDecoder.java

    }

-    private static boolean addParam(String s, int nameStart, int valueStart, int valueEnd,
-                                    Map<String, List<String>> params, Charset charset) {
+    private static int indexOf(String s, int from, int to, int ch) {


Suggested change

private static int indexOf(String s, int from, int to, int ch) {

private static int indexOf(CharSequence s, int from, int to, int ch) {

normanmaurer · 2023-08-22T09:27:44Z

codec-http/src/test/java/io/netty/handler/codec/http/LegacyQueryStringDecoder.java

+        return strBuf.toString();
+    }
+
+    private static int findPathEndIndex(String uri) {


Suggested change

private static int findPathEndIndex(String uri) {

private static int findPathEndIndex(CharSequence uri) {

Motivation: decodeParams is not accounting for common decoding scenarios making it highly unpredictable. Modifications: Arrange control flow to isolated predictable from unpredictable checks to improve CPU usage (stalled frontend cycles) Result: Faster query decoding

normanmaurer · 2024-02-21T09:33:15Z

@franz1981 what is the status of this PR ?

franz1981 · 2024-02-25T20:25:06Z

Sorry @normanmaurer I have parked this due to others which required more attention; I will change it again to unify some of the existing code paths

franz1981 requested review from normanmaurer and chrisvest July 7, 2023 12:26

franz1981 force-pushed the batch_query_decoder branch 2 times, most recently from 63da020 to cd1100d Compare July 7, 2023 13:42

chrisvest reviewed Jul 7, 2023

View reviewed changes

franz1981 force-pushed the batch_query_decoder branch 5 times, most recently from 3e6605c to a821f4e Compare July 9, 2023 14:02

franz1981 force-pushed the batch_query_decoder branch from 97ad30f to 1107a07 Compare July 17, 2023 15:24

franz1981 force-pushed the batch_query_decoder branch from 1107a07 to b3b27c3 Compare July 21, 2023 14:22

franz1981 force-pushed the batch_query_decoder branch from a7a971f to 7a6842f Compare July 28, 2023 15:42

franz1981 mentioned this pull request Aug 11, 2023

Optimize method HttpStatusClass.valueOf(int) #13543

Merged

chrisvest approved these changes Aug 17, 2023

View reviewed changes

normanmaurer requested changes Aug 22, 2023

View reviewed changes

franz1981 added 2 commits September 19, 2023 05:57

Address Norman's comment

329d1d5

franz1981 force-pushed the batch_query_decoder branch from 5275a10 to 329d1d5 Compare September 19, 2023 04:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Faster query string decoder #13473

Faster query string decoder #13473

franz1981 commented Jul 7, 2023

franz1981 commented Jul 7, 2023

franz1981 commented Jul 7, 2023 •

edited

franz1981 commented Jul 7, 2023 •

edited

franz1981 commented Jul 7, 2023

franz1981 commented Jul 7, 2023

franz1981 commented Jul 8, 2023 •

edited

franz1981 commented Jul 17, 2023

franz1981 commented Jul 18, 2023

vietj commented Jul 20, 2023 •

edited

franz1981 commented Jul 20, 2023

franz1981 commented Jul 28, 2023

franz1981 commented Aug 11, 2023

franz1981 commented Aug 21, 2023

vietj commented Aug 21, 2023

normanmaurer Aug 22, 2023

normanmaurer Aug 22, 2023

normanmaurer Aug 22, 2023

normanmaurer Aug 22, 2023

normanmaurer Aug 22, 2023

franz1981 Sep 19, 2023

normanmaurer Aug 22, 2023

normanmaurer Aug 22, 2023

normanmaurer Aug 22, 2023

normanmaurer commented Feb 21, 2024 •

edited

franz1981 commented Feb 25, 2024

	public void decodeParameters(Map<String, List<String>> parameters) {
	public void decodeParameters(Map<String, List<String>> out) {

	void accept(String name, String value, C accumulator);
	void collect(String name, String value, C accumulator);

	int cutOff = !isPath? '+' : '%';
	int cutOff = isPath? '%' : '+' ;

	private static int skipIf(String s, int from, int to, int ch) {
	private static int skipIf(CharSequence s, int from, int to, int ch) {

	private static int indexOf(String s, int from, int to, int ch) {
	private static int indexOf(CharSequence s, int from, int to, int ch) {

	private static int findPathEndIndex(String uri) {
	private static int findPathEndIndex(CharSequence uri) {

Faster query string decoder #13473

Are you sure you want to change the base?

Faster query string decoder #13473

Conversation

franz1981 commented Jul 7, 2023

franz1981 commented Jul 7, 2023

franz1981 commented Jul 7, 2023 • edited

franz1981 commented Jul 7, 2023 • edited

franz1981 commented Jul 7, 2023

franz1981 commented Jul 7, 2023

franz1981 commented Jul 8, 2023 • edited

franz1981 commented Jul 17, 2023

franz1981 commented Jul 18, 2023

vietj commented Jul 20, 2023 • edited

franz1981 commented Jul 20, 2023

franz1981 commented Jul 28, 2023

franz1981 commented Aug 11, 2023

franz1981 commented Aug 21, 2023

vietj commented Aug 21, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

normanmaurer commented Feb 21, 2024 • edited

franz1981 commented Feb 25, 2024

franz1981 commented Jul 7, 2023 •

edited

franz1981 commented Jul 7, 2023 •

edited

franz1981 commented Jul 8, 2023 •

edited

vietj commented Jul 20, 2023 •

edited

normanmaurer commented Feb 21, 2024 •

edited