Applying SWAR Technique to `AsciiString` #13522

jchrys · 2023-07-31T21:23:27Z

Currently, AsciiString uses a naive iterative approach for its indexOf and "find first lower-case / upper-case character" methods.
However, we can improve the performance of these methods by applying the SWAR (SIMD within a register) technique, similar to the approach used in ByteBuf.
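
For readers unfamiliar with the technique, here is a minimal sketch of what a SWAR-style `indexOf` over a `byte[]` could look like. It is not Netty's code: the class name `SwarIndexOfSketch` and its helpers are hypothetical, and the 8-byte load goes through a little-endian `ByteBuffer` view purely to keep the example self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the AsciiString or ByteBuf implementation.
final class SwarIndexOfSketch {
    private static final long LOW_7_BITS = 0x7F7F7F7F7F7F7F7FL;

    /** Copies the byte into all eight lanes of a long. */
    static long broadcast(byte b) {
        return (b & 0xFFL) * 0x0101010101010101L;
    }

    /** Returns 0x80 in every lane where the word's byte equals the pattern's byte, 0x00 elsewhere. */
    static long matchMask(long word, long pattern) {
        long x = word ^ pattern;                  // matching lanes become 0x00
        long t = (x & LOW_7_BITS) + LOW_7_BITS;   // bit 7 set if any of the low 7 bits is set
        return ~(t | x | LOW_7_BITS);             // bit 7 survives only where the lane was 0x00
    }

    /** Index of the first occurrence of needle in data[from, to), or -1. */
    static int indexOf(byte[] data, int from, int to, byte needle) {
        // Little-endian view: the byte at the lowest index lands in the lowest lane.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        long pattern = broadcast(needle);
        int i = from;
        // Compare eight bytes per iteration.
        for (; i + Long.BYTES <= to; i += Long.BYTES) {
            long mask = matchMask(buf.getLong(i), pattern);
            if (mask != 0) {
                // The lowest set bit belongs to the first matching lane.
                return i + (Long.numberOfTrailingZeros(mask) >>> 3);
            }
        }
        // Fewer than eight bytes remain: fall back to a plain loop.
        for (; i < to; i++) {
            if (data[i] == needle) {
                return i;
            }
        }
        return -1;
    }
}
```

Calling `SwarIndexOfSketch.indexOf(bytes, 0, bytes.length, (byte) 'w')` behaves like a plain byte-by-byte scan, but each loop iteration inspects eight bytes and the branch is taken only once per word.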

jchrys · 2023-07-31T22:45:03Z

Hello, @franz1981 @normanmaurer @chrisvest. I was investigating the ByteBuf.indexOf implementation and noticed that we could apply a similar approach to optimize the search methods in AsciiString. What do you think about this?
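
The "find first upper case" search could be handled in the same spirit. The following is a rough sketch under the same assumptions as before (hypothetical class `SwarUpperCaseSketch`, little-endian `ByteBuffer` loads); it marks every lane of a word that holds `'A'..'Z'` and leaves non-ASCII bytes unmarked.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; names are not taken from Netty.
final class SwarUpperCaseSketch {
    private static final long LOW_7_BITS = 0x7F7F7F7F7F7F7F7FL;
    private static final long HIGH_BITS = 0x8080808080808080L;

    /** Returns 0x80 in every lane whose byte is an ASCII upper-case letter. */
    static long upperCaseMask(long word) {
        long t = word & LOW_7_BITS;
        t += 0x2525252525252525L;       // 'Z' (0x5A) + 0x25 = 0x7F; bytes above 'Z' carry into bit 7
        t &= LOW_7_BITS;                // drop that carry, so bytes above 'Z' wrap to small values
        t += 0x1A1A1A1A1A1A1A1AL;       // 26 letters: only lanes that held 'A'..'Z' reach bit 7 now
        return t & ~word & HIGH_BITS;   // discard lanes whose original byte was non-ASCII
    }

    /** Index of the first ASCII upper-case byte in data, or -1. */
    static int indexOfFirstUpperCase(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        for (; i + Long.BYTES <= data.length; i += Long.BYTES) {
            long mask = upperCaseMask(buf.getLong(i));
            if (mask != 0) {
                return i + (Long.numberOfTrailingZeros(mask) >>> 3);
            }
        }
        // Tail shorter than one word: check byte by byte.
        for (; i < data.length; i++) {
            if (data[i] >= 'A' && data[i] <= 'Z') {
                return i;
            }
        }
        return -1;
    }
}
```

A lower-case variant would follow the same pattern with the two additive constants shifted to the `'a'..'z'` range.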

jchrys · 2023-08-01T02:49:47Z

I've just run a benchmark with a rough implementation of it.

AWS EC2 c4.2xlarge, CPU(s): 8, Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz, openjdk 17.0.8, Ubuntu 22.04.2 LTS

Benchmark                                (logPermutations)  (size)   Mode  Cnt       Score      Error   Units
AsciiStringIndexOfBenchmark.indexOf                      4       7  thrpt   10  134235.176 ±   59.290  ops/ms
AsciiStringIndexOfBenchmark.indexOf                      4      16  thrpt   10   84131.521 ±   76.425  ops/ms
AsciiStringIndexOfBenchmark.indexOf                      4      23  thrpt   10   77528.532 ±  306.039  ops/ms
AsciiStringIndexOfBenchmark.indexOf                      4      32  thrpt   10   67820.444 ±  365.438  ops/ms

AsciiStringIndexOfBenchmark.swarindexOf                  4       7  thrpt   10  145399.712 ± 1033.746  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                  4      16  thrpt   10   96064.934 ±  340.456  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                  4      23  thrpt   10   81721.631 ±   58.957  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                  4      32  thrpt   10   74168.100 ±  141.869  ops/ms

AsciiStringIndexOfBenchmark.indexOf                     11       7  thrpt   10  123494.290 ± 2517.067  ops/ms
AsciiStringIndexOfBenchmark.indexOf                     11      16  thrpt   10   78763.717 ± 2119.599  ops/ms
AsciiStringIndexOfBenchmark.indexOf                     11      23  thrpt   10   70824.950 ± 1041.296  ops/ms
AsciiStringIndexOfBenchmark.indexOf                     11      32  thrpt   10   55250.589 ±  393.550  ops/ms

AsciiStringIndexOfBenchmark.swarindexOf                 11       7  thrpt   10  142837.744 ± 2055.200  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                 11      16  thrpt   10   94393.279 ±   75.500  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                 11      23  thrpt   10   80677.253 ±  113.065  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                 11      32  thrpt   10   73366.954 ±  359.819  ops/ms
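
To make the table easier to read, the relative throughput can be computed directly from the scores. The small sketch below does this for the `logPermutations = 11` rows; the class name `SwarSpeedupSketch` is made up, and the numbers are copied from the results above.

```java
// Hypothetical helper for reading the benchmark table; scores are in ops/ms.
final class SwarSpeedupSketch {
    static final int[] SIZES = {7, 16, 23, 32};
    // logPermutations = 11 rows, copied from the table above.
    static final double[] NAIVE = {123494.290, 78763.717, 70824.950, 55250.589};
    static final double[] SWAR = {142837.744, 94393.279, 80677.253, 73366.954};

    /** Throughput of swarindexOf relative to indexOf for the given row. */
    static double speedup(int row) {
        return SWAR[row] / NAIVE[row];
    }

    public static void main(String[] args) {
        for (int row = 0; row < SIZES.length; row++) {
            System.out.printf("size %2d: %.2fx%n", SIZES[row], speedup(row));
        }
    }
}
```

For these rows the ratio lies between about 1.14x and 1.33x, with the largest gain at size 32.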

chrisvest · 2023-08-01T05:17:48Z

@jchrys The performance gains are probably limited by the strings being so short. I think most ASCII strings are short, though. How are you doing multi byte loads without unsafe?

jchrys · 2023-08-01T05:30:20Z

@chrisvest I didn't actually employ multi-byte loads when noUnsafe=true. I only attached those results to make sure there was no performance loss in the noUnsafe=true case, but it seems they might cause confusion. Let me remove them.
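
For context on the question, one way a word could be read from a `byte[]` without Unsafe is to assemble it from individual bytes. This is only a hypothetical sketch (class `SafeLongLoadSketch` is invented here) and, as stated above, it is not what the benchmark did; whether such a load pays off compared with a plain byte loop would need measuring.

```java
// Hypothetical illustration of an Unsafe-free 8-byte load.
final class SafeLongLoadSketch {
    /** Reads eight bytes starting at index as a little-endian long. */
    static long getLongLE(byte[] a, int index) {
        // Each byte is masked to avoid sign extension before being shifted into its lane.
        return (a[index] & 0xFFL)
                | (a[index + 1] & 0xFFL) << 8
                | (a[index + 2] & 0xFFL) << 16
                | (a[index + 3] & 0xFFL) << 24
                | (a[index + 4] & 0xFFL) << 32
                | (a[index + 5] & 0xFFL) << 40
                | (a[index + 6] & 0xFFL) << 48
                | (a[index + 7] & 0xFFL) << 56;
    }
}
```

The result matches what a little-endian `ByteBuffer.getLong(int)` returns for the same bytes, so it can feed the same kind of SWAR mask computation.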

chrisvest · 2023-08-01T05:47:52Z

Ok, makes sense. Yeah, I think this looks worthwhile. Feel free to open a PR and ping me. 👍

jchrys · 2023-08-01T06:05:04Z

Absolutely! I'll ping you once it's ready. Thanks!

franz1981 · 2023-08-01T06:08:56Z

Well done, @jchrys! I see you have absorbed the idea of stressing the input sequence to understand whether an approach is worthwhile :P
I would check the numbers for the ByteBuf version to understand if there's something off here: I'd expect nearly double the performance quite soon, if done right.

jchrys · 2023-08-01T06:42:29Z

@franz1981 Thanks a lot! I will certainly look into it. I truly appreciate your advice!

jchrys linked a pull request Aug 7, 2023 that will close this issue

[DRAFT]Enhance Performance of AsciiString Methods #13534
