Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Applying SWAR Technique to AsciiString #13522

Open
jchrys opened this issue Jul 31, 2023 · 8 comments · May be fixed by #13534
Open

Applying SWAR Technique to AsciiString #13522

jchrys opened this issue Jul 31, 2023 · 8 comments · May be fixed by #13534

Comments

@jchrys
Copy link
Contributor

jchrys commented Jul 31, 2023

Currently AsciiString utilizes naive iterative approach for its indexOf and 'finding first lower case and upper case' methods.
However, we can improve the performance of these methods by implementing the SWAR technique, similar to the approach used in ByteBuf.

@jchrys
Copy link
Contributor Author

jchrys commented Jul 31, 2023

Hello, @franz1981 @normanmaurer @chrisvest. I was investigating the ByteBuf.indexOf implementation and noticed that we could apply a similar approach to optimize the search methods in AsciiString. What do you think about this?

@jchrys
Copy link
Contributor Author

jchrys commented Aug 1, 2023

I've just run a benchmark by roughly applying it.

AWS EC2 c4.2xlarge, CPU(s): 8, Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz, openjdk 17.0.8, Ubuntu 22.04.2 LTS

Benchmark                                (logPermutations)  (size)   Mode  Cnt       Score      Error   Units
AsciiStringIndexOfBenchmark.indexOf                      4       7  thrpt   10  134235.176 ±   59.290  ops/ms
AsciiStringIndexOfBenchmark.indexOf                      4      16  thrpt   10   84131.521 ±   76.425  ops/ms
AsciiStringIndexOfBenchmark.indexOf                      4      23  thrpt   10   77528.532 ±  306.039  ops/ms
AsciiStringIndexOfBenchmark.indexOf                      4      32  thrpt   10   67820.444 ±  365.438  ops/ms

AsciiStringIndexOfBenchmark.swarindexOf                  4       7  thrpt   10  145399.712 ± 1033.746  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                  4      16  thrpt   10   96064.934 ±  340.456  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                  4      23  thrpt   10   81721.631 ±   58.957  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                  4      32  thrpt   10   74168.100 ±  141.869  ops/ms

AsciiStringIndexOfBenchmark.indexOf                     11       7  thrpt   10  123494.290 ± 2517.067  ops/ms
AsciiStringIndexOfBenchmark.indexOf                     11      16  thrpt   10   78763.717 ± 2119.599  ops/ms
AsciiStringIndexOfBenchmark.indexOf                     11      23  thrpt   10   70824.950 ± 1041.296  ops/ms
AsciiStringIndexOfBenchmark.indexOf                     11      32  thrpt   10   55250.589 ±  393.550  ops/ms

AsciiStringIndexOfBenchmark.swarindexOf                 11       7  thrpt   10  142837.744 ± 2055.200  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                 11      16  thrpt   10   94393.279 ±   75.500  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                 11      23  thrpt   10   80677.253 ±  113.065  ops/ms
AsciiStringIndexOfBenchmark.swarindexOf                 11      32  thrpt   10   73366.954 ±  359.819  ops/ms

@chrisvest
Copy link
Contributor

@jchrys The performance gains are probably limited by the strings being so short. I think most ASCII strings are short, though. How are you doing multi byte loads without unsafe?

@jchrys
Copy link
Contributor Author

jchrys commented Aug 1, 2023

@chrisvest I didn't actually employ multi byte loads when noUnsafe=true. I only attached them to ensure there was no performance loss in noUnsafe=true case. but it seems that it might cause confusion. let me remove it.

@chrisvest
Copy link
Contributor

Ok, makes sense. Yeah, I think this looks worthwhile. Feel free to open a PR and ping me. 👍

@jchrys
Copy link
Contributor Author

jchrys commented Aug 1, 2023

Absolutely! I'll ping you once it's ready. Thanks!

@franz1981
Copy link
Contributor

Well done @jchrys I see you have absorbed the idea to stress the input sequence to understand if an approach is worthwhile :P
I would check the numbers in the ByteBuf version, to understand if there's something off here: I expect nearly double performance quite soon, if made right

@jchrys
Copy link
Contributor Author

jchrys commented Aug 1, 2023

@franz1981 Thanks a lot! I will certainly look into it. I truly appreciate your advice!

@jchrys jchrys linked a pull request Aug 7, 2023 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants