Take advantage of avx512 instructions when available #85

Open
tkaitchuck opened this issue May 22, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@tkaitchuck
Owner

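Using the `_mm512_aesdec_epi128` instruction (https://doc.rust-lang.org/nightly/core/arch/x86_64/fn._mm512_aesdec_epi128.html), it should be possible to get up to 4x the throughput on large strings, since a single instruction applies an AES round to four 128-bit lanes at once.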

Note: Currently very few processors support this.
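
To make the lane arithmetic concrete: a 512-bit VAES round is the ordinary 128-bit AES round applied to four independent lanes. The following is a minimal sketch of that behaviour, emulated with the stable 128-bit `_mm_aesdec_si128` intrinsic so it runs on any AES-NI CPU. The names `aesdec_4_lanes` and `aesdec_4_lanes_aesni` are hypothetical and not taken from the project; a real implementation would presumably replace the loop with one `_mm512_aesdec_epi128` call where VAES and AVX-512F are detected.

```rust
/// Applies one AES decryption round to four independent 128-bit lanes.
/// Returns `None` when the target or CPU has no AES-NI support.
/// (Hypothetical helper for illustration only.)
pub fn aesdec_4_lanes(data: [u128; 4], keys: [u128; 4]) -> Option<[u128; 4]> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("aes") {
            // SAFETY: the `aes` feature was detected at runtime just above.
            return Some(unsafe { aesdec_4_lanes_aesni(data, keys) });
        }
    }
    // Not x86_64, or no AES-NI: nothing to emulate with.
    let _ = (data, keys);
    None
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "aes")]
unsafe fn aesdec_4_lanes_aesni(data: [u128; 4], keys: [u128; 4]) -> [u128; 4] {
    use std::arch::x86_64::{__m128i, _mm_aesdec_si128};
    use std::mem::transmute;

    let mut out = [0u128; 4];
    for i in 0..4 {
        // Each lane is processed on its own; no data crosses lanes.
        // A 512-bit VAES instruction would do all four in one step.
        unsafe {
            let d: __m128i = transmute(data[i]);
            let k: __m128i = transmute(keys[i]);
            out[i] = transmute(_mm_aesdec_si128(d, k));
        }
    }
    out
}
```

Because the lanes never interact, reordering the inputs reorders the outputs in the same way, which is why widening the register is expected to scale throughput rather than change results.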

@tkaitchuck tkaitchuck added the enhancement New feature or request label May 22, 2021
@SchrodingerZhu

SchrodingerZhu commented Oct 19, 2022

@tkaitchuck
I have an ongoing investigation at:
SchrodingerZhu@c117213

This uses 256-bit SIMD registers, which relatively recent Intel and AMD CPUs both support.
I haven't run the benchmarks yet, but based on my previous experience with VPCLMULQDQ (which is used in CRC64 calculation), such a change should bring a speedup.

Also note that if you really want to use 512-bit registers, you can just unroll the loops further; similar tricks apply.
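
As a rough illustration of choosing between 128-bit, 256-bit, and 512-bit paths, here is a minimal sketch of runtime dispatch. It is not the code from the commit above: `aes_lanes_per_instruction` and `plan_iterations` are hypothetical helpers, and checking `avx2` before taking the 256-bit path is a conservative assumption on my part.

```rust
/// Number of 128-bit AES lanes one instruction can process on this CPU:
/// 4 (512-bit VAES), 2 (256-bit VAES), 1 (plain AES-NI), or 0 (none).
/// (Hypothetical helper for illustration only.)
pub fn aes_lanes_per_instruction() -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        // VAES widens the AES round instructions beyond 128 bits.
        if is_x86_feature_detected!("vaes") {
            if is_x86_feature_detected!("avx512f") {
                return 4; // 512-bit registers: four lanes per instruction
            }
            if is_x86_feature_detected!("avx2") {
                return 2; // 256-bit registers: two lanes per instruction
            }
        }
        if is_x86_feature_detected!("aes") {
            return 1; // classic 128-bit AES-NI
        }
    }
    0
}

/// Splits `len` input bytes into (full wide iterations, leftover bytes)
/// for a loop that consumes `lanes` 16-byte blocks per iteration.
pub fn plan_iterations(len: usize, lanes: usize) -> (usize, usize) {
    let stride = lanes.max(1) * 16;
    (len / stride, len % stride)
}
```

For a 100-byte input, `plan_iterations(100, 2)` gives three 32-byte iterations with 4 bytes left over, while `plan_iterations(100, 4)` gives one 64-byte iteration with 36 bytes left over, so the wider path only pays off once strings are long enough to fill several iterations.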


As you may have noticed, the commit above is a little messy. This is because I noticed some potential bugs within Rust's core that make the code generation bad for Zen 3 CPUs. I will sync the info once I have reported the issue to Rust's stdarch library.


2 participants