Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of Concept: VAES Support #144

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

SchrodingerZhu
Copy link

The idea is to add use VAES instruction to scan wider length each loop iteration.
(I also tested scan the same length per loop with less instruction, but it does not speed up at all).

We can gain 100% speed up.

Without VAES

aeshash/string          time:   [59.425 ns 59.475 ns 59.523 ns]                           
aeshash/wider-string    time:   [1.2053 µs 1.2062 µs 1.2075 µs]   

With VAES:

aeshash/string          time:   [52.435 ns 52.511 ns 52.604 ns]                           
                        change: [-11.669% -11.523% -11.383%] (p = 0.00 < 0.05)
                        Performance has improved.
aeshash/wider-string    time:   [531.93 ns 532.18 ns 532.47 ns]                                  
                        change: [-55.856% -55.809% -55.765%] (p = 0.00 < 0.05)
                        Performance has improved.

Notice:

  • aeshash/wider-string is same as aeshash/string except that its lengths set has larger data point.
  • vaes passed all quality tests but its hash value may not be compatible with non-aes targets.

@SchrodingerZhu SchrodingerZhu reopened this Nov 6, 2022
@SchrodingerZhu
Copy link
Author

All SMHasher tests passed with latest fix

@SchrodingerZhu
Copy link
Author

SchrodingerZhu commented Nov 6, 2022

On EPYC 7773X, one can get the following results with this patch:

# AHASH64
Average      - 34.263 bytes/cycle - 98028.08 MiB/sec @ 3 ghz
# WYHASH
Average      - 10.938 bytes/cycle - 31293.76 MiB/sec @ 3 ghz
# XXH3
Average      - 20.607 bytes/cycle - 58957.09 MiB/sec @ 3 ghz
# T1HA0_AES_AVX2
Average      - 32.612 bytes/cycle - 93304.43 MiB/sec @ 3 ghz

Without VAES feature:

Average      - 17.718 bytes/cycle - 50690.56 MiB/sec @ 3 ghz

current[2] = current[2].aesenc(blocks[2]);
current[3] = current[3].aesenc(blocks[3]);
sum[0] = sum[0].shuffle_and_add(blocks[0]);
sum[1] = sum[1].shuffle_and_add(blocks[1]);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps sum[0]

Comment on lines +46 to +47
# Use VAES extension if possible. The hash value may be incompatible with NON-VAES targets
vaes = []
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use cfg to detect the feature and don't need a feature declared here.

Comment on lines +57 to +59
// Rust is confused with targets supporting VAES without AVX512 extensions.
// We need to manually specify the underlying intrinsic; otherwise the compiler
// will have trouble inlining the code.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a link to an issue on this?

target_feature = "avx512vaes",
not(miri)
))]
if data.len() > 128 {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding another 'if' I think a cleaner way to handle this would be to add a function to factor out a method in operations. Like there could be aesenc_x4 which then uses cfg to provide one implementation or the other depending on if the cpu instruction is available.

@tkaitchuck
Copy link
Owner

Thanks for putting this together.
Would you like to collaborate working in a single branch?
I don't have such a CPU to test on, which has held me back on this feature.

@SchrodingerZhu
Copy link
Author

@tkaitchuck Sorry for the long delay. I was held back by some other stuffs. I would like to push this forward. Any suggestion for what to do next?

@SchrodingerZhu
Copy link
Author

I will give this another try in a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants