keccak: add aarch64 ASM implementation for f1600
#23
The provenance of this code appears to be the Linux kernel, so it's GPLv2 licensed, which is quite restrictive. We generally look for the most liberal licenses possible, even making exceptions to our normal Apache 2.0+MIT licensing for assembly. Perhaps you could look at the eXtended Keccak Code Package (XKCP)? There appears to be an implementation here, although I'm not quite clear on if it uses I believe that code is in the public domain, although it doesn't have a specific public domain dedication (e.g. CC0) outside the toplevel
Regardless of what implementation we go with, it would also be good to have Here's an example of how we do them elsewhere: https://github.com/RustCrypto/block-ciphers/blob/4334b85/.github/workflows/aes.yml#L222-L249
It's basically the ARM reference implementation (see the gigantic PDF): The only difference between the reference and this one (or the Linux kernel one) is that the registers are renamed so they are consecutive, v0-v24, which makes loading and storing a bit easier. The loading/storing code here is original (the Linux one does some cleverer tricks for block processing).
@recmo okay, can you make the code match the ARM reference manual, add a reference to its origin and remove references to the Linux kernel? That should hopefully address the IPR issues.
As far as I can tell (not the most readable source) it's not using any of the
Ah, and here is an XKCP one, just a different repo: https://github.com/XKCP/K12/blob/master/lib/ARMv8Asha3/KeccakP-1600-ARMv8Asha3.S#L69
The problem with the ARM reference version (and why everyone changes it) is that it doesn't loop nicely: the output registers end up in a different order than the input registers. I can change it to the register convention from XKCP. Does that work?
Sure, that's fine
Should we consider this change MSRV-compatible? Also, have you tried using the arch intrinsics? If the assembly generated from intrinsics performs about as well as the hand-written version, I think we should prefer an intrinsics-based implementation behind a nightly-only crate feature (or configuration parameter). It should also help with keeping state in registers between rounds.
@newpavlov to avoid MSRV incompatibilities we can have an Re: intrinsics, my guess would be the ones needed are unstable, where
Yeah, I will be fine with introducing an
I did look at them, and the required instructions do have intrinsics. But those require nightly plus an experimental feature, while the asm block works on stable. For example: https://doc.rust-lang.org/stable/core/arch/aarch64/fn.veor3q_u64.html
Good point! We can change it to intrinsics once they are stable.
Finally got the cross test to build with the
On stable it tests the assembly version: https://github.com/RustCrypto/sponges/actions/runs/3353380335/jobs/5556112925#step:6:91
On MSRV it falls back to non-assembly without complaints: https://github.com/RustCrypto/sponges/actions/runs/3353380335/jobs/5556112899#step:6:90
I assume this is because MSRV doesn't support
@recmo Would you be interested in prototyping it in a separate PR? We can have both backends simultaneously. It could be interesting to compare their performance.
That's easy enough to bump. Something else it'd be good to add to this PR is runtime feature detection using the The You can look at the
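For illustration, here is a minimal sketch of what runtime feature detection could look like. It uses the standard library's `is_aarch64_feature_detected!` macro as a stand-in, which is an assumption and not necessarily the mechanism meant above. The backend names returned by `select_backend` are hypothetical placeholders, not identifiers from this PR.

```rust
/// Reports whether the running CPU advertises the SHA3 extension.
/// Assumption: std's detection macro is used here purely for illustration.
#[cfg(target_arch = "aarch64")]
pub fn sha3_available() -> bool {
    std::arch::is_aarch64_feature_detected!("sha3")
}

/// On every other architecture the extension cannot be present.
#[cfg(not(target_arch = "aarch64"))]
pub fn sha3_available() -> bool {
    false
}

/// Hypothetical dispatcher: names the backend that would be chosen.
/// A real implementation would call the assembly or the portable
/// permutation here instead of returning a label.
pub fn select_backend() -> &'static str {
    if sha3_available() {
        "aarch64-sha3-asm"
    } else {
        "portable"
    }
}
```

The check happens once per call in this sketch; caching the result would avoid repeating the query on a hot path.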
Going to go ahead and merge this. I'll take care of the
Thanks!
aarch64 ASM implementation for f1600
This uses the ARMv8.4-A FEAT_SHA3 instructions, which are available on recent Apple processors. Tested on M1; ARMv8.4 goes back as far as the A13, but I can't find any information on whether the A13 implements the SHA3 extension.
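As a rough guide to what the FEAT_SHA3 instructions compute, the sketch below models EOR3, RAX1, XAR and BCAX on a single 64-bit lane in portable Rust (the real instructions operate on 128-bit vector registers holding two such lanes). The function names and the `theta_d` helper are illustrative assumptions, not code from this PR; `theta_d` only shows how the theta step of Keccak-f[1600] maps onto EOR3 and RAX1.

```rust
/// EOR3: three-way exclusive OR.
pub fn eor3(n: u64, m: u64, a: u64) -> u64 {
    n ^ m ^ a
}

/// RAX1: XOR `n` with `m` rotated left by one bit.
pub fn rax1(n: u64, m: u64) -> u64 {
    n ^ m.rotate_left(1)
}

/// XAR: XOR the operands, then rotate the result right by `imm` bits.
pub fn xar(n: u64, m: u64, imm: u32) -> u64 {
    (n ^ m).rotate_right(imm)
}

/// BCAX: bit clear and XOR, i.e. `n ^ (m & !a)`.
pub fn bcax(n: u64, m: u64, a: u64) -> u64 {
    n ^ (m & !a)
}

/// Theta step helper for a state laid out as `state[x + 5 * y]`:
/// column parities C[x] via two EOR3s, then D[x] = C[x-1] ^ rol(C[x+1], 1)
/// via RAX1.
pub fn theta_d(state: &[u64; 25]) -> [u64; 5] {
    let mut c = [0u64; 5];
    for x in 0..5 {
        let t = eor3(state[x], state[x + 5], state[x + 10]);
        c[x] = eor3(t, state[x + 15], state[x + 20]);
    }
    let mut d = [0u64; 5];
    for x in 0..5 {
        d[x] = rax1(c[(x + 4) % 5], c[(x + 1) % 5]);
    }
    d
}
```

The chi step has the shape `a[x] ^ (!a[x + 1] & a[x + 2])`, which corresponds to `bcax(a[x], a[x + 2], a[x + 1])` in this model.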