Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keccak: add aarch64 ASM implementation for f1600 #23

Merged
merged 3 commits into from Nov 1, 2022

Conversation

recmo
Copy link
Contributor

@recmo recmo commented Oct 29, 2022

This uses ARMv8.4-A FEAT_SHA3 instructions which can be found on recent Apple processors. (tested on M1, but ARMv8.4 is as early as A13, but I can't find information if A13 implements the SHA3 extensions)

@tarcieri
Copy link
Member

tarcieri commented Oct 29, 2022

The provenance of this code appears to be the Linux kernel and therefore it's GPLv2 licensed, which is quite restrictive. We generally look for the most liberal licenses possible even making exceptions for our normal Apache 2.0+MIT for assembly.

Perhaps you could look at the eXtended Keccak Code Package (XKCP)?

https://github.com/XKCP/XKCP

There appears to be an implementation here, although I'm not quite clear on if it uses FEAT_SHA3 or not:

https://github.com/XKCP/XKCP/blob/master/lib/low/KeccakP-1600-times2/SIMD128/KeccakP-1600-times2-SIMD128.c

I believe that code is in the public domain, although it doesn't have a specific public domain dedication (e.g. CC0) outside the toplevel LICENSE file.

@tarcieri
Copy link
Member

Regardless of what implementation we go with, it would also be good to have cross tests in CI.

Here's an example of how we do them elsewhere:

https://github.com/RustCrypto/block-ciphers/blob/4334b85/.github/workflows/aes.yml#L222-L249

@recmo
Copy link
Contributor Author

recmo commented Oct 29, 2022

The provenance of this code appears to be the Linux kernel and therefore it's GPLv2 licensed, which is quite restrictive. We generally look for the most liberal licenses possible even making exceptions for our normal Apache 2.0+MIT for assembly.

It's basically the ARM reference implementation (see gigantic pdf):

Screenshot 2022-10-29 at 13 42 53

The only difference between the reference and this one / the Linux kernel one is that the registers are renamed so they are consecutive v0-v24. This makes loading/storing a bit easier. The loading/storing code here is original (the Linux one does some more clever tricks for block processing).

@tarcieri
Copy link
Member

@recmo okay, can you make the code match the ARM reference manual, add a reference to its origin and remove references to the Linux kernel?

That should hopefully address the IPR issues.

@recmo
Copy link
Contributor Author

recmo commented Oct 29, 2022

Perhaps you could look at the eXtended Keccak Code Package (XKCP)?

As far as I can tell (not the most readable source) it's not using any of the FEAT_SHA3 codes. I can check openssl and supercop implementations.

@recmo
Copy link
Contributor Author

recmo commented Oct 29, 2022

@recmo
Copy link
Contributor Author

recmo commented Oct 29, 2022

The problem with the ARM reference version (and why everyone changes it) is that it doesn't loop nicely. The output registers are in a different order than the input.

I can change it to the register convention from XKCP. Does that work?

@tarcieri
Copy link
Member

Sure, that's fine

@newpavlov
Copy link
Member

Should we consider this change MSRV-compatible? asm! was stabilized in 1.59, which is much bigger than MSRV of this crate and, more importantly, than MSRV of sha3. The sha3 target feature is unstable, so IIUC it should not be enabled by target-cpu=native, but strictly speaking sha3 may get enabled on pre-1.59 toolchains.

Also, have you tried to use the arch intrinsics? If there is no substantial difference in performance between generated assembly, I think we should prefer an intrinsics-based implementation behind nightly-only crate feature (or configuration parameter). It also should help with keeping state in registers between rounds.

@tarcieri
Copy link
Member

@newpavlov to avoid MSRV incompatibilities we can have an asm feature.

Re: intrinsics, my guess would be the ones needed are unstable, where asm! is stable

@newpavlov
Copy link
Member

Yeah, I will be fine with introducing an asm feature with MSRV note until the necessary intrinsics get stabilized. Ideally we would have both implementations (asm and intrinsics based), but the latter could be added in a separate PR.

@recmo
Copy link
Contributor Author

recmo commented Oct 29, 2022

Also, have you tried to use the arch intrinsics?

I did look at them and the required instructions have intrinsics. But they require nightly+experimental while the asm block will work on stable.

For example: https://doc.rust-lang.org/stable/core/arch/aarch64/fn.veor3q_u64.html

It also should help with keeping state in registers between rounds.

Good point! We can change it to intrinsics once they are stable.

@recmo recmo force-pushed the keccak-aarch64-sha3 branch 10 times, most recently from 86e21d2 to 6e7a854 Compare October 29, 2022 23:41
@recmo
Copy link
Contributor Author

recmo commented Oct 29, 2022

Finally got the cross test to build with the SHA3 feature (it requires upgrading cross to a later version than cross-install provides and forcing it enabled through RUSTFLAGS)

On stable it tests the assembly version:

https://github.com/RustCrypto/sponges/actions/runs/3353380335/jobs/5556112925#step:6:91

On MSRV it falls back to non-assembly without complaints:

https://github.com/RustCrypto/sponges/actions/runs/3353380335/jobs/5556112899#step:6:90

I assume this is because MSRV doesn't support target_feature = "sha3". But this raises the question if there are rust versions in between MSRV and stable that support target_feature = "sha3" but do not support asm! or those opcodes.

@newpavlov
Copy link
Member

@recmo
As discussed earlier, you should gate the functionality on the asm feature. Do not forget to add a note about bumped MSRV. It could look like this.

Good point! We can change it to intrinsics once they are stable.

Would you be interested in prototyping it in a separate PR? We can have both backends simultaneously. It could be interesting to compare their performance.

@tarcieri
Copy link
Member

it requires upgrading cross to a later version than cross-install provides

That's easy enough to bump.

Something else it'd be good to add to this PR is runtime feature detection using the cpufeatures crate.

The sha3 feature is already supported on ARM, so it shouldn't be too hard to add, and can replace the compile-time gating with selecting the best available implementation at runtime (with automatic compile-time gating if the appropriate target features are enabled).

You can look at the aes and polyval crates as examples which perform this sort of runtime feature gating on ARM platforms.

@tarcieri
Copy link
Member

tarcieri commented Nov 1, 2022

Going to go ahead and merge this. I'll take care of the cpufeatures integration (I have an M1 I can test it on)

@tarcieri tarcieri merged commit b2d1e84 into RustCrypto:master Nov 1, 2022
@recmo
Copy link
Contributor Author

recmo commented Nov 2, 2022

Thanks!

@tarcieri tarcieri mentioned this pull request Nov 14, 2022
@tarcieri tarcieri changed the title Add aarch64 asm implementation keccak_f1600 keccak: add aarch64 ASM implementation for f1600 Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants