Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keccak: add asm feature; use cpufeatures on aarch64 #24

Merged
merged 2 commits into from Nov 13, 2022

Conversation

tarcieri
Copy link
Member

@tarcieri tarcieri commented Nov 12, 2022

Gates asm support (added in #23) under a crate feature.

Uses the cpufeatures crate to detect the presence of the sha3 extension for ARMv8 CPUs, automatically falling back to a software implementation if it isn't available.

cc @recmo

@tarcieri tarcieri force-pushed the keccak/armv8-asm-runtime-feature-detection branch 2 times, most recently from 1fb975f to 9f2fc58 Compare November 12, 2022 22:54
Gates `asm` support under a crate feature.

Uses the `cpufeatures` crate to detect the presence of the `sha3`
extension for ARMv8 CPUs, automatically falling back to a software
implementation if it isn't available.
@tarcieri tarcieri force-pushed the keccak/armv8-asm-runtime-feature-detection branch from 9f2fc58 to 2af43c0 Compare November 12, 2022 22:55
Comment on lines +21 to +22
[target.'cfg(target_arch = "aarch64")'.dependencies]
cpufeatures = "0.2"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit unfortunate this is a hard dependency on aarch64, although I couldn't figure out how to gate it better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about gating on feature or configuration flag inside the cfg statement?

Copy link
Member Author

@tarcieri tarcieri Nov 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work:

[target.'cfg(all(target_arch = "aarch64", feature = "asm"))'.dependencies]
cpufeatures = "0.2"

It prints this warning:

warning: Found feature = ... in target.'cfg(...)'.dependencies. This key is not supported for selecting dependencies and will not work as expected. Use the [features] section instead: https://doc.rust-lang.org/cargo/reference/features.html

This however, seems to work:

[target.'cfg(all(keccak_asm, target_arch = "aarch64"))'.dependencies]
cpufeatures = "0.2"

So that's a possible argument for using a cfg attribute for gating rather than a Cargo feature.

@@ -14,5 +14,9 @@ categories = ["cryptography", "no-std"]
readme = "README.md"

[features]
asm = [] # Use optimized assembly when available (currently only ARMv8)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this perhaps be a cfg! attribute rather than a crate feature?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I am not sure. Application developers would probably want to have it enabled by default without any additional steps from users. It's a slightly different situation from the aes crate where configuration flags are used mostly for testing backends.

pub use aarch64_sha3::keccak_f1600 as f1600;
#[cfg(all(target_arch = "aarch64", feature = "asm"))]
pub fn f1600(state: &mut [u64; PLEN]) {
if armv8_sha3_intrinsics::get() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@newpavlov there's not really a way to make this interface work with an init token. Does it seem ok to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not ideal, but should be fine. One potential alternative is to expose an unsafe gated f1600_aarch64_asm function and do the switch inside sha3.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that works. I can update the PR.

Copy link
Member

@newpavlov newpavlov Nov 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not saying a separate function is a better approach. Just floating an idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems nice to have to me, especially for the sha3 use case.

It's unsafe and #[target_feature(enable = "sha3")]-gated, which should prevent casual misuse.

@tarcieri
Copy link
Member Author

After the discussion about crate features vs cfg attributes, I could still go either way. They both have pros and cons.

I guess I'm weakly in favor of a feature just for consistency with how sha1 and sha2 provide an asm feature.

When the `asm` feature is enabled on `aarch64` targets, exposes
`f1600_asm` that relies on ARMv8 `sha3` hardware intrinsics.
@tarcieri
Copy link
Member Author

tarcieri commented Nov 13, 2022

8e40212 makes f1600_asm a pub unsafe function with a #[target_feature(enable = "sha3")] attribute, re-exported from the toplevel

@tarcieri
Copy link
Member Author

Gonna merge this.

@newpavlov will wait for your approval for a release

@tarcieri tarcieri merged commit a687839 into master Nov 13, 2022
@tarcieri tarcieri deleted the keccak/armv8-asm-runtime-feature-detection branch November 13, 2022 22:15
@newpavlov
Copy link
Member

newpavlov commented Nov 13, 2022

I think f1600_asm is too generic. Imagine hypothetical future implementation based on XKCP. Since the function is arch specific, I think it should combine arch name and target feature, something like f1600_arch64_sha3 or f1600_armv8_sha3. Also I am not sure it's worth to define cpufeatures::new!(armv8_sha3_intrinsics, "sha3"); in keccak, sha3 looks like a better place for it.

@tarcieri
Copy link
Member Author

tarcieri commented Nov 13, 2022

I think f1600_asm is too generic. Imagine hypothetical future implementation based on XKCP.

...but this is the XKCP ASM implementation. I can add target info to the name though. Edit: opened #26.

I think it should combine arch name and target feature, something like f1600_arch64_sha3 or f1600_armv8_sha3

But if you're thinking of having an intrinsics implementation, it would be ambiguous? Hence why to include *_asm in the name.

armv8_sha3 is redundant IMO. There are no other relevant ARMv8 target features to use as intrinsics. We can include it for clarity I guess, but in that case I think it should probably be f1600_armv8_sha3_asm.

@tarcieri
Copy link
Member Author

Also I am not sure it's worth to define cpufeatures::new!(armv8_sha3_intrinsics, "sha3"); in keccak, sha3 looks like a better place for it.

I think it's nice to have the default API optionally accelerated for non-SHA-3 applications of Keccak.

sha3 can also do gating the exact same way, calling into keccak_p as a fallback but with token-based initialization, so having the cpufeatures-based gating doesn't hurt there IMO.

@newpavlov
Copy link
Member

But if you're thinking of having an intrinsics implementation, it would be ambiguous?

I don't think so. Ideally, we would use intrinsics here, but since they are currently unstable we rely on asm! instead.

@newpavlov
Copy link
Member

newpavlov commented Nov 14, 2022

sha3 can also do gating the exact same way, calling into keccak_p as a fallback but with token-based initialization, so having the cpufeatures-based gating doesn't hurt there IMO.

The problem is that it makes cpufeatures a hard dependency for keccak on ARM, even though the new function could be in theory used without runtime detection. In other words, this crate does not use any runtime detection by itself, so cpufeatures in it looks out of place to me.

@tarcieri
Copy link
Member Author

That's going to be a problem with sha3 as well. AFAICT the only way to avoid it is switching to a cfg attribute for activation.

@newpavlov
Copy link
Member

Yes, but at least sha3 does runtime detection with the feature enabled, so it's a better place to depend on cpufeatures. In future after MSRV bump it will use runtime detection unconditionally on ARM.

@tarcieri
Copy link
Member Author

tarcieri commented Nov 14, 2022

...at which point the concern about a hard dependency on cpufeatures is moot?

I'm confused what you're concerned about. IMO the main problem now is there's a dependency on cpufeatures even if the asm feature isn't enabled, and that will go away when it's "always on".

The only alternative to using cpufeatures in keccak is having every single downstream crate which consumes it implement the same aarch64-specific cpufeatues gating, which seems wrong to me. Why duplicate that logic on a crate-by-crate basis? It would seem to create quite a maintenance burden if we change anything.

The reason for doing it in sha3 as well is so it can use an init token for a minor efficiency boost we haven't implemented or measured yet.

@newpavlov
Copy link
Member

newpavlov commented Nov 14, 2022

Ah... I thought you've removed runtime detection in f1600 in favor of exposing f1600_armv8_sha3_asm.

My idea was to completely remove runtime detection and let user crates (such as sha3) to decide whether to use runtime or compile-time detection. I am not sure it makes sense to expose the intrinsics-based function, if f1600 uses runtime detection under the hood.

In other words, I am fine with either one of the following options:

  • Use runtime detection in f1600 and keep f1600_armv8_sha3_asm and armv8_sha3_intrinsics private.
  • Expose f1600_armv8_sha3_asm without armv8_sha3_intrinsics (i.e. keccak will not depend on cpufeatures). Keep f1600 strictly software-based.

But not with a mix of them.

@tarcieri
Copy link
Member Author

The current configuration permits the following:

  • f1600 autodetects the sha3 feature but doesn't use an init token
  • f1600_armv8_sha3_asm + keccak_p(state, u64::KECCAK_F_ROUND_COUNT) allow downstream crates to select between the optimized hardware implementation and a pure software implementation using an init token

I don't see a disadvantage of this approach, especially with sha3 as the main downstream consumer. Where is the issue?

@newpavlov
Copy link
Member

Saying that you should use keccak_p(state, u64::KECCAK_F_ROUND_COUNT) as a software implementation of f1600 is... an obscure solution. It's quite probable that unaware users would try to combine f1600 and f1600_armv8_sha3_asm, which obviously will be sub-optimal. At the very least we should introduce f1600_soft.

@tarcieri
Copy link
Member Author

But the point of having an f1600 with CPU feature autodetection is so those downstream users don't have to care or propagate the autodetection code themselves.

The point of using keccak_p in this case is for an unproven microoptimization in the sha3 crate.

Perhaps we should just punt on exposing it at all and leave all of the feature detection in the keccak crate, and circle back if and when someone demonstrates the performance benefit is worth it.

@newpavlov
Copy link
Member

Perhaps we should just punt on exposing it at all and leave all of the feature detection in the keccak crate, and circle back if and when someone demonstrates the performance benefit is worth it.

Yes, it's the first option listed earlier.

tarcieri added a commit that referenced this pull request Nov 14, 2022
tarcieri added a commit that referenced this pull request Nov 14, 2022
tarcieri added a commit to RustCrypto/hashes that referenced this pull request Nov 15, 2022
Transitively enables the `asm` feture in the `keccak` crate which was
added in RustCrypto/sponges#24.
tarcieri added a commit to RustCrypto/hashes that referenced this pull request Nov 15, 2022
Transitively enables the `asm` feture in the `keccak` crate which was
added in RustCrypto/sponges#24.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants