Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to assembly from OpenSSL #5

Closed
newpavlov opened this issue Nov 21, 2018 · 21 comments
Closed

Migrate to assembly from OpenSSL #5

newpavlov opened this issue Nov 21, 2018 · 21 comments

Comments

@newpavlov
Copy link
Member

Initially crates in this repository were a simple proof-of-work experiment and assembly from Project Nayuki was taken without much consideration to performance. But some people use asm feature and expect performance on par with OpenSSL which is obviously not true today.

So I think we should migrate to OpenSSL assembly and maybe rename this repository to openssl-asm or something. This will require some code refactoring as OpenSSL hash assembly processes several blocks at a time and for the best performance we will have to accommodate for it. My initial experiments with MD5 haven't found any difficult problems.

The main question I currently have is: should we pack Perl scripts into crates or generated assembly? First option will result in a lighter crates (especially considering various possible "flavors" which Perl scripts support), but will require Perl to be installed on the system. For Linux it's not a big deal, but I think for Windows it can be a problem.

@tarcieri
Copy link
Member

Preserving the Perl will make it easier to track upstream changes. It sure would be nice to get rid of it though.

It just does templating, so if it can be adapted to some other templating system or syntax, possibly as a mechanical translation from the Perl, I'd be a fan of that.

@newpavlov
Copy link
Member Author

newpavlov commented Nov 21, 2018

I thought about adding openssl repo as a submodule and using a small bash script which will regenerate assembly for all crates. So time to time we will manually update submodule, execute the script, and check if there any changes.

@tarcieri
Copy link
Member

@newpavlov as a minimum viable solution to removing the Perl dependency, that sounds fine to me.

@tarcieri
Copy link
Member

Here's an alternative idea that sounds fairly difficult and likely not worth it, but kind of fun:

I think it might be possible to convert the Perl templates (through a similar mechanical preprocessing step on the upstream files from the openssl repo with a checked-in result) into some other template, be it just a format! string or what have you, and then include_str! it, replacing the Perl templating with a combination of CPU architecture-dependent conditional compilation and Rust macros.

@newpavlov
Copy link
Member Author

It will be essentially a limited Perl-to-Rust convertor, so indeed it's probably not worth the trouble. :)

@tarcieri
Copy link
Member

tarcieri commented Nov 21, 2018

I think it's a lot easier than that, but it would essentially involve rewriting the library-level parts of the Perl templating system as e.g. Rust macros, in addition to extracting and converting/munging the asm templates.

Still seems like a lot of work.

@PrismaPhonic
Copy link

I would like to see some assembly that takes advantage of avx2. Currently, the Golang SHA-1 library does: https://github.com/golang/go/blob/master/src/crypto/sha1/sha1block_amd64.s

This uses the same algorithm as the Linux kernel: https://github.com/torvalds/linux/blob/master/arch/x86/crypto/sha1_avx2_x86_64_asm.S

@newpavlov
Copy link
Member Author

@PrismaPhonic
IIUC OpenSSL assembly (sha1, sha256) does account for AVX2 instruction set, so it's matter of generating appropriate assembly files. Although I haven't compared performance with the files linked by you.

@PrismaPhonic
Copy link

PrismaPhonic commented Nov 6, 2019

For sure. I know I'm just now stepping into this conversation and am not even a contributor (yet), but I personally would prefer avoiding Perl scripts altogether. I would vote for sticking to raw assembly files, and check Rust side about which instruction set should be used.

EDIT: My reasoning behind that is that combining perl & asm makes for extremely unreadable code. I'm frankly a bit shocked that the openssl devs decided to use that approach. I'm not sure there exists a less readable way of generating asm.

@newpavlov
Copy link
Member Author

newpavlov commented Jun 11, 2020

It looks like at least for hash functions OpenSSL assembly handles dynamic dispatch itself by reading data from OPENSSL_ia32cap_P, which contains CPUID results (on ARM it's OPENSSL_armcap_P). I think it will be a bit tricky to setup this constant correctly, but should be doable.

@joshtriplett
Copy link

joshtriplett commented Mar 22, 2021

Quick note: from a licensing point of view, this needs some care to make sure that it's safe to do. OpenSSL has relicensed to Apache 2.0, but the asm modules in OpenSSL appear to be additionally dual-licensed under BSD and GPLv2. So, if you want to integrate that assembly, you'd need to change the license of this crate accordingly (BSD-3-Clause OR Apache-2.0 would work).

@tarcieri
Copy link
Member

tarcieri commented Mar 22, 2021

@joshtriplett yeah. See recent licensing discussion on #32 (review)

@Byron
Copy link

Byron commented Apr 3, 2021

I would like to join this discussion out of my interest to beat Git when resolving packs on ARM (or AArch64 specifically), which right now doesn't seem possible when using only 4 performance cores. As gitoxide is already faster on Intel I am super eager to see by how much it will be faster on ARM/AArch64.

That said, I am currently working on this PR to help clean up ARM/AArch64 intrinsics which I understand might be one way to speed things up here, even though ultimately it might not be used based on what I gather from this discussion:

  • OpenSSL scripts could be used to generate ASM for all kinds of architectures and feature sets. Using OpenSSL as submodule could help benefitting from future improvements and allow executing their templating engine to generate assembly.
  • Generated assembly would be stored in the crate
  • The crate could use compile time (i.e. architecture specific) and runtime selection mechanisms to pick suitable assembly to execute. I imagine it like the crate being compiled for ARM (i.e. #[cfg(target_argh = "arm")] would select from ARM assembly that it included as text to then decide using runtime mechanism which version to use, i.e. V6, V7 or V8.
  • Some crate internals would have to change as OpenSSL handles multiple blocks at once, an MD5 experiment has shown that this isn't a hard problem.

The above would make most sense to me, and hope you can make corrections and chime in to clear a path forward, maybe to make contributions possible.

Thanks a lot!

@tarcieri
Copy link
Member

tarcieri commented Apr 3, 2021

I think we're in a good place now to attempt adding a core::arch-based implementation of SHA-256 which uses the ARMv8 Cryptography Extensions.

I have an M1 and can take a look. No OpenSSL-based ASM would be required for this, just wrapping the underlying intrinsics in pure Rust.

Likewise, it'd probably still be better to use core::arch for an AVX2 implementation as well, although as a stopgap we could potentially use the OpenSSL-sourced assembly.

As I noted earlier, while OpenSSL provides high performance implementations across a number of architectures which is nice, one of the big problems with it is the upstream assembly code is generated by a Perl-based templating system which it would be nice to avoid, especially as a mandatory prerequisite of the build process.

@Byron
Copy link

Byron commented Apr 3, 2021

I love that, especially with an immediate way forward with something that is part of the ecosystem. It looks like there are SHA1 instructions as well in V8 which I assume AArch64 is based on, that should do the trick for me.

@tarcieri
Copy link
Member

tarcieri commented Apr 3, 2021

I took a look at implementing SHA-256 using the ARMv8 Cryptography Extensions:

https://gist.github.com/tarcieri/414a3300072160f372b5d93ccfce280b

Unfortunately it's nightly-only as it requires feature(stdsimd), so in the interim (as @newpavlov has pointed out in the past) we still need an assembly implementation to support stable Rust.

As it were, there are also a couple NEON intrinsics that are/were missing:

Regarding adding vst1q_u32, here's a PR that added vld1q_u32 for reference: https://github.com/rust-lang/stdarch/pull/899/files (edit: I'm told this is the old way of doing it, and new intrinsics should be added to neon.spec instead)

@Byron
Copy link

Byron commented Apr 20, 2021

I don't know if it matters but hope that the merge of this stdarch PR helps with this endeavour once the intrinsics trickle down to stable.

@newpavlov
Copy link
Member Author

newpavlov commented Apr 24, 2021

@Byron
Crates in this repository are about wrapping implementations written in assembly. Intrinsic-based implementation reside in the main crates (e.g. see sha-1, sha2, aes and others).

@tarcieri
Copy link
Member

Opened a tracking issue for stdarch intrinsic-based implementations here: RustCrypto/hashes#257

@tarcieri
Copy link
Member

Regarding SHA2 on x86 platforms, #41 (using Intel-provided BSD licensed assembly contributed to the Linux kernel) looks like a nice alternative

@newpavlov
Copy link
Member Author

We decided to deprecate these crates. We could add ASM backends based on OpenSSL code in future, but such PRs should be opened in the hashes repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants