optimize neon loadu_128/storeu_128 #384

Merged
merged 3 commits into from Mar 12, 2024
divinity76
Contributor

vld1q_u8 and vst1q_u8 have no alignment requirements.

This improves performance on Oracle Cloud's VM.Standard.A1.Flex by 1.15% on a 16*1024 input, from approximately 13920 nanoseconds down to approximately 13800 nanoseconds.
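
For readers without the diff at hand, here is a minimal sketch of what byte-wise unaligned helpers built on these intrinsics might look like. The names `loadu_128` and `storeu_128` come from the PR title; the function bodies, the `vec128` typedef, the `memcpy` fallback for non-NEON builds, and the `roundtrip_unaligned`/`self_check` helpers are assumptions for illustration, not the merged code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Hypothetical alias: four 32-bit lanes in one 128-bit NEON register. */
typedef uint32x4_t vec128;

/* Load 16 bytes with the byte-wise intrinsic, then reinterpret as 32-bit lanes. */
static inline vec128 loadu_128(const uint8_t src[16]) {
  return vreinterpretq_u32_u8(vld1q_u8(src));
}

/* Reinterpret the lanes as bytes and store them with the byte-wise intrinsic. */
static inline void storeu_128(vec128 v, uint8_t dest[16]) {
  vst1q_u8(dest, vreinterpretq_u8_u32(v));
}
#else
/* Assumed portable fallback so the sketch also builds on non-NEON targets. */
typedef struct { uint32_t w[4]; } vec128;

static inline vec128 loadu_128(const uint8_t src[16]) {
  vec128 v;
  memcpy(&v, src, 16);
  return v;
}

static inline void storeu_128(vec128 v, uint8_t dest[16]) {
  memcpy(dest, &v, 16);
}
#endif

/* Round-trips 16 bytes through offsets that have no particular alignment.
   Returns 1 if the stored bytes match the loaded bytes, 0 otherwise. */
static int roundtrip_unaligned(void) {
  uint8_t in[32], out[32];
  for (int i = 0; i < 32; i++) {
    in[i] = (uint8_t)i;
    out[i] = 0;
  }
  vec128 v = loadu_128(in + 1);
  storeu_128(v, out + 3);
  return memcmp(in + 1, out + 3, 16) == 0;
}

/* Aborts if the round trip does not preserve the bytes. */
static void self_check(void) {
  assert(roundtrip_unaligned());
}
```

Calling `self_check()` from a test harness exercises both helpers on buffers offset by odd byte counts. It checks that the round trip preserves the bytes; it says nothing about speed.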

@divinity76 divinity76 changed the title slightly optimize neon loadu_128/storeu_128 optimize neon loadu_128/storeu_128 Feb 9, 2024
divinity76 added a commit to divinity76/php-src that referenced this pull request Feb 9, 2024
ref BLAKE3-team/BLAKE3#384
@oconnor663 oconnor663 merged commit 58bea0b into BLAKE3-team:master Mar 12, 2024
50 checks passed
@oconnor663
Member

I see a ~1% improvement on the Graviton2 CPU on my AWS instance too. Thanks!

oconnor663 added a commit that referenced this pull request Mar 12, 2024
Changes since 1.5.0:
- The Rust crate is now compatible with Miri.
- ~1% performance improvement on Arm NEON contributed by @divinity76 (#384).
- Various fixes and improvements in the CMake build.
oconnor663 added a commit that referenced this pull request Mar 12, 2024
Changes since 1.5.0:
- The Rust crate is now compatible with Miri.
- ~1% performance improvement on Arm NEON contributed by @divinity76 (#384).
- Various fixes and improvements in the CMake build.
- The MSRV of b3sum is now 1.74.1. (The MSRV of the library crate is
  unchanged, 1.66.1.)
@oconnor663
Member

Released as part of v1.5.1.

@divinity76
Contributor Author

divinity76 commented Mar 31, 2024

I wonder if this might have made big endian work too 🤔 (doesn't really matter, nothing runs big endian)
