Skip to content

Improve codegen for mixing in length #112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 27, 2022

Conversation

stepantubanov
Copy link
Contributor

@stepantubanov stepantubanov commented Feb 25, 2022

Minor detail, but produces +17% speed improvement for small strings/byte slices (x86-64).

Measured on Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz (Skylake).

RUSTFLAGS="-C target-cpu=native" cargo bench aeshash/string

Before:
aeshash/string          time:   [69.371 ns 69.966 ns 70.507 ns]

After:
aeshash/string          time:   [56.352 ns 56.830 ns 57.309 ns]                           
                        change: [-18.211% -17.441% -16.670%] (p = 0.00 < 0.05)

Before (relevant part):

   	mov	  rsi, qword ptr [rbx + 16]	# length
   	vmovq	  rax, xmm1			# enc[0]
   	add	  rax, rsi
   	vmovq	  xmm3, rax
   	vpblendd  xmm11, xmm3, xmm1, 12

In order to add the length it extracts enc[0] into general-purpose register, adds there and then moves it back to vector register (and uses blend).

After:

   	vmovq	xmm3, qword ptr [rbx + 16]	# length
   	vpaddq	xmm11, xmm3, xmm1		# enc[0] += length

LLVM able to recognize there is no need to go through GRP and blend.

@tkaitchuck tkaitchuck self-requested a review February 26, 2022 06:09
@@ -146,6 +146,31 @@ pub(crate) fn aesdec(value: u128, xor: u128) -> u128 {
}
}

#[inline(always)]
pub(crate) fn add_in_length(enc: &mut u128, len: u64) {
#[cfg(all(target_feature = "sse2", not(miri)))]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refers to sse2 and below it refers to sss3, but it looks like these were both meant to be the same so as to provide two alternative paths.

use core::arch::x86_64::*;

unsafe {
let enc = std::ptr::addr_of_mut!(*enc);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't assume std here.


unsafe {
let enc = std::ptr::addr_of_mut!(*enc);
let len = _mm_cvtsi64_si128(len as i64);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this 64 bit only? It is giving a compile error on i686.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, 64-bit. I've added target_arch to the cfg on this branch.

@stepantubanov
Copy link
Contributor Author

@tkaitchuck Thanks for the review, I've just pushed updated diff with the fixes.

@tkaitchuck tkaitchuck merged commit fbd7485 into tkaitchuck:master Feb 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants