Improve codegen for mixing in length #112

stepantubanov · 2022-02-25T15:17:38Z

A minor detail, but it cuts hashing time for small strings/byte slices by about 17% (x86-64).

Measured on Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz (Skylake).

RUSTFLAGS="-C target-cpu=native" cargo bench aeshash/string

Before:
aeshash/string          time:   [69.371 ns 69.966 ns 70.507 ns]

After:
aeshash/string          time:   [56.352 ns 56.830 ns 57.309 ns]                           
                        change: [-18.211% -17.441% -16.670%] (p = 0.00 < 0.05)

Before (relevant part):

   	mov	  rsi, qword ptr [rbx + 16]	# length
   	vmovq	  rax, xmm1			# enc[0]
   	add	  rax, rsi
   	vmovq	  xmm3, rax
   	vpblendd  xmm11, xmm3, xmm1, 12

To add the length, it extracts enc[0] into a general-purpose register, performs the addition there, and then moves the result back into a vector register (using a blend to recombine it with the upper half).

After:

   	vmovq	xmm3, qword ptr [rbx + 16]	# length
   	vpaddq	xmm11, xmm3, xmm1		# enc[0] += length

LLVM is now able to recognize that there is no need to go through a GPR and a blend.
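For readers who want to see the two shapes side by side, here is a minimal Rust sketch of both approaches. The function names `add_in_length_scalar` and `add_in_length_sse2` are hypothetical and this is not the diff from this PR; it only assumes that the length is added to the low 64-bit lane of a `u128` with wrapping arithmetic, as the assembly above suggests.

```rust
/// Portable shape: split the u128 into two 64-bit lanes, add `len` to the
/// low lane with wrapping arithmetic, and reassemble. (Hypothetical name.)
pub fn add_in_length_scalar(enc: &mut u128, len: u64) {
    let lo = (*enc as u64).wrapping_add(len);
    let hi = (*enc >> 64) as u64;
    *enc = ((hi as u128) << 64) | lo as u128;
}

/// SSE2 shape: keep everything in vector registers. `len` is placed in the
/// low lane of a zeroed vector, then a lane-wise 64-bit add is applied, so
/// the high lane is left unchanged. (Hypothetical name.)
#[cfg(all(target_arch = "x86_64", target_feature = "sse2"))]
pub fn add_in_length_sse2(enc: &mut u128, len: u64) {
    use core::arch::x86_64::*;

    // SAFETY: `p` is derived from a live exclusive reference to 16 bytes,
    // and the unaligned load/store intrinsics have no alignment requirement.
    unsafe {
        let p = enc as *mut u128 as *mut __m128i;
        let len = _mm_cvtsi64_si128(len as i64);
        let data = _mm_loadu_si128(p as *const __m128i);
        _mm_storeu_si128(p, _mm_add_epi64(data, len));
    }
}
```

Both functions should produce the same result on x86-64, since the low 64 bits of a little-endian `u128` sit in lane 0 of the vector. Whether the compiler emits the shorter sequence for the scalar form depends on the LLVM version and surrounding code.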

tkaitchuck · 2022-02-26T06:12:18Z

src/operations.rs

@@ -146,6 +146,31 @@ pub(crate) fn aesdec(value: u128, xor: u128) -> u128 {
    }
 }

+#[inline(always)]
+pub(crate) fn add_in_length(enc: &mut u128, len: u64) {
+    #[cfg(all(target_feature = "sse2", not(miri)))]


This refers to sse2, and below it refers to ssse3, but it looks like these were both meant to be the same so as to provide two alternative paths.

tkaitchuck · 2022-02-26T07:35:32Z

src/operations.rs

+        use core::arch::x86_64::*;
+
+        unsafe {
+            let enc = std::ptr::addr_of_mut!(*enc);


You can't assume std here.
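One way to address this, sketched under the assumption that the crate must also build without `std`: the same macro is available as `core::ptr::addr_of_mut!`. The helper `store_via_raw` below is hypothetical and only illustrates the path change; it is not necessarily what the updated diff does.

```rust
use core::ptr::addr_of_mut;

/// Writes `value` through a raw pointer obtained using only `core`.
/// (Hypothetical helper for illustration.)
pub fn store_via_raw(enc: &mut u128, value: u128) {
    let p: *mut u128 = addr_of_mut!(*enc);
    // SAFETY: `p` comes from a live exclusive reference, so it is valid,
    // aligned, and not aliased for the duration of the write.
    unsafe { p.write(value) };
}
```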

tkaitchuck · 2022-02-26T07:37:18Z

src/operations.rs

+
+        unsafe {
+            let enc = std::ptr::addr_of_mut!(*enc);
+            let len = _mm_cvtsi64_si128(len as i64);


Is this 64 bit only? It is giving a compile error on i686.

Yep, 64-bit. I've added target_arch to the cfg on this branch.
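The cfg concerns raised in this review (matching predicates on both paths, plus the 64-bit restriction) can be sketched as a pair of items whose conditions are exact complements. The function name `selected_path` is hypothetical, and the exact predicate used in the merged code is an assumption here.

```rust
// Fast path: only compiled when all three conditions hold.
#[cfg(all(target_arch = "x86_64", target_feature = "sse2", not(miri)))]
pub fn selected_path() -> &'static str {
    "sse2"
}

// Fallback: the predicate is the exact negation of the one above, so
// exactly one definition exists on every target (including i686).
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse2", not(miri))))]
pub fn selected_path() -> &'static str {
    "fallback"
}
```

If the two predicates drift apart (for example one naming a different target feature), some targets end up with zero or two definitions and fail to compile, so writing the fallback as `not(...)` of the same expression keeps the pair in sync.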

stepantubanov · 2022-02-26T10:19:42Z

@tkaitchuck Thanks for the review, I've just pushed an updated diff with the fixes.

tkaitchuck self-requested a review February 26, 2022 06:09

tkaitchuck reviewed Feb 26, 2022


stepantubanov force-pushed the opt-add-length-codegen branch from f0c91a6 to 7f1ca8f Compare February 26, 2022 10:14

Improve codegen for mixing in length

8594bd7

stepantubanov force-pushed the opt-add-length-codegen branch from 7f1ca8f to 8594bd7 Compare February 26, 2022 10:16

tkaitchuck merged commit fbd7485 into tkaitchuck:master Feb 27, 2022
