fix incorrect output / undefined behavior in Windows SSE2 assembly
The SSE2 patch introduced xmm10 as a temporary register for one of the
rotations, but xmm6-xmm15 are callee-save registers on Windows, and
SSE4.1 was only saving the registers it used. The minimal fix is to use
one of the saved registers instead of xmm10.

See BLAKE3-team#206.
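
To illustrate the calling-convention point above, here is a minimal MASM-style sketch. It is not taken from the BLAKE3 sources: the routine name `example_scratch` and the stack layout are hypothetical. It assumes the Windows x64 convention, in which xmm0-xmm5 are volatile and xmm6-xmm15 must be preserved by the callee. It shows why a scratch register is only safe if the prologue has already saved it.

```asm
; Hypothetical routine for illustration only, not part of BLAKE3.
; Unwind directives are omitted for brevity; production code needs them.
example_scratch PROC
        sub     rsp, 24                          ; rsp is 8 mod 16 on entry, so it is now 16-byte aligned
        movdqa  xmmword ptr [rsp], xmm14         ; save the one callee-saved register we will modify

        movdqa  xmm14, xmm0                      ; safe: xmm14 was saved above
        ; movdqa xmm10, xmm0                     ; unsafe: xmm10 is callee-saved but was never saved,
        ;                                        ; so the caller's value would be silently destroyed

        movdqa  xmm14, xmmword ptr [rsp]         ; restore the caller's xmm14 before returning
        add     rsp, 24
        ret
example_scratch ENDP
```

The commit follows the same idea without touching the prologue: it switches the temporary from xmm10 to xmm14, which the commit message says is already among the saved registers.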
oconnor663 committed Nov 5, 2021
1 parent d4a3209 commit 1e50092
Showing 2 changed files with 12 additions and 12 deletions.
12 changes: 6 additions & 6 deletions blake3_sse2_x86-64_windows_gnu.S
@@ -2137,10 +2137,10 @@ _blake3_compress_in_place_sse2:
  por xmm9, xmm8
  movdqa xmm8, xmm7
  punpcklqdq xmm8, xmm5
- movdqa xmm10, xmm6
+ movdqa xmm14, xmm6
  pand xmm8, xmmword ptr [PBLENDW_0x3F_MASK+rip]
- pand xmm10, xmmword ptr [PBLENDW_0xC0_MASK+rip]
- por xmm8, xmm10
+ pand xmm14, xmmword ptr [PBLENDW_0xC0_MASK+rip]
+ por xmm8, xmm14
  pshufd xmm8, xmm8, 0x78
  punpckhdq xmm5, xmm7
  punpckldq xmm6, xmm5
@@ -2268,10 +2268,10 @@ blake3_compress_xof_sse2:
  por xmm9, xmm8
  movdqa xmm8, xmm7
  punpcklqdq xmm8, xmm5
- movdqa xmm10, xmm6
+ movdqa xmm14, xmm6
  pand xmm8, xmmword ptr [PBLENDW_0x3F_MASK+rip]
- pand xmm10, xmmword ptr [PBLENDW_0xC0_MASK+rip]
- por xmm8, xmm10
+ pand xmm14, xmmword ptr [PBLENDW_0xC0_MASK+rip]
+ por xmm8, xmm14
  pshufd xmm8, xmm8, 0x78
  punpckhdq xmm5, xmm7
  punpckldq xmm6, xmm5
12 changes: 6 additions & 6 deletions blake3_sse2_x86-64_windows_msvc.asm
@@ -2139,10 +2139,10 @@ _blake3_compress_in_place_sse2 PROC
  por xmm9, xmm8
  movdqa xmm8, xmm7
  punpcklqdq xmm8, xmm5
- movdqa xmm10, xmm6
+ movdqa xmm14, xmm6
  pand xmm8, xmmword ptr [PBLENDW_0x3F_MASK]
- pand xmm10, xmmword ptr [PBLENDW_0xC0_MASK]
- por xmm8, xmm10
+ pand xmm14, xmmword ptr [PBLENDW_0xC0_MASK]
+ por xmm8, xmm14
  pshufd xmm8, xmm8, 78H
  punpckhdq xmm5, xmm7
  punpckldq xmm6, xmm5
@@ -2271,10 +2271,10 @@ _blake3_compress_xof_sse2 PROC
  por xmm9, xmm8
  movdqa xmm8, xmm7
  punpcklqdq xmm8, xmm5
- movdqa xmm10, xmm6
+ movdqa xmm14, xmm6
  pand xmm8, xmmword ptr [PBLENDW_0x3F_MASK]
- pand xmm10, xmmword ptr [PBLENDW_0xC0_MASK]
- por xmm8, xmm10
+ pand xmm14, xmmword ptr [PBLENDW_0xC0_MASK]
+ por xmm8, xmm14
  pshufd xmm8, xmm8, 78H
  punpckhdq xmm5, xmm7
  punpckldq xmm6, xmm5
