internal/lz4block: Copy literals of <=48 bytes through XMM registers in amd64 decoder #161

Merged 1 commit on Jan 31, 2022.
internal/lz4block/decode_amd64.s (20 changes: 12 additions, 8 deletions)
@@ -157,24 +157,28 @@ copy_literal:
 	CMPQ BX, R8
 	JA err_short_buf
 
-	// whats a good cut off to call memmove?
-	CMPQ CX, $16
+	// Copy matches of <=48 bytes through the XMM registers.
+	CMPQ CX, $48
 	JGT memmove_lit
 
-	// if len(dst[di:]) < 16
+	// if len(dst[di:]) < 48
 	MOVQ R8, AX
 	SUBQ DI, AX
-	CMPQ AX, $16
+	CMPQ AX, $48
 	JLT memmove_lit
 
-	// if len(src[si:]) < 16
-	MOVQ R9, AX
-	SUBQ SI, AX
-	CMPQ AX, $16
+	// if len(src[si:]) < 48
+	MOVQ R9, BX
+	SUBQ SI, BX
+	CMPQ BX, $48
 	JLT memmove_lit
 
 	MOVOU (SI), X0
+	MOVOU 16(SI), X1
+	MOVOU 32(SI), X2
 	MOVOU X0, (DI)
+	MOVOU X1, 16(DI)
+	MOVOU X2, 32(DI)
 
 	ADDQ CX, SI
 	ADDQ CX, DI
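As a rough illustration, the Go sketch below models the new fast path in plain Go. `copyLiteral` is a hypothetical helper, not part of the lz4 package. A Go `copy` of a fixed 48-byte window stands in for the three 16-byte `MOVOU` loads and stores. The sketch assumes, as the assembly's preceding bounds checks guarantee, that `src[si:si+n]` and `dst[di:di+n]` are in range. It also assumes the caller tolerates scratch bytes written past `di+n`.

```go
package main

// xmmChunk is the number of bytes moved by three 16-byte XMM registers.
const xmmChunk = 48

// copyLiteral is a hypothetical pure-Go model of the literal-copy fast path.
// It copies n literal bytes from src[si:] to dst[di:] and returns the
// advanced (di, si). Like the asm, it assumes si+n <= len(src) and
// di+n <= len(dst) were already checked by the caller.
func copyLiteral(dst, src []byte, di, si, n int) (int, int) {
	if n <= xmmChunk && len(dst)-di >= xmmChunk && len(src)-si >= xmmChunk {
		// Fast path: always move a fixed 48-byte block, even when n < 48.
		// Bytes written past di+n are scratch; this sketch assumes later
		// output overwrites them or they lie beyond the decoded length.
		copy(dst[di:di+xmmChunk], src[si:si+xmmChunk])
	} else {
		// Slow path, analogous to jumping to memmove_lit: copy exactly n bytes.
		copy(dst[di:di+n], src[si:si+n])
	}
	return di + n, si + n
}
```

The trade-off this models: the fast path needs no per-length branching because it always moves 48 bytes, but it requires at least 48 bytes of headroom in both buffers. Literals longer than 48 bytes, or copies near the end of either buffer, fall back to the exact-length copy.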