internal/lz4block: Short literal copying in arm64 decoder #162

Merged: 1 commit, Feb 5, 2022


greatroar
Contributor

This borrows a trick from the amd64 decoder: short literals are copied by always copying 16 bytes, using the space beyond the literal's length as scratch space. This is only done when at least 16 bytes remain in both src and dst.
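The literal-copy trick can be sketched in Go as follows. This is a hypothetical illustration, not the code from this PR (which changes arm64 assembly): `copyLiteral` and its parameters are made up for the example, and only the 16-byte threshold and the "enough room in src and dst" condition come from the description. The slice-to-array-pointer conversion requires Go 1.17 or later.

```go
package main

import "fmt"

// copyLiteral is a hypothetical helper that copies the n-byte literal
// src[si:si+n] to dst[di:] and returns the advanced indices.
//
// Fast path (the trick): if the literal is at most 16 bytes and at least
// 16 bytes remain in both src and dst, copy exactly 16 bytes with one
// fixed-size copy instead of a length-dependent copy. The bytes written
// to dst[di+n:di+16] are scratch; in a decoder they are either
// overwritten by later output or lie past the final output length.
func copyLiteral(dst []byte, di int, src []byte, si int, n int) (int, int) {
	if n <= 16 && len(src)-si >= 16 && len(dst)-di >= 16 {
		// Fixed-size 16-byte copy; the length checks above guarantee
		// both conversions are in bounds.
		*(*[16]byte)(dst[di:]) = *(*[16]byte)(src[si:])
	} else {
		// Slow path: long literals, or too close to the end of a buffer.
		copy(dst[di:di+n], src[si:si+n])
	}
	return di + n, si + n
}

func main() {
	src := []byte("hello, literal!!-padding-bytes-here")
	dst := make([]byte, 64)
	di, si := copyLiteral(dst, 0, src, 0, 5)
	fmt.Println(di, si, string(dst[:di])) // prints "5 5 hello"
}
```

The point of the fixed-size copy is that it has no loop and no branch on the literal length, so a compiler (or hand-written assembly) can emit a couple of wide loads and stores. The length checks keep the over-copy from reading or writing past either buffer.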

Benchmark results on a Raspberry Pi 4 Model B:

name                old speed      new speed      delta
UncompressPg1661-4   101MB/s ± 1%   138MB/s ± 0%  +36.46%  (p=0.000 n=10+9)
UncompressDigits-4   453MB/s ± 0%   525MB/s ± 0%  +15.81%  (p=0.000 n=10+10)
UncompressTwain-4    110MB/s ± 0%   156MB/s ± 0%  +41.42%  (p=0.000 n=10+10)
UncompressRand-4    1.14GB/s ± 1%  1.13GB/s ± 1%     ~     (p=0.075 n=10+10)

Also moved a SUBS to just before its associated branch, so that ARM cores that fuse a flag-setting instruction with the following conditional branch can do so.

pierrec merged commit 5e2de87 into pierrec:v4 on Feb 5, 2022
@pierrec
Owner

pierrec commented Feb 5, 2022

Nice one.

greatroar deleted the arm64-literal-copy branch on February 6, 2022 at 12:57