
internal/lz4block: Short literal copying in arm64 decoder #162

Merged
merged 1 commit into from Feb 5, 2022

Commits on Feb 4, 2022

  1. internal/lz4block: Short literal copying in arm64 decoder

    This borrows a trick from the amd64 decoder: copy short literals by
    always copying 16 bytes, using the space beyond the literal's length as
    scratch space (but only when at least 16 bytes remain before the end of
    both src and dst).
    
    Benchmark results on RPi4B:
    
    name                old speed      new speed      delta
    UncompressPg1661-4   101MB/s ± 1%   138MB/s ± 0%  +36.46%  (p=0.000 n=10+9)
    UncompressDigits-4   453MB/s ± 0%   525MB/s ± 0%  +15.81%  (p=0.000 n=10+10)
    UncompressTwain-4    110MB/s ± 0%   156MB/s ± 0%  +41.42%  (p=0.000 n=10+10)
    UncompressRand-4    1.14GB/s ± 1%  1.13GB/s ± 1%     ~     (p=0.075 n=10+10)
    
    Also moved a SUBS to just before the associated branch to allow
    instruction fusion on ARMs that do that.
    greatroar committed Feb 4, 2022
    e99166d
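
For readers who don't read arm64 assembly, the short-literal trick described in the commit message can be sketched in Go. This is only an illustration of the idea, not the decoder's code: the name copyShortLiteral and the exact bounds checks are assumptions made for this example, and the real implementation works on registers and pointers rather than slices.

```go
package main

import "fmt"

// copyShortLiteral is a hypothetical helper written for illustration only.
// It copies n literal bytes from src to dst and returns n. The caller is
// assumed to have checked that n fits in both slices.
func copyShortLiteral(dst, src []byte, n int) int {
	if n <= 16 && len(src) >= 16 && len(dst) >= 16 {
		// Fast path: always move a fixed 16 bytes, with no length-dependent
		// loop. Bytes n..15 of dst are scratch space and are overwritten by
		// whatever the decoder writes next.
		copy(dst[:16], src[:16])
		return n
	}
	// Slow path near the end of either buffer: copy exactly n bytes.
	return copy(dst[:n], src[:n])
}

func main() {
	src := []byte("abcdefghijklmnopqrstuvwxyz")
	dst := make([]byte, 32)

	// First literal is "abc"; dst[3:16] now holds scratch bytes.
	di := copyShortLiteral(dst, src, 3)
	// Second literal is "kl"; it overwrites the scratch bytes from offset 3.
	di += copyShortLiteral(dst[di:], src[10:], 2)

	fmt.Printf("%q\n", dst[:di])
}
```

The fixed-size copy avoids a branch on the literal's length for the common short case, which is presumably where the gains on the text-like benchmarks come from; only the first di bytes of dst are meaningful at any point.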