internal/lz4block: Copy literals of <=48 bytes through XMM registers in amd64 decoder #161

Merged: 1 commit, Jan 31, 2022

Commits on Jan 30, 2022

  1. internal/lz4block: Copy literals of <=48 bytes through XMM regs

    name                old speed      new speed      delta
    UncompressPg1661-8  1.15GB/s ± 1%  1.19GB/s ± 1%   +3.39%  (p=0.000 n=10+10)
    UncompressDigits-8  1.89GB/s ± 0%  2.33GB/s ± 1%  +23.46%  (p=0.000 n=9+10)
    UncompressTwain-8   1.19GB/s ± 1%  1.23GB/s ± 0%   +3.43%  (p=0.000 n=10+10)
    UncompressRand-8    3.93GB/s ± 2%  3.96GB/s ± 1%     ~     (p=0.105 n=10+10)
    
    The effect is most pronounced on Digits because 37.4% of its literals
    have lengths 17-48. In Twain and Pg1661, this is <4.1%.
    
    Copying up to 48 bytes this way is faster than copying up to 32.
    At 64 bytes, Digits gets faster still, while Twain and Pg1661 get
    slightly slower.
    greatroar committed Jan 30, 2022
    257c664
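
For readers unfamiliar with the technique, here is a rough Go sketch of the idea described in the commit message. It is not the code from this PR, which is amd64 assembly. The helper name `copyLiteral`, its signature, and the bounds checks are assumptions made for illustration. The point is that a short literal can be copied with a fixed-size 48-byte move (three 16-byte XMM loads and stores in the assembly) instead of a length-dependent loop, provided both buffers have 48 bytes of room. The extra bytes written past the literal are overwritten by later output.

```go
package main

// wide is the fixed copy width: three 16-byte XMM registers.
const wide = 48

// copyLiteral copies the n-byte literal at src[si:] to dst[di:].
// Hypothetical helper for illustration only; not the lz4block API.
func copyLiteral(dst, src []byte, di, si, n int) {
	if n <= wide && si+wide <= len(src) && di+wide <= len(dst) {
		// Fast path: always move 48 bytes, regardless of n.
		// Bytes beyond dst[di+n] are scratch; the decoder writes
		// over them when it emits the next match or literal.
		copy(dst[di:di+wide], src[si:si+wide])
		return
	}
	// Slow path: long literal, or too close to the end of a buffer
	// to overcopy safely. Copy exactly n bytes.
	copy(dst[di:di+n], src[si:si+n])
}
```

The 48-byte cutoff is the trade-off measured in the benchmarks above: a wider fixed copy covers more literals on the fast path but moves more wasted bytes when literals are short.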