internal/lz4block: Copy literals of <=48 bytes through XMM registers in amd64 decoder #161

Merged
merged 1 commit into from Jan 31, 2022

greatroar
Contributor

Another optimization for the amd64 decoder, inspired by one of its comments:

name                old speed      new speed      delta
UncompressPg1661-8  1.15GB/s ± 1%  1.19GB/s ± 1%   +3.39%  (p=0.000 n=10+10)
UncompressDigits-8  1.89GB/s ± 0%  2.33GB/s ± 1%  +23.46%  (p=0.000 n=9+10)
UncompressTwain-8   1.19GB/s ± 1%  1.23GB/s ± 0%   +3.43%  (p=0.000 n=10+10)
UncompressRand-8    3.93GB/s ± 2%  3.96GB/s ± 1%     ~     (p=0.105 n=10+10)

The effect is most pronounced on Digits because 37.4% of its literals have lengths 17-48. In Twain and Pg1661, that share is below 4.1%.
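The PR does not say how the literal-length shares were measured. One way to get such a figure, sketched here under that assumption, is to walk a raw LZ4 block, record the literal length of each sequence, and compute the share that falls in 17-48. The names `literalLengths` and `shareInRange` are hypothetical and not part of the lz4 package, and counting zero-length literals in the denominator is also an assumption.

```go
package main

import "errors"

var errShort = errors.New("lz4 block ends unexpectedly")

// literalLengths walks a raw LZ4 block and returns the literal length of
// every sequence. It only parses the framing; it does not decompress.
func literalLengths(block []byte) ([]int, error) {
	var lens []int
	i := 0
	for i < len(block) {
		token := block[i]
		i++

		// High nibble: literal length; 15 means extension bytes follow.
		litLen := int(token >> 4)
		if litLen == 15 {
			for {
				if i >= len(block) {
					return nil, errShort
				}
				b := block[i]
				i++
				litLen += int(b)
				if b != 255 {
					break
				}
			}
		}
		if litLen > len(block)-i {
			return nil, errShort
		}
		lens = append(lens, litLen)
		i += litLen

		// The last sequence consists of literals only.
		if i == len(block) {
			break
		}

		// Skip the 2-byte match offset.
		if len(block)-i < 2 {
			return nil, errShort
		}
		i += 2

		// Low nibble: match length; 15 means extension bytes follow.
		if token&0x0F == 15 {
			for {
				if i >= len(block) {
					return nil, errShort
				}
				b := block[i]
				i++
				if b != 255 {
					break
				}
			}
		}
	}
	return lens, nil
}

// shareInRange returns the fraction of lengths with lo <= length <= hi.
func shareInRange(lens []int, lo, hi int) float64 {
	if len(lens) == 0 {
		return 0
	}
	n := 0
	for _, l := range lens {
		if l >= lo && l <= hi {
			n++
		}
	}
	return float64(n) / float64(len(lens))
}
```

Calling `shareInRange(lens, 17, 48)` on the lengths collected from a compressed benchmark input would give a number comparable to the percentages quoted above, provided the same counting convention is used.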

Copying up to 48 bytes is faster than copying up to 32 bytes. At 64 bytes, Digits gets faster still, while Twain and Pg1661 get slightly slower.
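The change itself is in amd64 assembly, which is not shown in this conversation. As a rough illustration of the idea in the title, the Go sketch below copies a short literal as three fixed 16-byte blocks (one XMM register holds 16 bytes) instead of a variable-length copy. It assumes the fast path over-copies a fixed 48 bytes whenever both buffers have that much room; the actual assembly may be structured differently. `copyLiteral`, `chunk` and `maxFast` are hypothetical names.

```go
package main

const (
	chunk   = 16        // bytes held by one XMM register
	maxFast = 3 * chunk // 48 bytes: the limit named in the PR title
)

// copyLiteral copies n literal bytes from src to dst and reports whether
// the fixed-size fast path was used. The caller must guarantee that
// n <= len(src) and n <= len(dst).
//
// On the fast path all 48 bytes are written, so dst[n:48] is clobbered.
// A decoder can tolerate this because later output overwrites those bytes.
func copyLiteral(dst, src []byte, n int) bool {
	if n <= maxFast && len(src) >= maxFast && len(dst) >= maxFast {
		// Three constant-size copies, no length-dependent loop.
		copy(dst[0*chunk:1*chunk], src[0*chunk:1*chunk])
		copy(dst[1*chunk:2*chunk], src[1*chunk:2*chunk])
		copy(dst[2*chunk:3*chunk], src[2*chunk:3*chunk])
		return true
	}
	// Fallback: exact-length copy, for long literals or when fewer
	// than 48 bytes of room remain in either buffer.
	copy(dst[:n], src[:n])
	return false
}
```

Raising `maxFast` to 64 would add a fourth block to every fast-path copy, which helps inputs with many literals of length 49-64 and costs a little on inputs where most literals are short. That is consistent with the trade-off described above, though the sketch does not reproduce the measured numbers.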

greatroar changed the title from "internal/lz4block: Copy literals of <=48 bytes through XMM" to "internal/lz4block: Copy literals of <=48 bytes through XMM registers in amd64 decoder" on Jan 30, 2022
pierrec merged commit 6bd757c into pierrec:v4 on Jan 31, 2022
greatroar deleted the amd64-match-copy-48 branch on January 31, 2022 19:48