zstd: Improve throughput of SpeedBestCompression encoder #699

Merged: 1 commit merged into klauspost:master on Nov 26, 2022.


greatroar (Contributor):

Lifted the ofCode and mlCode computations out of match.estBits so that the method can be inlined into its only caller. Also made some changes to eliminate a branch: the last if block now compiles to two CMOVs/CSELs on amd64 and arm64.

name                              old speed      new speed       delta
Encoder_EncodeAllSimple/best-8    11.1MB/s ± 1%   16.9MB/s ± 1%  +52.23%  (p=0.000 n=10+10)
Encoder_EncodeAllSimple4K/best-8  8.41MB/s ± 1%  10.95MB/s ± 0%  +30.20%  (p=0.000 n=10+10)

name                              old alloc/op   new alloc/op    delta
Encoder_EncodeAllSimple/best-8       20.0B ± 0%      18.0B ± 0%  -10.00%  (p=0.002 n=8+10)
Encoder_EncodeAllSimple4K/best-8     2.00B ± 0%      2.00B ± 0%     ~     (all equal)
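The description combines two techniques: hoisting the code computations out of a method so the remaining body is small enough for Go's inliner, and writing a trailing conditional as plain value assignments so the compiler can select results without a jump. The sketch below is a minimal illustration under stated assumptions: `estimate`, `lengthCode`, `estBitsBefore`, `estBitsAfter` and the toy cost formula are hypothetical and are not the encoder's code, and the reset-when-positive rule is only an assumed shape for "the last if block". Whether a function actually inlines, or a branch becomes CMOV/CSEL, depends on the compiler version; `go build -gcflags=-m` reports the inlining decisions.

```go
package main

import (
	"fmt"
	"math/bits"
)

// lengthCode is a hypothetical stand-in for ofCode/mlCode: the bit length of v.
func lengthCode(v uint32) uint8 {
	return uint8(bits.Len32(v))
}

// estimate is a hypothetical type with a toy cost model, not the encoder's match.
type estimate struct {
	length int32
	est    int32
}

// estBitsBefore computes both codes itself, so its body is larger.
func (e *estimate) estBitsBefore(off, ml uint32, bitsPerByte int32) {
	ofc, mlc := lengthCode(off), lengthCode(ml)
	e.est = int32(ofc) + int32(mlc) - (e.length*bitsPerByte)>>10
	if e.est > 0 { // assumed rule: discard matches that are not worth it
		e.length = 0
		e.est = 1 << 30
	}
}

// estBitsAfter receives precomputed codes from its caller, leaving a smaller
// body that is more likely to fit the inlining budget. The final if only
// assigns values to locals, a shape the compiler may lower to conditional moves.
func (e *estimate) estBitsAfter(ofc, mlc uint8, bitsPerByte int32) {
	est := int32(ofc) + int32(mlc) - (e.length*bitsPerByte)>>10
	length := e.length
	if est > 0 {
		length = 0
		est = 1 << 30
	}
	e.length, e.est = length, est
}

func main() {
	a := estimate{length: 64}
	a.estBitsBefore(1000, 60, 8<<10)

	// The caller now computes the codes before calling the method.
	b := estimate{length: 64}
	b.estBitsAfter(lengthCode(1000), lengthCode(60), 8<<10)

	fmt.Println(a == b, a.est) // true -496
}
```

Checking that the old and new versions agree on the same inputs, as `main` does, is a cheap guard when restructuring hot code for speed.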

@klauspost (Owner) left a comment:

Beautiful! Thanks!

@klauspost klauspost merged commit 2878205 into klauspost:master Nov 26, 2022
greatroar added a commit to greatroar/compress that referenced this pull request Nov 26, 2022
This reverts commit 2878205.

It turns out that the speedup was due to a bug. In:

        offset := uint32(m.rep)
        if offset < 0 {
                offset = uint32(m.s-m.offset) + 3
        }

offset is never < 0 because m.rep is converted to uint32 before the comparison, so the fallback assignment never runs.
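The bug is easy to reproduce in isolation. A minimal sketch, assuming a hypothetical `match` type whose `rep`, `s` and `offset` fields are `int32` (the real field types are not shown here) and that a negative `rep` marks a non-repeat match, as the quoted condition implies. Converting first turns -1 into 4294967295, so the `< 0` test is always false; testing the signed field before converting restores the intended behavior.

```go
package main

import "fmt"

// match is a hypothetical stand-in modelling only the fields in the snippet;
// int32 is an assumed type for all three.
type match struct {
	rep    int32 // assumed: negative when the match is not a repeat offset
	s      int32 // current position
	offset int32 // position the match refers back to
}

// offsetBuggy reproduces the reverted pattern: the value is already unsigned
// when it is compared, so the branch can never be taken.
func offsetBuggy(m match) uint32 {
	offset := uint32(m.rep)
	if offset < 0 { // always false for a uint32
		offset = uint32(m.s-m.offset) + 3
	}
	return offset
}

// offsetFixed checks the sign of the signed field, then converts.
func offsetFixed(m match) uint32 {
	offset := uint32(m.rep)
	if m.rep < 0 {
		offset = uint32(m.s-m.offset) + 3
	}
	return offset
}

func main() {
	m := match{rep: -1, s: 100, offset: 40}
	fmt.Println(offsetBuggy(m)) // 4294967295: uint32(-1) wraps around
	fmt.Println(offsetFixed(m)) // 63: (100-40)+3
}
```

Static checkers such as staticcheck can flag comparisons of unsigned values against zero, which catches this class of mistake before it reaches a benchmark.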
klauspost pushed a commit that referenced this pull request Nov 27, 2022
This reverts commit 2878205.

2 participants