flate: Improve decompression speed 5-10% #483

Merged: 2 commits merged on Feb 1, 2022

klauspost (Owner) commented Jan 31, 2022

(The "Random" benchmarks are just memory copies, so they are unaffected by this change.)

```
benchmark                               old ns/op     new ns/op     delta
BenchmarkDecodeDigitsSpeed1e4-32        49461         44204         -10.63%
BenchmarkDecodeDigitsSpeed1e5-32        520488        509001        -2.21%
BenchmarkDecodeDigitsSpeed1e6-32        5152811       5000738       -2.95%
BenchmarkDecodeDigitsDefault1e4-32      50983         47693         -6.45%
BenchmarkDecodeDigitsDefault1e5-32      494800        488243        -1.33%
BenchmarkDecodeDigitsDefault1e6-32      4990322       4752297       -4.77%
BenchmarkDecodeDigitsCompress1e4-32     49973         43992         -11.97%
BenchmarkDecodeDigitsCompress1e5-32     515033        467616        -9.21%
BenchmarkDecodeDigitsCompress1e6-32     5128402       4659296       -9.15%
BenchmarkDecodeTwainSpeed1e4-32         51740         48324         -6.60%
BenchmarkDecodeTwainSpeed1e5-32         532690        513209        -3.66%
BenchmarkDecodeTwainSpeed1e6-32         5304535       5129081       -3.31%
BenchmarkDecodeTwainDefault1e4-32       50613         48007         -5.15%
BenchmarkDecodeTwainDefault1e5-32       488404        476945        -2.35%
BenchmarkDecodeTwainDefault1e6-32       4881062       4710812       -3.49%
BenchmarkDecodeTwainCompress1e4-32      49583         45632         -7.97%
BenchmarkDecodeTwainCompress1e5-32      458843        445645        -2.88%
BenchmarkDecodeTwainCompress1e6-32      4544787       4392530       -3.35%
BenchmarkDecodeRandomSpeed1e4-32        298           305           +2.21%
BenchmarkDecodeRandomSpeed1e5-32        1909          1909          +0.00%
BenchmarkDecodeRandomSpeed1e6-32        19987         19809         -0.89%

benchmark                               old MB/s     new MB/s     speedup
BenchmarkDecodeDigitsSpeed1e4-32        202.18       226.23       1.12x
BenchmarkDecodeDigitsSpeed1e5-32        192.13       196.46       1.02x
BenchmarkDecodeDigitsSpeed1e6-32        194.07       199.97       1.03x
BenchmarkDecodeDigitsDefault1e4-32      196.15       209.68       1.07x
BenchmarkDecodeDigitsDefault1e5-32      202.10       204.82       1.01x
BenchmarkDecodeDigitsDefault1e6-32      200.39       210.42       1.05x
BenchmarkDecodeDigitsCompress1e4-32     200.11       227.31       1.14x
BenchmarkDecodeDigitsCompress1e5-32     194.16       213.85       1.10x
BenchmarkDecodeDigitsCompress1e6-32     194.99       214.62       1.10x
BenchmarkDecodeTwainSpeed1e4-32         193.27       206.94       1.07x
BenchmarkDecodeTwainSpeed1e5-32         187.73       194.85       1.04x
BenchmarkDecodeTwainSpeed1e6-32         188.52       194.97       1.03x
BenchmarkDecodeTwainDefault1e4-32       197.58       208.30       1.05x
BenchmarkDecodeTwainDefault1e5-32       204.75       209.67       1.02x
BenchmarkDecodeTwainDefault1e6-32       204.87       212.28       1.04x
BenchmarkDecodeTwainCompress1e4-32      201.68       219.14       1.09x
BenchmarkDecodeTwainCompress1e5-32      217.94       224.39       1.03x
BenchmarkDecodeTwainCompress1e6-32      220.03       227.66       1.03x
BenchmarkDecodeRandomSpeed1e4-32        33551.69     32828.68     0.98x
BenchmarkDecodeRandomSpeed1e5-32        52391.84     52395.57     1.00x
BenchmarkDecodeRandomSpeed1e6-32        50031.69     50482.80     1.01x
```
klauspost merged commit 60b19fa into master on Feb 1, 2022
klauspost deleted the improve-inflate-speed branch on February 1, 2022 10:11