huff0: asm implementation of Decompress1X #596

Merged
klauspost merged 2 commits into klauspost:master on May 23, 2022


WojciechMula
Contributor

@WojciechMula WojciechMula commented May 13, 2022

Solves #595

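Benchmarks, run with `go test -run XYZ -bench 1X` on an Ice Lake machine. The delta column is the relative change in ns/op, so negative values mean the new code is faster.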

benchmark                                                   old ns/op     new ns/op     delta
BenchmarkCompress1XReuseNone/digits-16                      184588        186723        +1.16%
BenchmarkCompress1XReuseNone/gettysburg-16                  5227          5294          +1.28%
BenchmarkCompress1XReuseNone/twain-16                       654643        631837        -3.48%
BenchmarkCompress1XReuseNone/low-ent.10k-16                 84314         84991         +0.80%
BenchmarkCompress1XReuseNone/superlow-ent-10k-16            34252         34385         +0.39%
BenchmarkCompress1XReuseNone/crash2-16                      1213          1252          +3.22%
BenchmarkCompress1XReuseNone/endzerobits-16                 268           281           +4.89%
BenchmarkCompress1XReuseNone/endnonzero-16                  844           876           +3.83%
BenchmarkCompress1XReuseNone/case1-16                       3904          3946          +1.08%
BenchmarkCompress1XReuseNone/case2-16                       3901          3942          +1.05%
BenchmarkCompress1XReuseNone/case3-16                       3923          3963          +1.02%
BenchmarkCompress1XReuseNone/pngdata.001-16                 167341        167843        +0.30%
BenchmarkCompress1XReuseNone/normcount2-16                  2257          2315          +2.57%
BenchmarkCompress1XReuseAllow/digits-16                     184447        185666        +0.66%
BenchmarkCompress1XReuseAllow/gettysburg-16                 4699          4806          +2.28%
BenchmarkCompress1XReuseAllow/twain-16                      646781        635147        -1.80%
BenchmarkCompress1XReuseAllow/low-ent.10k-16                83972         84630         +0.78%
BenchmarkCompress1XReuseAllow/superlow-ent-10k-16           33855         34139         +0.84%
BenchmarkCompress1XReuseAllow/crash2-16                     889           885           -0.46%
BenchmarkCompress1XReuseAllow/endzerobits-16                260           262           +0.92%
BenchmarkCompress1XReuseAllow/endnonzero-16                 611           618           +1.11%
BenchmarkCompress1XReuseAllow/case1-16                      3205          3171          -1.06%
BenchmarkCompress1XReuseAllow/case2-16                      3164          3161          -0.09%
BenchmarkCompress1XReuseAllow/case3-16                      3201          3168          -1.03%
BenchmarkCompress1XReuseAllow/pngdata.001-16                166807        166828        +0.01%
BenchmarkCompress1XReuseAllow/normcount2-16                 1779          1844          +3.65%
BenchmarkCompress1XReusePrefer/digits-16                    183785        185473        +0.92%
BenchmarkCompress1XReusePrefer/gettysburg-16                3018          3009          -0.30%
BenchmarkCompress1XReusePrefer/twain-16                     637243        631305        -0.93%
BenchmarkCompress1XReusePrefer/low-ent.10k-16               83624         84309         +0.82%
BenchmarkCompress1XReusePrefer/superlow-ent-10k-16          33316         33357         +0.12%
BenchmarkCompress1XReusePrefer/crash2-16                    199           200           +0.45%
BenchmarkCompress1XReusePrefer/endzerobits-16               183           188           +2.34%
BenchmarkCompress1XReusePrefer/endnonzero-16                192           194           +0.99%
BenchmarkCompress1XReusePrefer/case1-16                     299           298           -0.30%
BenchmarkCompress1XReusePrefer/case2-16                     249           252           +1.08%
BenchmarkCompress1XReusePrefer/case3-16                     252           254           +0.63%
BenchmarkCompress1XReusePrefer/pngdata.001-16               162023        161971        -0.03%
BenchmarkCompress1XReusePrefer/normcount2-16                326           326           -0.03%
BenchmarkCompress1XSizes/digits-100-16                      1420          1458          +2.68%
BenchmarkCompress1XSizes/digits-200-16                      1605          1651          +2.87%
BenchmarkCompress1XSizes/digits-500-16                      2145          2178          +1.54%
BenchmarkCompress1XSizes/digits-1000-16                     2997          3067          +2.34%
BenchmarkCompress1XSizes/digits-5000-16                     9778          9836          +0.59%
BenchmarkCompress1XSizes/digits-10000-16                    18312         18499         +1.02%
BenchmarkCompress1XSizes/digits-50000-16                    88397         89414         +1.15%
BenchmarkDecompress1XTable/digits-16                        392200        303088        -22.72%
BenchmarkDecompress1XTable/gettysburg-16                    7671          5701          -25.68%
BenchmarkDecompress1XTable/twain-16                         1250201       851679        -31.88%
BenchmarkDecompress1XTable/low-ent.10k-16                   139365        110457        -20.74%
BenchmarkDecompress1XTable/superlow-ent-10k-16              37111         29501         -20.51%
BenchmarkDecompress1XTable/crash2-16                        670           702           +4.78%
BenchmarkDecompress1XTable/endzerobits-16                   76.7          68.8          -10.31%
BenchmarkDecompress1XTable/endnonzero-16                    468           501           +7.07%
BenchmarkDecompress1XTable/case1-16                         1989          1945          -2.21%
BenchmarkDecompress1XTable/case2-16                         1936          1919          -0.88%
BenchmarkDecompress1XTable/case3-16                         1957          1948          -0.46%
BenchmarkDecompress1XTable/pngdata.001-16                   206514        144385        -30.08%
BenchmarkDecompress1XTable/normcount2-16                    1409          1352          -4.05%
BenchmarkDecompress1XNoTable/digits/100-16                  423           330           -22.09%
BenchmarkDecompress1XNoTable/digits/10000-16                38077         28327         -25.61%
BenchmarkDecompress1XNoTable/digits/262143-16               1043522       802526        -23.09%
BenchmarkDecompress1XNoTable/gettysburg/100-16              416           334           -19.74%
BenchmarkDecompress1XNoTable/gettysburg/10000-16            41724         28560         -31.55%
BenchmarkDecompress1XNoTable/gettysburg/262143-16           1141714       759146        -33.51%
BenchmarkDecompress1XNoTable/twain/100-16                   424           342           -19.40%
BenchmarkDecompress1XNoTable/twain/10000-16                 41842         28652         -31.52%
BenchmarkDecompress1XNoTable/twain/262143-16                1244988       850157        -31.71%
BenchmarkDecompress1XNoTable/low-ent.10k/100-16             441           446           +0.97%
BenchmarkDecompress1XNoTable/low-ent.10k/10000-16           35085         27606         -21.32%
BenchmarkDecompress1XNoTable/low-ent.10k/262143-16          914657        719273        -21.36%
BenchmarkDecompress1XNoTable/superlow-ent-10k/262143-16     920422        718830        -21.90%
BenchmarkDecompress1XNoTable/crash2/100-16                  408           332           -18.49%
BenchmarkDecompress1XNoTable/crash2/10000-16                37059         28189         -23.93%
BenchmarkDecompress1XNoTable/crash2/262143-16               971784        737302        -24.13%
BenchmarkDecompress1XNoTable/endzerobits/100-16             446           448           +0.58%
BenchmarkDecompress1XNoTable/endzerobits/10000-16           35144         27607         -21.45%
BenchmarkDecompress1XNoTable/endzerobits/262143-16          914147        719542        -21.29%
BenchmarkDecompress1XNoTable/endnonzero/100-16              446           448           +0.45%
BenchmarkDecompress1XNoTable/endnonzero/10000-16            35223         27629         -21.56%
BenchmarkDecompress1XNoTable/endnonzero/262143-16           918031        720103        -21.56%
BenchmarkDecompress1XNoTable/case1/100-16                   407           331           -18.71%
BenchmarkDecompress1XNoTable/case1/10000-16                 37955         28301         -25.44%
BenchmarkDecompress1XNoTable/case1/262143-16                991910        739995        -25.40%
BenchmarkDecompress1XNoTable/case2/100-16                   408           338           -17.29%
BenchmarkDecompress1XNoTable/case2/10000-16                 37403         28024         -25.08%
BenchmarkDecompress1XNoTable/case2/262143-16                972229        732974        -24.61%
BenchmarkDecompress1XNoTable/case3/100-16                   418           344           -17.79%
BenchmarkDecompress1XNoTable/case3/10000-16                 37588         28130         -25.16%
BenchmarkDecompress1XNoTable/case3/262143-16                977497        735540        -24.75%
BenchmarkDecompress1XNoTable/pngdata.001/100-16             430           379           -11.92%
BenchmarkDecompress1XNoTable/pngdata.001/10000-16           39719         27614         -30.48%
BenchmarkDecompress1XNoTable/pngdata.001/262143-16          1053768       730571        -30.67%
BenchmarkDecompress1XNoTable/normcount2/100-16              416           330           -20.57%
BenchmarkDecompress1XNoTable/normcount2/10000-16            38625         28498         -26.22%
BenchmarkDecompress1XNoTable/normcount2/262143-16           1008971       745795        -26.08%
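
For reference, the delta column can be reproduced from the two ns/op columns. The following is a minimal sketch, not part of the PR; the helper name `deltaPercent` is hypothetical, and the inputs are copied from the `BenchmarkDecompress1XTable/digits-16` row.

```go
package main

import "fmt"

// deltaPercent returns the relative change from oldNs to newNs as a signed
// percentage string. A negative result means the new code takes less time.
// (Hypothetical helper for illustration; not taken from the PR.)
func deltaPercent(oldNs, newNs float64) string {
	return fmt.Sprintf("%+.2f%%", (newNs-oldNs)/oldNs*100)
}

func main() {
	// old ns/op and new ns/op for BenchmarkDecompress1XTable/digits-16.
	fmt.Println(deltaPercent(392200, 303088)) // prints "-22.72%"
}
```

The same formula gives `-31.88%` for the `twain-16` row (1250201 to 851679 ns/op).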

@klauspost
Owner

Good. I will be mostly afk for about a week.

We would just need to replace the variable shifts with shifts by immediate values.
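
One possible reading of this remark, sketched in Go rather than assembly: a bit reader that peeks the top `n` bits of its 64-bit buffer needs a shift whose count is only known at run time, whereas a version specialised for a fixed width can shift by a constant. The function names `peekVar` and `peek8` and the 8-bit width are assumptions made for illustration; they are not taken from the PR.

```go
package main

import "fmt"

// peekVar returns the top n bits of v, where n is only known at run time,
// so the shift count is variable. n must be in the range 1..64.
// (Hypothetical helper for illustration.)
func peekVar(v uint64, n uint8) uint64 {
	return v >> ((64 - n) & 63)
}

// peek8 returns the top 8 bits of v using a constant shift count, which a
// compiler or hand-written assembly can encode as an immediate operand.
// (Hypothetical helper for illustration.)
func peek8(v uint64) uint64 {
	return v >> 56
}

func main() {
	v := uint64(0xAB00000000000000)
	// Both forms agree when the run-time width happens to be 8.
	fmt.Println(peekVar(v, 8) == peek8(v))
}
```

The two functions compute the same value for `n = 8`; the difference is only in whether the shift count has to live in a register.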
@WojciechMula WojciechMula marked this pull request as ready for review May 19, 2022 07:08
@klauspost klauspost merged commit e77bf31 into klauspost:master May 23, 2022
@klauspost klauspost deleted the huff0-decompress1x-asm branch May 23, 2022 11:37
@klauspost
Owner

klauspost commented May 23, 2022

Nice!

The BMI version (compared to pure amd64 asm) doesn't show any speedup on my system, but it is also no worse.
