zstd: Improve decoder memcopy #637

Merged
merged 2 commits into from Jul 4, 2022

klauspost
Owner

Up to 25% faster decodes, depending on contents.

Use s2 memcopier and eliminate a zero check.
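
The s2 memcopier itself is not shown in this description. As a rough pure-Go illustration of one common way to speed up short copies (two overlapping fixed-width loads and stores instead of a byte loop), here is a minimal sketch. `copyShort` is a hypothetical name chosen for this example, and the routine actually used in this PR may be structured differently.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// copyShort copies n bytes from src to dst. Both slices must hold at
// least n bytes. For n <= 16 it uses at most two fixed-width loads and
// two stores; the second pair overlaps the first when n is not an exact
// multiple of the width. Hypothetical helper, for illustration only.
func copyShort(dst, src []byte, n int) {
	switch {
	case n > 16:
		// Longer copies fall back to the built-in copy.
		copy(dst[:n], src[:n])
	case n >= 8:
		// Load the first and last 8 bytes, then store both.
		a := binary.LittleEndian.Uint64(src)
		b := binary.LittleEndian.Uint64(src[n-8:])
		binary.LittleEndian.PutUint64(dst, a)
		binary.LittleEndian.PutUint64(dst[n-8:], b)
	case n >= 4:
		a := binary.LittleEndian.Uint32(src)
		b := binary.LittleEndian.Uint32(src[n-4:])
		binary.LittleEndian.PutUint32(dst, a)
		binary.LittleEndian.PutUint32(dst[n-4:], b)
	case n >= 2:
		a := binary.LittleEndian.Uint16(src)
		b := binary.LittleEndian.Uint16(src[n-2:])
		binary.LittleEndian.PutUint16(dst, a)
		binary.LittleEndian.PutUint16(dst[n-2:], b)
	case n == 1:
		dst[0] = src[0]
	}
}

func main() {
	src := []byte("hello, zstd!")
	dst := make([]byte, len(src))
	copyShort(dst, src, len(src))
	fmt.Printf("%s\n", dst)
}
```

The point of the pattern is that a 12-byte copy, for example, becomes two 8-byte moves covering bytes 0-7 and 4-11, with no per-byte loop and no branch on the exact length inside each size class.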

benchmark                                                                                       old MB/s     new MB/s     speedup
Benchmark_seqdec_execute/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32        1284.77      1493.64      1.16x
Benchmark_seqdec_execute/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32       1107.87      1580.86      1.43x
Benchmark_seqdec_execute/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32               3947.25      4163.99      1.05x
Benchmark_seqdec_execute/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32        10281.12     10375.47     1.01x
Benchmark_seqdec_execute/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32        8115.99      8862.70      1.09x
Benchmark_seqdec_execute/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32         1578.08      2306.80      1.46x
Benchmark_seqdec_execute/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                  17079.65     15875.41     0.93x
Benchmark_seqdec_execute/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32     2020.09      2077.16      1.03x
Benchmark_seqdec_execute/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                   35781.31     35736.03     1.00x
Benchmark_seqdec_execute/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                    33125.43     32874.37     0.99x
Benchmark_seqdec_execute/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                19394.38     19785.45     1.02x
Benchmark_seqdec_execute/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                   10494.30     10229.09     0.97x
Benchmark_seqdec_execute/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32              7425.77      8034.31      1.08x
Benchmark_seqdec_execute/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                    2855.17      3336.71      1.17x
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-32                                                    537.74       653.10        1.21x
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-32                                                1500.59      1610.70       1.07x
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-32                                                 410.13       508.09        1.24x
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-32                                                   467.83       602.22        1.29x
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-32                                                 434.53       528.57        1.22x
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-32                                                  433.95       544.60        1.25x
BenchmarkDecoder_DecoderSmall/html_x_4.zst-32                                                     2860.31      3199.64       1.12x
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-32                                               5336.43      5422.59       1.02x
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-32                                               12327.10     12324.96      1.00x
BenchmarkDecoder_DecoderSmall/urls.10K.zst-32                                                     660.52       769.09        1.16x
BenchmarkDecoder_DecoderSmall/html.zst-32                                                         1076.67      1286.06       1.19x
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-32                                                569.30       574.46        1.01x
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32                                                       812.16       822.43        1.01x
BenchmarkDecoder_DecodeAll/geo.protodata.zst-32                                                   1943.14      1906.88       0.98x
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32                                                    712.27       723.91        1.02x
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32                                                      688.23       781.85        1.14x
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32                                                    702.87       714.37        1.02x
BenchmarkDecoder_DecodeAll/alice29.txt.zst-32                                                     717.44       738.78        1.03x
BenchmarkDecoder_DecodeAll/html_x_4.zst-32                                                        1960.55      1975.63       1.01x
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32                                                  5981.50      6118.97       1.02x
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32                                                  13140.18     13126.95      1.00x
BenchmarkDecoder_DecodeAll/urls.10K.zst-32                                                        983.71       979.34        1.00x
BenchmarkDecoder_DecodeAll/html.zst-32                                                            1624.80      1585.31       0.98x
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32                                                   569.84       572.56        1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/fastest-32                                  504.31       623.48        1.24x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/default-32                                  564.68       723.22        1.28x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/better-32                                   615.18       781.33        1.27x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/best-32                                     786.17       862.88        1.10x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/fastest-32                                           12860.99     12908.39      1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/default-32                                           619.06       626.95        1.01x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/better-32                                            630.33       628.85        1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/best-32                                              609.12       616.50        1.01x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-32                              658.22       669.16        1.02x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-32                              723.60       741.86        1.03x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-32                               735.73       750.40        1.02x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-32                                 745.43       764.97        1.03x
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-32                                                  12801.86     13043.13      1.02x
BenchmarkDecoder_DecodeAllFiles/e.txt/default-32                                                  680.29       683.65        1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/better-32                                                   739.23       748.08        1.01x
BenchmarkDecoder_DecodeAllFiles/e.txt/best-32                                                     820.16       828.45        1.01x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-32                                      1186.63      1177.03       0.99x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-32                                      1384.74      1383.55       1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-32                                       1104.17      1114.92       1.01x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-32                                         409.59       409.66        1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-32                                         392.32       390.94        1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-32                                         296.47       295.87        1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-32                                          296.52       296.60        1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-32                                            299.85       298.91        1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-32                                               988.75       999.28        1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/default-32                                               987.11       1018.97       1.03x
BenchmarkDecoder_DecodeAllFiles/html.txt/better-32                                                1027.64      1030.76       1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/best-32                                                  973.41       989.37        1.02x
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-32                                                 12976.96     12976.25      1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-32                                                 678.88       680.77        1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-32                                                  746.38       751.28        1.01x
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-32                                                    823.52       833.27        1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-32                                            2115.58      2106.14       1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-32                                            1767.98      1767.57       1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-32                                             2306.86      2288.16       0.99x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-32                                               1660.52      1667.53       1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-32                                             13027.08     13044.50      1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-32                                             13054.18     13081.06      1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-32                                              13067.23     13066.65      1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-32                                                13079.77     13061.36      1.00x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/fastest-32                                 10354.84     11876.83      1.15x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/default-32                                 11557.12     13415.35      1.16x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/better-32                                  12644.67     14515.52      1.15x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/best-32                                    15934.00     17307.06      1.09x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/fastest-32                                          35354.57     35307.64      1.00x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/default-32                                          11392.27     11353.17      1.00x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/better-32                                           11793.77     11733.41      0.99x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/best-32                                             11203.91     11174.37      1.00x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-32                             12089.54     12097.65      1.00x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-32                             12604.67     12647.83      1.00x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-32                              13265.79     13275.92      1.00x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-32                                13078.85     13130.26      1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-32                                                 52477.17     51848.17      0.99x
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-32                                                 11947.06     11922.24      1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-32                                                  13184.17     13223.10      1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-32                                                    14630.26     14702.42      1.00x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-32                                     3013.25      3025.30       1.00x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-32                                     3125.61      2976.92       0.95x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-32                                      3181.68      3162.28       0.99x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-32                                        3351.22      3372.69       1.01x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-32                                        1188.15      1147.96       0.97x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-32                                        1215.39      1156.01       0.95x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-32                                         1219.20      1177.16       0.97x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-32                                           1216.72      1170.21       0.96x
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-32                                              16901.32     17180.70      1.02x
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-32                                              16819.66     16997.40      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-32                                               17805.12     17946.54      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-32                                                 16916.87     17294.25      1.02x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-32                                                52314.15     52657.88      1.01x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-32                                                11878.94     11796.12      0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-32                                                 13303.16     13216.13      0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-32                                                   14622.76     14697.47      1.01x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-32                                           34134.48     36542.10      1.07x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-32                                           33589.32     34982.31      1.04x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-32                                            43754.89     44323.18      1.01x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-32                                              32422.22     33882.10      1.05x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-32                                            52706.00     52863.28      1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-32                                            52527.76     52319.50      1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-32                                             52177.25     52506.60      1.01x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-32                                               52443.28     52402.30      1.00x
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32                                               13992.47     14134.26      1.01x
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32                                           34107.95     33812.99      0.99x
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32                                            12012.34     12123.74      1.01x
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32                                              12630.22     13586.02      1.08x
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32                                            12327.02     12374.31      1.00x
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32                                             11932.73     12059.89      1.01x
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32                                                31233.38     36076.61      1.16x
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32                                          97435.31     100702.06     1.03x
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32                                          62247.22     61824.88      0.99x
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32                                                18659.58     18502.10      0.99x
BenchmarkDecoder_DecodeAllParallel/html.zst-32                                                    28464.78     28500.16      1.00x
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32                                           3114.03      3132.86       1.01x
BenchmarkDecoderSilesia/multithreaded-writer-32                                                   1099.69      1059.67       0.96x
BenchmarkDecoderSilesia/multithreaded-writer-himem-32                                             1093.10      1054.67       0.96x
BenchmarkDecoderSilesia/singlethreaded-writer-32                                                  803.85       819.16        1.02x
BenchmarkDecoderSilesia/singlethreaded-writerto-32                                                812.83       828.44        1.02x
BenchmarkDecoderSilesia/singlethreaded-himem-32                                                   813.14       824.41        1.01x
BenchmarkDecoderEnwik9/multithreaded-writer-32                                                    877.55       981.68        1.12x
BenchmarkDecoderEnwik9/multithreaded-writer-himem-32                                              961.20       1013.19       1.05x
BenchmarkDecoderEnwik9/singlethreaded-writer-32                                                   632.07       629.32        1.00x
BenchmarkDecoderEnwik9/singlethreaded-writerto-32                                                 634.62       635.76        1.00x
BenchmarkDecoderEnwik9/singlethreaded-himem-32                                                    763.68       755.70        0.99x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-32           1626.86      1658.42       1.02x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-himem-32     2299.80      2305.08       1.00x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writer-32          1221.34      1207.19       0.99x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writerto-32        1236.18      1224.88       0.99x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-himem-32           1749.21      1729.03       0.99x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-32               839.51       922.30        1.10x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-himem-32         1055.54      1093.19       1.04x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writer-32              574.91       614.02        1.07x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writerto-32            579.19       618.97        1.07x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-himem-32               780.67       863.05        1.11x 

Improve memcopy for small matches. Up to 30% increased throughput, depending on input.

```
benchmark                                                                                       old MB/s     new MB/s     speedup
Benchmark_seqdec_execute/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32        1284.77      1525.03      1.19x
Benchmark_seqdec_execute/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32       1107.87      1614.28      1.46x
Benchmark_seqdec_execute/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32               3947.25      4100.49      1.04x
Benchmark_seqdec_execute/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32        10281.12     10316.14     1.00x
Benchmark_seqdec_execute/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32        8115.99      8829.85      1.09x
Benchmark_seqdec_execute/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32         1578.08      2290.47      1.45x
Benchmark_seqdec_execute/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                  17079.65     16716.41     0.98x
Benchmark_seqdec_execute/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32     2020.09      2166.56      1.07x
Benchmark_seqdec_execute/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                   35781.31     35745.53     1.00x
Benchmark_seqdec_execute/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                    33125.43     32785.93     0.99x
Benchmark_seqdec_execute/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                19394.38     19643.49     1.01x
Benchmark_seqdec_execute/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                   10494.30     10653.09     1.02x
Benchmark_seqdec_execute/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32              7425.77      7506.51      1.01x
Benchmark_seqdec_execute/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                    2855.17      3396.09      1.19x
benchmark                                                                                         old MB/s     new MB/s      speedup
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-32                                                    537.74       651.27        1.21x
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-32                                                1500.59      1610.11       1.07x
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-32                                                 410.13       505.82        1.23x
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-32                                                   467.83       601.25        1.29x
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-32                                                 434.53       530.71        1.22x
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-32                                                  433.95       544.87        1.26x
BenchmarkDecoder_DecoderSmall/html_x_4.zst-32                                                     2860.31      3189.40       1.12x
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-32                                               5336.43      5437.24       1.02x
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-32                                               12327.10     12350.86      1.00x
BenchmarkDecoder_DecoderSmall/urls.10K.zst-32                                                     660.52       774.52        1.17x
BenchmarkDecoder_DecoderSmall/html.zst-32                                                         1076.67      1284.53       1.19x
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-32                                                569.30       576.15        1.01x
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32                                                       812.16       813.72        1.00x
BenchmarkDecoder_DecodeAll/geo.protodata.zst-32                                                   1943.14      1933.04       0.99x
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32                                                    712.27       715.46        1.00x
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32                                                      688.23       775.97        1.13x
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32                                                    702.87       700.17        1.00x
BenchmarkDecoder_DecodeAll/alice29.txt.zst-32                                                     717.44       720.89        1.00x
BenchmarkDecoder_DecodeAll/html_x_4.zst-32                                                        1960.55      1968.90       1.00x
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32                                                  5981.50      6169.12       1.03x
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32                                                  13140.18     13145.86      1.00x
BenchmarkDecoder_DecodeAll/urls.10K.zst-32                                                        983.71       988.16        1.00x
BenchmarkDecoder_DecodeAll/html.zst-32                                                            1624.80      1624.92       1.00x
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32                                                   569.84       570.96        1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/fastest-32                                  504.31       622.83        1.24x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/default-32                                  564.68       717.57        1.27x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/better-32                                   615.18       766.33        1.25x
BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/best-32                                     786.17       857.17        1.09x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/fastest-32                                           12860.99     12870.57      1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/default-32                                           619.06       617.54        1.00x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/better-32                                            630.33       625.20        0.99x
BenchmarkDecoder_DecodeAllFiles/.tracker.bin/best-32                                              609.12       612.50        1.01x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-32                              658.22       659.45        1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-32                              723.60       729.95        1.01x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-32                               735.73       737.52        1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-32                                 745.43       749.55        1.01x
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-32                                                  12801.86     12967.61      1.01x
BenchmarkDecoder_DecodeAllFiles/e.txt/default-32                                                  680.29       677.69        1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/better-32                                                   739.23       733.45        0.99x
BenchmarkDecoder_DecodeAllFiles/e.txt/best-32                                                     820.16       825.62        1.01x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-32                                      1186.63      1194.87       1.01x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-32                                      1384.74      1412.45       1.02x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-32                                       1104.17      1107.00       1.00x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-32                                         409.59       409.27        1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-32                                         392.32       391.89        1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-32                                         296.47       296.65        1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-32                                          296.52       296.68        1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-32                                            299.85       295.83        0.99x
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-32                                               988.75       996.39        1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/default-32                                               987.11       989.51        1.00x
BenchmarkDecoder_DecodeAllFiles/html.txt/better-32                                                1027.64      1038.21       1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/best-32                                                  973.41       989.86        1.02x
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-32                                                 12976.96     13045.11      1.01x
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-32                                                 678.88       674.53        0.99x
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-32                                                  746.38       747.36        1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-32                                                    823.52       827.84        1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-32                                            2115.58      2121.84       1.00x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-32                                            1767.98      1779.35       1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-32                                             2306.86      2328.47       1.01x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-32                                               1660.52      1684.65       1.01x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-32                                             13027.08     12999.49      1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-32                                             13054.18     13084.25      1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-32                                              13067.23     13099.47      1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-32                                                13079.77     13104.13      1.00x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/fastest-32                                 10354.84     11838.70      1.14x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/default-32                                 11557.12     13404.78      1.16x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/better-32                                  12644.67     14519.37      1.15x
BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/best-32                                    15934.00     17312.77      1.09x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/fastest-32                                          35354.57     34836.95      0.99x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/default-32                                          11392.27     11275.11      0.99x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/better-32                                           11793.77     11771.24      1.00x
BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/best-32                                             11203.91     11142.52      0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-32                             12089.54     11983.77      0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-32                             12604.67     12514.75      0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-32                              13265.79     13152.64      0.99x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-32                                13078.85     12983.91      0.99x
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-32                                                 52477.17     52657.54      1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-32                                                 11947.06     11809.75      0.99x
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-32                                                  13184.17     13140.65      1.00x
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-32                                                    14630.26     14718.01      1.01x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-32                                     3013.25      3088.05       1.02x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-32                                     3125.61      3091.48       0.99x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-32                                      3181.68      3034.74       0.95x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-32                                        3351.22      3526.91       1.05x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-32                                        1188.15      1136.88       0.96x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-32                                        1215.39      1193.99       0.98x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-32                                         1219.20      1206.23       0.99x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-32                                           1216.72      1200.26       0.99x
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-32                                              16901.32     17076.26      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-32                                              16819.66     16892.32      1.00x
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-32                                               17805.12     17873.77      1.00x
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-32                                                 16916.87     17184.02      1.02x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-32                                                52314.15     51687.88      0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-32                                                11878.94     11778.57      0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-32                                                 13303.16     13162.44      0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-32                                                   14622.76     14717.80      1.01x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-32                                           34134.48     37031.10      1.08x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-32                                           33589.32     35277.28      1.05x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-32                                            43754.89     44761.13      1.02x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-32                                              32422.22     34107.42      1.05x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-32                                            52706.00     52396.81      0.99x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-32                                            52527.76     52048.36      0.99x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-32                                             52177.25     52688.64      1.01x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-32                                               52443.28     52799.86      1.01x
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32                                               13992.47     13994.15      1.00x
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32                                           34107.95     34221.23      1.00x
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32                                            12012.34     11976.30      1.00x
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32                                              12630.22     13384.70      1.06x
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32                                            12327.02     12251.04      0.99x
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32                                             11932.73     11896.92      1.00x
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32                                                31233.38     36258.56      1.16x
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32                                          97435.31     100317.73     1.03x
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32                                          62247.22     62306.36      1.00x
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32                                                18659.58     18592.14      1.00x
BenchmarkDecoder_DecodeAllParallel/html.zst-32                                                    28464.78     28519.30      1.00x
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32                                           3114.03      3297.01       1.06x
BenchmarkDecoderSilesia/multithreaded-writer-32                                                   1099.69      1104.92       1.00x
BenchmarkDecoderSilesia/multithreaded-writer-himem-32                                             1093.10      1102.98       1.01x
BenchmarkDecoderSilesia/singlethreaded-writer-32                                                  803.85       818.55        1.02x
BenchmarkDecoderSilesia/singlethreaded-writerto-32                                                812.83       828.19        1.02x
BenchmarkDecoderSilesia/singlethreaded-himem-32                                                   813.14       828.32        1.02x
BenchmarkDecoderEnwik9/multithreaded-writer-32                                                    877.55       996.49        1.14x
BenchmarkDecoderEnwik9/multithreaded-writer-himem-32                                              961.20       1036.76       1.08x
BenchmarkDecoderEnwik9/singlethreaded-writer-32                                                   632.07       631.96        1.00x
BenchmarkDecoderEnwik9/singlethreaded-writerto-32                                                 634.62       634.52        1.00x
BenchmarkDecoderEnwik9/singlethreaded-himem-32                                                    763.68       758.40        0.99x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-32           1626.86      1730.88       1.06x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-himem-32     2299.80      2375.04       1.03x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writer-32          1221.34      1221.43       1.00x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writerto-32        1236.18      1237.97       1.00x
BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-himem-32           1749.21      1754.96       1.00x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-32               839.51       933.63        1.11x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-himem-32         1055.54      1100.37       1.04x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writer-32              574.91       613.88        1.07x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writerto-32            579.19       618.72        1.07x
BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-himem-32               780.67       867.96        1.11x
```
@klauspost
Owner Author

After #636 from @greatroar, I tried copying the s2 memcopy into the zstd decoder.

This yields a significant speedup in most cases.
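For readers unfamiliar with what a decoder "memcopy" has to handle, here is a minimal pure-Go sketch of copying a match in an LZ77-style decoder. It is not the code from this PR: `appendMatch` is a hypothetical helper, and the byte-by-byte loop is only the simplest correct way to handle an overlapping copy, not the optimized approach the s2 memcopier takes.

```go
package main

import "fmt"

// appendMatch appends length bytes to out, reading them from offset bytes
// before the current end of out. Hypothetical helper for illustration only.
func appendMatch(out []byte, offset, length int) []byte {
	if offset <= 0 || offset > len(out) || length < 0 {
		panic("appendMatch: offset or length out of range")
	}
	start := len(out) - offset
	if length <= offset {
		// Source and destination do not overlap, so one bulk copy is safe.
		return append(out, out[start:start+length]...)
	}
	// Overlapping match: the source runs into bytes produced by this same
	// copy, so the pattern repeats. Copy one byte at a time to preserve that.
	for i := 0; i < length; i++ {
		out = append(out, out[start+i])
	}
	return out
}

func main() {
	out := []byte("abc")
	out = appendMatch(out, 3, 7) // repeats the 3-byte pattern for 7 bytes
	fmt.Println(string(out))
}
```

The bulk-copy branch is where a faster copy routine pays off. The overlapping branch is the case a real decoder has to special-case, because a plain bulk copy would read bytes that have not been written yet.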

@WojciechMula
Contributor

WojciechMula commented Jul 4, 2022

That is a great improvement! I will post benchmarks from Ice Lake later.

There are some nice speed-ups. :) In the case of our commercial sample data, there's almost no change.

benchmark                                                                                  old ns/op     new ns/op     delta
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                                             3420243       2785972       -18.54%
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                                         687982        593174        -13.78%
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                                          11973188      9869380       -17.57%
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                                            9319551       7357375       -21.05%
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                                          2882879       2351238       -18.44%
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                                           3559313       2793625       -21.51%
BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                                              1289054       1139395       -11.61%
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                                        181931        181594        -0.19%
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                                        132526        129021        -2.64%
BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                                              10232270      8951466       -12.52%
BenchmarkDecoder_DecoderSmall/html.zst-16                                                  785829        668029        -14.99%
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                                         61849         61598         -0.41%
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                                                269558        264113        -2.02%
BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                                            64605         60077         -7.01%
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                                             826464        836816        +1.25%
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                                               772832        677656        -12.32%
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                                             218810        216881        -0.88%
BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                                              261775        260452        -0.51%
BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                                 243367        213273        -12.37%
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                                           19470         18806         -3.41%
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                                           11303         11307         +0.04%
BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                                 863419        856596        -0.79%
BenchmarkDecoder_DecodeAll/html.zst-16                                                     69432         66101         -4.80%
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                                            7685          7659          -0.34%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16                       698699        702047        +0.48%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16                       657808        659895        +0.32%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16                        635773        634547        -0.19%
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16                          638187        639621        +0.22%
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                                           9239          9236          -0.03%
BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                                           174801        178546        +2.14%
BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                                            157327        154045        -2.09%
BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                                              140692        136319        -3.11%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16                               3664          3538          -3.44%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16                               3125          3021          -3.33%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16                                3991          3823          -4.21%
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                                  10861         10936         +0.69%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                                  4228          4210          -0.43%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                                  5978          5943          -0.59%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                                   5965          5928          -0.62%
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                                     5939          5870          -1.16%
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                                        46840         46469         -0.79%
BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                                        47567         47128         -0.92%
BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                                         45408         44938         -1.04%
BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                                           48173         46881         -2.68%
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                                          9239          9239          +0.00%
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                                          176377        179247        +1.63%
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                                           158195        153768        -2.80%
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                                             140408        136274        -2.94%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                                     27617         26013         -5.81%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                                     32903         31606         -3.94%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                                      24761         24363         -1.61%
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                                        34092         32330         -5.17%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                                      9232          9232          +0.00%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                                      9184          9182          -0.02%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                                       9184          9180          -0.04%
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                                         9185          9187          +0.02%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16                      85259         83689         -1.84%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16                      84780         82942         -2.17%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16                       78697         77294         -1.78%
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16                         81798         80130         -2.04%
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                                          1039          1049          +0.96%
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                                          23475         23059         -1.77%
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                                           20022         19455         -2.83%
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                                             16360         15836         -3.20%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16                              589           476           -19.30%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16                              541           432           -20.09%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16                               467           479           +2.53%
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                                 855           860           +0.49%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                                 577           573           -0.78%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                                 658           656           -0.21%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                                  653           648           -0.81%
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                                    665           658           -1.04%
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                                       6928          6843          -1.23%
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                                       7020          6923          -1.38%
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                                        6673          6586          -1.30%
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                                          7016          6899          -1.67%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                                         1039          1046          +0.67%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                                         23609         23246         -1.54%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                                          19911         19325         -2.94%
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                                            16353         15835         -3.17%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                                    3566          3565          -0.03%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                                    3625          3603          -0.61%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                                     2848          2824          -0.84%
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                                       3799          3781          -0.47%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                                     1040          1039          -0.10%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                                     1042          1043          +0.10%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                                      1034          1040          +0.58%
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                                        1049          1061          +1.14%
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                                        34647         34006         -1.85%
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                                    8738          8517          -2.53%
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                                     110366        108192        -1.97%
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                                       91375         84007         -8.06%
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                                     27882         27334         -1.97%
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                                      36699         35899         -2.18%
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                                         30928         30682         -0.80%
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                                   2616          2508          -4.13%
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                                   1249          1257          +0.64%
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                                         95195         95337         +0.15%
BenchmarkDecoder_DecodeAllParallel/html.zst-16                                             9326          9159          -1.79%
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                                    1058          1052          -0.57%

benchmark                                                                                  old MB/s     new MB/s     speedup
BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-16                                             431.13       529.28       1.23x
BenchmarkDecoder_DecoderSmall/geo.protodata.zst-16                                         1378.97      1599.37      1.16x
BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-16                                          321.96       390.59       1.21x
BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-16                                            366.33       464.03       1.27x
BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-16                                          347.37       425.92       1.23x
BenchmarkDecoder_DecoderSmall/alice29.txt.zst-16                                           341.84       435.53       1.27x
BenchmarkDecoder_DecoderSmall/html_x_4.zst-16                                              2542.02      2875.91      1.13x
BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-16                                        4502.81      4511.15      1.00x
BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-16                                        7430.57      7632.40      1.03x
BenchmarkDecoder_DecoderSmall/urls.10K.zst-16                                              548.92       627.46       1.14x
BenchmarkDecoder_DecoderSmall/html.zst-16                                                  1042.47      1226.29      1.18x
BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-16                                         527.22       529.37       1.00x
BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-16                                                683.79       697.88       1.02x
BenchmarkDecoder_DecodeAll/geo.protodata.zst-16                                            1835.58      1973.94      1.08x
BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-16                                             583.04       575.83       0.99x
BenchmarkDecoder_DecodeAll/lcet10.txt.zst-16                                               552.19       629.75       1.14x
BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-16                                             572.09       577.18       1.01x
BenchmarkDecoder_DecodeAll/alice29.txt.zst-16                                              580.99       583.94       1.01x
BenchmarkDecoder_DecodeAll/html_x_4.zst-16                                                 1683.05      1920.55      1.14x
BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-16                                           5259.34      5444.93      1.04x
BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-16                                           10890.30     10886.38     1.00x
BenchmarkDecoder_DecodeAll/urls.10K.zst-16                                                 813.15       819.62       1.01x
BenchmarkDecoder_DecodeAll/html.zst-16                                                     1474.82      1549.14      1.05x
BenchmarkDecoder_DecodeAll/comp-data.bin.zst-16                                            530.40       532.18       1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-16                       555.27       552.62       1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-16                       589.78       587.92       1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-16                        610.22       611.40       1.00x
BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-16                          607.92       606.55       1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-16                                           10823.83     10827.85     1.00x
BenchmarkDecoder_DecodeAllFiles/e.txt/default-16                                           572.10       560.10       0.98x
BenchmarkDecoder_DecodeAllFiles/e.txt/better-16                                            635.64       649.18       1.02x
BenchmarkDecoder_DecodeAllFiles/e.txt/best-16                                              710.80       733.60       1.03x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-16                               1123.24      1163.36      1.04x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-16                               1317.04      1362.41      1.03x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-16                                1031.41      1076.72      1.04x
BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-16                                  378.96       376.39       0.99x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-16                                  366.14       367.73       1.00x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-16                                  258.95       260.47       1.01x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-16                                   259.52       261.14       1.01x
BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-16                                     260.65       263.71       1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-16                                        949.55       957.13       1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/default-16                                        935.03       943.75       1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/better-16                                         979.50       989.74       1.01x
BenchmarkDecoder_DecodeAllFiles/html.txt/best-16                                           923.28       948.72       1.03x
BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-16                                          10823.55     10824.59     1.00x
BenchmarkDecoder_DecodeAllFiles/pi.txt/default-16                                          566.98       557.91       0.98x
BenchmarkDecoder_DecodeAllFiles/pi.txt/better-16                                           632.15       650.35       1.03x
BenchmarkDecoder_DecodeAllFiles/pi.txt/best-16                                             712.23       733.84       1.03x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-16                                     1853.90      1968.25      1.06x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-16                                     1556.07      1619.97      1.04x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-16                                      2067.78      2101.52      1.02x
BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-16                                        1501.84      1583.67      1.05x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-16                                      10832.18     10832.50     1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-16                                      10888.46     10891.68     1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-16                                       10889.17     10893.90     1.00x
BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-16                                         10887.28     10885.42     1.00x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-16                      4550.44      4635.78      1.02x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-16                      4576.11      4677.52      1.02x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-16                       4929.88      5019.35      1.02x
BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-16                         4742.94      4841.66      1.02x
BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-16                                          96258.29     95359.40     0.99x
BenchmarkDecoder_DecodeAllFilesP/e.txt/default-16                                          4259.98      4336.91      1.02x
BenchmarkDecoder_DecodeAllFilesP/e.txt/better-16                                           4994.65      5140.34      1.03x
BenchmarkDecoder_DecodeAllFilesP/e.txt/best-16                                             6112.48      6314.78      1.03x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-16                              6985.59      8656.83      1.24x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-16                              7607.90      9521.75      1.25x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-16                               8814.97      8597.98      0.98x
BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-16                                 4812.57      4788.82      1.00x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-16                                 2681.86      2702.90      1.01x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-16                                 2353.17      2358.19      1.00x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-16                                  2371.29      2390.80      1.01x
BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-16                                    2328.26      2352.45      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-16                                       6420.09      6499.22      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/default-16                                       6336.10      6424.62      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/better-16                                        6665.12      6752.76      1.01x
BenchmarkDecoder_DecodeAllFilesP/html.txt/best-16                                          6339.65      6447.23      1.02x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-16                                         96242.41     95626.58     0.99x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-16                                         4235.80      4302.00      1.02x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-16                                          5022.44      5174.70      1.03x
BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-16                                            6115.32      6315.40      1.03x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-16                                    14358.81     14360.34     1.00x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-16                                    14123.84     14211.15     1.01x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-16                                     17978.35     18130.61     1.01x
BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-16                                       13478.06     13542.70     1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-16                                     96117.34     96263.32     1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-16                                     95968.57     95835.86     1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-16                                      96673.38     96199.72     1.00x
BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-16                                        95308.24     94228.38     0.99x
BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                                        5319.98      5420.20      1.02x
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16                                    13572.17     13924.07     1.03x
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16                                     4366.02      4453.77      1.02x
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16                                       4670.38      5079.98      1.09x
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16                                     4489.62      4579.56      1.02x
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16                                      4144.22      4236.55      1.02x
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                                         13243.57     13349.87     1.01x
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16                                   39137.58     40830.84     1.04x
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16                                   98576.26     97931.36     0.99x
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                                         7375.26      7364.26      1.00x
BenchmarkDecoder_DecodeAllParallel/html.zst-16                                             10980.01     11180.79     1.02x
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16                                    3850.91      3873.21      1.01x

@klauspost
Owner Author

Running fuzz tests on these two changes for a couple of hours.

@klauspost
Owner Author

@WojciechMula Great to see it is a win across platforms, with no regressions.

After 4 hours of fuzz testing, it seems fine.

@klauspost klauspost merged commit b16a9af into master Jul 4, 2022
@klauspost klauspost deleted the zstd-improve-memcopy branch July 4, 2022 12:20