zstd: Improve decoder memcopy #637

Merged
merged 2 commits into from Jul 4, 2022

Commits on Jul 4, 2022

  1. zstd: Improve decoder memcopy

    Improve memcopy for small matches. Decoder throughput increases by up to about 30%, depending on input; the Benchmark_seqdec_execute micro-benchmark improves by up to 1.46x.
    
    ```
    benchmark                                                                                       old MB/s     new MB/s     speedup
    Benchmark_seqdec_execute/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32        1284.77      1525.03      1.19x
    Benchmark_seqdec_execute/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32       1107.87      1614.28      1.46x
    Benchmark_seqdec_execute/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32               3947.25      4100.49      1.04x
    Benchmark_seqdec_execute/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32        10281.12     10316.14     1.00x
    Benchmark_seqdec_execute/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32        8115.99      8829.85      1.09x
    Benchmark_seqdec_execute/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32         1578.08      2290.47      1.45x
    Benchmark_seqdec_execute/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                  17079.65     16716.41     0.98x
    Benchmark_seqdec_execute/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32     2020.09      2166.56      1.07x
    Benchmark_seqdec_execute/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                   35781.31     35745.53     1.00x
    Benchmark_seqdec_execute/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                    33125.43     32785.93     0.99x
    Benchmark_seqdec_execute/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                19394.38     19643.49     1.01x
    Benchmark_seqdec_execute/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                   10494.30     10653.09     1.02x
    Benchmark_seqdec_execute/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32              7425.77      7506.51      1.01x
    Benchmark_seqdec_execute/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                    2855.17      3396.09      1.19x
    
    benchmark                                                                                         old MB/s     new MB/s      speedup
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-32                                                    537.74       651.27        1.21x
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-32                                                1500.59      1610.11       1.07x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-32                                                 410.13       505.82        1.23x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-32                                                   467.83       601.25        1.29x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-32                                                 434.53       530.71        1.22x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-32                                                  433.95       544.87        1.26x
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-32                                                     2860.31      3189.40       1.12x
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-32                                               5336.43      5437.24       1.02x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-32                                               12327.10     12350.86      1.00x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-32                                                     660.52       774.52        1.17x
    BenchmarkDecoder_DecoderSmall/html.zst-32                                                         1076.67      1284.53       1.19x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-32                                                569.30       576.15        1.01x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32                                                       812.16       813.72        1.00x
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-32                                                   1943.14      1933.04       0.99x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32                                                    712.27       715.46        1.00x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32                                                      688.23       775.97        1.13x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32                                                    702.87       700.17        1.00x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-32                                                     717.44       720.89        1.00x
    BenchmarkDecoder_DecodeAll/html_x_4.zst-32                                                        1960.55      1968.90       1.00x
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32                                                  5981.50      6169.12       1.03x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32                                                  13140.18     13145.86      1.00x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-32                                                        983.71       988.16        1.00x
    BenchmarkDecoder_DecodeAll/html.zst-32                                                            1624.80      1624.92       1.00x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32                                                   569.84       570.96        1.00x
    BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/fastest-32                                  504.31       622.83        1.24x
    BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/default-32                                  564.68       717.57        1.27x
    BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/better-32                                   615.18       766.33        1.25x
    BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/best-32                                     786.17       857.17        1.09x
    BenchmarkDecoder_DecodeAllFiles/.tracker.bin/fastest-32                                           12860.99     12870.57      1.00x
    BenchmarkDecoder_DecodeAllFiles/.tracker.bin/default-32                                           619.06       617.54        1.00x
    BenchmarkDecoder_DecodeAllFiles/.tracker.bin/better-32                                            630.33       625.20        0.99x
    BenchmarkDecoder_DecodeAllFiles/.tracker.bin/best-32                                              609.12       612.50        1.01x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-32                              658.22       659.45        1.00x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-32                              723.60       729.95        1.01x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-32                               735.73       737.52        1.00x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-32                                 745.43       749.55        1.01x
    BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-32                                                  12801.86     12967.61      1.01x
    BenchmarkDecoder_DecodeAllFiles/e.txt/default-32                                                  680.29       677.69        1.00x
    BenchmarkDecoder_DecodeAllFiles/e.txt/better-32                                                   739.23       733.45        0.99x
    BenchmarkDecoder_DecodeAllFiles/e.txt/best-32                                                     820.16       825.62        1.01x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-32                                      1186.63      1194.87       1.01x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-32                                      1384.74      1412.45       1.02x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-32                                       1104.17      1107.00       1.00x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-32                                         409.59       409.27        1.00x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-32                                         392.32       391.89        1.00x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-32                                         296.47       296.65        1.00x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-32                                          296.52       296.68        1.00x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-32                                            299.85       295.83        0.99x
    BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-32                                               988.75       996.39        1.01x
    BenchmarkDecoder_DecodeAllFiles/html.txt/default-32                                               987.11       989.51        1.00x
    BenchmarkDecoder_DecodeAllFiles/html.txt/better-32                                                1027.64      1038.21       1.01x
    BenchmarkDecoder_DecodeAllFiles/html.txt/best-32                                                  973.41       989.86        1.02x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-32                                                 12976.96     13045.11      1.01x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/default-32                                                 678.88       674.53        0.99x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/better-32                                                  746.38       747.36        1.00x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/best-32                                                    823.52       827.84        1.01x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-32                                            2115.58      2121.84       1.00x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-32                                            1767.98      1779.35       1.01x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-32                                             2306.86      2328.47       1.01x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-32                                               1660.52      1684.65       1.01x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-32                                             13027.08     12999.49      1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-32                                             13054.18     13084.25      1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-32                                              13067.23     13099.47      1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-32                                                13079.77     13104.13      1.00x
    BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/fastest-32                                 10354.84     11838.70      1.14x
    BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/default-32                                 11557.12     13404.78      1.16x
    BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/better-32                                  12644.67     14519.37      1.15x
    BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/best-32                                    15934.00     17312.77      1.09x
    BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/fastest-32                                          35354.57     34836.95      0.99x
    BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/default-32                                          11392.27     11275.11      0.99x
    BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/better-32                                           11793.77     11771.24      1.00x
    BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/best-32                                             11203.91     11142.52      0.99x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-32                             12089.54     11983.77      0.99x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-32                             12604.67     12514.75      0.99x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-32                              13265.79     13152.64      0.99x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-32                                13078.85     12983.91      0.99x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-32                                                 52477.17     52657.54      1.00x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/default-32                                                 11947.06     11809.75      0.99x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/better-32                                                  13184.17     13140.65      1.00x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/best-32                                                    14630.26     14718.01      1.01x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-32                                     3013.25      3088.05       1.02x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-32                                     3125.61      3091.48       0.99x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-32                                      3181.68      3034.74       0.95x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-32                                        3351.22      3526.91       1.05x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-32                                        1188.15      1136.88       0.96x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-32                                        1215.39      1193.99       0.98x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-32                                         1219.20      1206.23       0.99x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-32                                           1216.72      1200.26       0.99x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-32                                              16901.32     17076.26      1.01x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/default-32                                              16819.66     16892.32      1.00x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/better-32                                               17805.12     17873.77      1.00x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/best-32                                                 16916.87     17184.02      1.02x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-32                                                52314.15     51687.88      0.99x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-32                                                11878.94     11778.57      0.99x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-32                                                 13303.16     13162.44      0.99x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-32                                                   14622.76     14717.80      1.01x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-32                                           34134.48     37031.10      1.08x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-32                                           33589.32     35277.28      1.05x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-32                                            43754.89     44761.13      1.02x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-32                                              32422.22     34107.42      1.05x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-32                                            52706.00     52396.81      0.99x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-32                                            52527.76     52048.36      0.99x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-32                                             52177.25     52688.64      1.01x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-32                                               52443.28     52799.86      1.01x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32                                               13992.47     13994.15      1.00x
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32                                           34107.95     34221.23      1.00x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32                                            12012.34     11976.30      1.00x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32                                              12630.22     13384.70      1.06x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32                                            12327.02     12251.04      0.99x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32                                             11932.73     11896.92      1.00x
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32                                                31233.38     36258.56      1.16x
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32                                          97435.31     100317.73     1.03x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32                                          62247.22     62306.36      1.00x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32                                                18659.58     18592.14      1.00x
    BenchmarkDecoder_DecodeAllParallel/html.zst-32                                                    28464.78     28519.30      1.00x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32                                           3114.03      3297.01       1.06x
    BenchmarkDecoderSilesia/multithreaded-writer-32                                                   1099.69      1104.92       1.00x
    BenchmarkDecoderSilesia/multithreaded-writer-himem-32                                             1093.10      1102.98       1.01x
    BenchmarkDecoderSilesia/singlethreaded-writer-32                                                  803.85       818.55        1.02x
    BenchmarkDecoderSilesia/singlethreaded-writerto-32                                                812.83       828.19        1.02x
    BenchmarkDecoderSilesia/singlethreaded-himem-32                                                   813.14       828.32        1.02x
    BenchmarkDecoderEnwik9/multithreaded-writer-32                                                    877.55       996.49        1.14x
    BenchmarkDecoderEnwik9/multithreaded-writer-himem-32                                              961.20       1036.76       1.08x
    BenchmarkDecoderEnwik9/singlethreaded-writer-32                                                   632.07       631.96        1.00x
    BenchmarkDecoderEnwik9/singlethreaded-writerto-32                                                 634.62       634.52        1.00x
    BenchmarkDecoderEnwik9/singlethreaded-himem-32                                                    763.68       758.40        0.99x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-32           1626.86      1730.88       1.06x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-himem-32     2299.80      2375.04       1.03x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writer-32          1221.34      1221.43       1.00x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writerto-32        1236.18      1237.97       1.00x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-himem-32           1749.21      1754.96       1.00x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-32               839.51       933.63        1.11x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-himem-32         1055.54      1100.37       1.04x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writer-32              574.91       613.88        1.07x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writerto-32            579.19       618.72        1.07x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-himem-32               780.67       867.96        1.11x
    ```
    klauspost committed Jul 4, 2022
    6251e7e
  2. 5e0adf7
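
For readers unfamiliar with what "memcopy for small matches" refers to, here is a minimal sketch of a match copy in an LZ77-style decoder such as zstd's. It is not the code from this PR; the source does not say which technique the commit uses. The function name `appendMatch`, the 16-byte threshold, and the fixed-size over-copy fast path are all assumptions chosen for illustration. Over-copying a fixed block and then trimming is one common way to make short copies cheaper.

```go
package main

import "fmt"

// appendMatch appends `length` bytes to out, read starting `offset` bytes
// before the current end of out. Hypothetical helper, not the library's API.
// The caller must ensure 0 < offset <= len(out).
func appendMatch(out []byte, offset, length int) []byte {
	start := len(out) - offset

	// Assumed fast path for small matches: when the source is at least
	// 16 bytes behind and there is spare capacity, copy a fixed 16 bytes
	// and trim to the real length, avoiding a variable-length copy.
	if length <= 16 && offset >= 16 && cap(out)-len(out) >= 16 {
		n := len(out)
		out = out[:n+16]
		copy(out[n:], out[start:start+16])
		return out[:n+length]
	}

	// Non-overlapping match: the whole source already exists.
	if offset >= length {
		return append(out, out[start:start+length]...)
	}

	// Overlapping match (offset < length): the source runs into bytes that
	// are still being written, so copy in chunks that grow as output grows.
	for length > 0 {
		n := len(out) - start
		if n > length {
			n = length
		}
		out = append(out, out[start:start+n]...)
		length -= n
	}
	return out
}

func main() {
	fmt.Printf("%s\n", appendMatch([]byte("abc"), 3, 3)) // prints "abcabc"
	fmt.Printf("%s\n", appendMatch([]byte("abc"), 1, 4)) // prints "abccccc"
}
```

The fast path only pays off when most matches are short, which would be consistent with the text-like inputs in the table gaining the most while incompressible inputs such as sharnd.out stay flat.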