.NET 8 Update: Hardware Intrinsics #1744

Closed
wants to merge 28 commits into from

pCYSl5EDgo (Contributor) commented Jan 24, 2024

This is a follow-up pull request to #988.

Goals

Speed up the (U?Int(16|32|64)|Single|Double|Boolean)ArrayFormatter types with SIMD instructions, making them about twice as fast on .NET 8.
List<T>, ArraySegment<T>, and (ReadOnly)?Memory<T> can be accelerated in the same way.

Even without SIMD on .NET 6, this proposal generally improves performance.

History

Three years have passed, and .NET 7 introduced many convenient SIMD hardware intrinsics, as I explained in this Japanese article.

SIMD intrinsics from .NET Core 3.1 through .NET 6 required the fixed statement and unsafe pointer operations.
.NET 7 introduced Vector.LoadUnsafe(ref T source), which takes a reference of type T instead of a pointer.
You still end up doing pseudo-pointer arithmetic with the Unsafe class, but no longer needing the fixed statement is an advantage.

.NET 7 also added many cross-platform SIMD APIs, so there is no longer any need to write a lot of platform-dependent branches.
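
To illustrate the pattern described above, here is a minimal sketch of a cross-platform SIMD loop that uses Vector128.LoadUnsafe with a ref instead of fixed and raw pointers. It is not code from this pull request: the class and method names (LoadUnsafeSketch, Sum) are hypothetical, and summing integers is chosen only because it is easy to verify.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper for illustration only; not part of this pull request.
public static class LoadUnsafeSketch
{
    public static int Sum(ReadOnlySpan<int> values)
    {
        // A managed reference to the first element; no fixed statement is needed.
        ref int start = ref MemoryMarshal.GetReference(values);
        nuint length = (nuint)values.Length;
        nuint i = 0;
        int sum = 0;

        // Vector128 is the cross-platform API: the same code maps to SSE/AVX on x64
        // and AdvSimd on Arm64, so no per-platform branches are required.
        if (Vector128.IsHardwareAccelerated && values.Length >= Vector128<int>.Count)
        {
            Vector128<int> acc = Vector128<int>.Zero;
            nuint lastBlock = length - (nuint)Vector128<int>.Count;
            for (; i <= lastBlock; i += (nuint)Vector128<int>.Count)
            {
                // Loads Vector128<int>.Count elements starting at element offset i.
                acc += Vector128.LoadUnsafe(ref start, i);
            }

            sum = Vector128.Sum(acc);
        }

        // Scalar tail (and the whole input when SIMD is unavailable or the span is short).
        for (; i < length; i++)
        {
            sum += Unsafe.Add(ref start, i);
        }

        return sum;
    }
}
```

Calling LoadUnsafeSketch.Sum(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 }) exercises both the vectorized loop and the scalar tail. The Unsafe.Add call in the tail is the "pseudo-pointer" arithmetic mentioned above: bounds are not checked, so the loop conditions carry the safety argument.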

Changes

I can now write code that is, I hope, more understandable to others than it was before.

I ran BenchmarkDotNet on my machine and found that the SIMD implementation performs about the same as the previous implementation on short arrays, where SIMD offers little benefit, and 2 to 5 times faster on long arrays, where SIMD excels.

Annotation

This draft pull request is for performance measurement and is not intended to be merged.
I will prepare a pull request with a clean commit log once you give the go-ahead.

pCYSl5EDgo (Contributor, Author) commented Jan 24, 2024

Test Code

My environment is shown below. All benchmarks were run without dynamic PGO.

BenchmarkDotNet v0.13.12, Windows 11 (10.0.22621.3007/22H2/2022Update/SunValley2)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK 8.0.101
  [Host] : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2
  Scalar : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT
  Vector : .NET 8.0.1 (8.0.123.58001), X64 RyuJIT AVX2

Report column description

Method

  • Old
    • Old implementation from PrimitiveFormatter
  • Simd
    • SIMD-based implementation.

Job

  • Scalar
    • All hardware intrinsics (including SIMD) are disabled.
  • Vector
    • Hardware intrinsics are enabled (AVX2 on this machine). This is the more common setting.
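
For readers who want to reproduce the two jobs, one possible BenchmarkDotNet configuration is sketched here. The actual configuration used for these reports is not shown in this pull request, so the class name ScalarVsVectorConfig and the choice of environment variables are assumptions: DOTNET_EnableHWIntrinsic=0 is the runtime knob that disables hardware intrinsics, and DOTNET_TieredPGO=0 turns off dynamic PGO.

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

// Hypothetical config class; the name and the exact variables are assumptions,
// not the configuration used to produce the reports in this pull request.
public class ScalarVsVectorConfig : ManualConfig
{
    public ScalarVsVectorConfig()
    {
        // Dynamic PGO is on by default in .NET 8, so switch it off for both jobs.
        var noPgo = new EnvironmentVariable("DOTNET_TieredPGO", "0");

        // "Scalar": the JIT does not use hardware intrinsics, including SIMD.
        AddJob(Job.Default
            .WithId("Scalar")
            .WithEnvironmentVariables(noPgo, new EnvironmentVariable("DOTNET_EnableHWIntrinsic", "0")));

        // "Vector": default behavior, hardware intrinsics enabled.
        AddJob(Job.Default
            .WithId("Vector")
            .WithEnvironmentVariables(noPgo));
    }
}
```

Such a config would be attached to a benchmark class with [Config(typeof(ScalarVsVectorConfig))].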

Setting

0 means an empty (length-0) array.
1 31 means a length-1 array whose values are all 31.
4096 rand means a length-4096 array whose values are randomly generated.

The baseline is Old Scalar.
Simd Scalar stands in for the fallback path taken in a .NET 6 environment.
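
As a reading aid for the Ratio column: Ratio is the time relative to the Old Scalar baseline, so the speedup factor is roughly its reciprocal. A worked example using the ulong[] 4194304 rand rows of the first report (the notation is introduced here for illustration only):

```latex
\[
\text{speedup} = \frac{\text{Mean}_{\text{Old Scalar}}}{\text{Mean}_{\text{candidate}}}
\]
\[
\text{Simd Vector: } \frac{59{,}366{,}704.76\ \text{ns}}{11{,}208{,}054.05\ \text{ns}} \approx 5.3,
\qquad
\text{Simd Scalar: } \frac{59{,}366{,}704.76\ \text{ns}}{17{,}214{,}168.34\ \text{ns}} \approx 3.4
\]
```

These agree with the reported ratios of 0.19 and 0.29 for those rows, and with the x5 and x3 summary in that report's heading.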

ulong[] serialize benchmark report (x5 in .NET 8, x3 in .NET 6)
Method Job Setting Mean StdDev Median Ratio
Old Scalar 0 69.99 ns 2.672 ns 69.10 ns 1.00
Simd Scalar 0 65.10 ns 1.589 ns 65.46 ns 0.91
Old Vector 0 65.63 ns 1.697 ns 65.67 ns 0.92
Simd Vector 0 65.57 ns 2.616 ns 65.69 ns 0.94
Old Scalar 1 31 68.51 ns 1.829 ns 68.42 ns 1.00
Simd Scalar 1 31 67.83 ns 1.714 ns 68.19 ns 0.99
Old Vector 1 31 68.96 ns 2.999 ns 68.82 ns 1.04
Simd Vector 1 31 69.89 ns 1.454 ns 70.03 ns 1.02
Old Scalar 1 rand 72.91 ns 1.965 ns 72.66 ns 1.00
Simd Scalar 1 rand 68.78 ns 1.508 ns 69.22 ns 0.94
Old Vector 1 rand 71.91 ns 2.296 ns 72.20 ns 0.99
Simd Vector 1 rand 70.58 ns 0.838 ns 70.88 ns 0.97
Old Scalar 3 31 74.68 ns 2.438 ns 74.21 ns 1.00
Simd Scalar 3 31 69.60 ns 1.509 ns 69.77 ns 0.92
Old Vector 3 31 71.66 ns 0.367 ns 71.69 ns 0.94
Simd Vector 3 31 71.68 ns 0.275 ns 71.67 ns 0.94
Old Scalar 3 rand 84.44 ns 1.170 ns 84.56 ns 1.00
Simd Scalar 3 rand 73.85 ns 1.928 ns 73.09 ns 0.88
Old Vector 3 rand 88.19 ns 2.006 ns 87.53 ns 1.05
Simd Vector 3 rand 72.39 ns 0.361 ns 72.30 ns 0.86
Old Scalar 8 31 90.91 ns 1.106 ns 90.98 ns 1.00
Simd Scalar 8 31 74.83 ns 0.446 ns 74.79 ns 0.82
Old Vector 8 31 92.18 ns 0.770 ns 92.16 ns 1.01
Simd Vector 8 31 76.86 ns 2.110 ns 77.51 ns 0.83
Old Scalar 8 rand 113.26 ns 1.413 ns 112.93 ns 1.00
Simd Scalar 8 rand 89.38 ns 1.869 ns 90.17 ns 0.79
Old Vector 8 rand 114.03 ns 1.009 ns 114.25 ns 1.01
Simd Vector 8 rand 77.97 ns 0.398 ns 78.01 ns 0.69
Old Scalar 16 31 117.61 ns 3.644 ns 117.00 ns 1.00
Simd Scalar 16 31 93.51 ns 3.193 ns 93.18 ns 0.80
Old Vector 16 31 126.25 ns 3.977 ns 124.28 ns 1.07
Simd Vector 16 31 85.00 ns 1.931 ns 84.43 ns 0.72
Old Scalar 16 rand 176.77 ns 0.664 ns 176.44 ns 1.00
Simd Scalar 16 rand 113.33 ns 1.419 ns 113.69 ns 0.64
Old Vector 16 rand 183.89 ns 4.498 ns 183.92 ns 1.03
Simd Vector 16 rand 102.02 ns 2.704 ns 101.06 ns 0.58
Old Scalar 31 31 175.91 ns 5.331 ns 174.43 ns 1.00
Simd Scalar 31 31 172.84 ns 1.786 ns 172.41 ns 0.98
Old Vector 31 31 184.64 ns 0.863 ns 184.60 ns 1.04
Simd Vector 31 31 164.80 ns 3.119 ns 163.36 ns 0.93
Old Scalar 31 rand 344.67 ns 1.361 ns 345.21 ns 1.00
Simd Scalar 31 rand 229.80 ns 4.355 ns 231.57 ns 0.67
Old Vector 31 rand 349.79 ns 10.634 ns 348.77 ns 1.04
Simd Vector 31 rand 201.40 ns 2.266 ns 201.88 ns 0.58
Old Scalar 64 31 283.75 ns 1.474 ns 283.66 ns 1.00
Simd Scalar 64 31 238.76 ns 3.792 ns 238.53 ns 0.84
Old Vector 64 31 307.43 ns 9.320 ns 308.81 ns 1.07
Simd Vector 64 31 216.56 ns 4.559 ns 218.48 ns 0.76
Old Scalar 64 rand 665.53 ns 6.890 ns 665.91 ns 1.00
Simd Scalar 64 rand 314.14 ns 3.307 ns 313.13 ns 0.47
Old Vector 64 rand 648.97 ns 12.764 ns 643.44 ns 0.98
Simd Vector 64 rand 267.91 ns 2.130 ns 268.57 ns 0.40
Old Scalar 4096 31 15,686.12 ns 375.212 ns 15,592.89 ns 1.00
Simd Scalar 4096 31 7,471.90 ns 149.382 ns 7,545.45 ns 0.47
Old Vector 4096 31 14,562.62 ns 178.600 ns 14,493.60 ns 0.91
Simd Vector 4096 31 6,570.29 ns 155.094 ns 6,643.96 ns 0.42
Old Scalar 4096 rand 35,522.87 ns 1,241.972 ns 35,233.65 ns 1.00
Simd Scalar 4096 rand 15,208.18 ns 231.761 ns 15,181.23 ns 0.42
Old Vector 4096 rand 35,025.25 ns 1,151.982 ns 34,802.99 ns 0.98
Simd Vector 4096 rand 9,826.98 ns 255.002 ns 9,697.43 ns 0.27
Old Scalar 4194304 31 20,924,322.58 ns 885,923.919 ns 20,725,096.88 ns 1.00
Simd Scalar 4194304 31 8,891,290.06 ns 803,602.793 ns 8,789,537.50 ns 0.43
Old Vector 4194304 31 18,723,995.76 ns 302,360.122 ns 18,699,326.56 ns 0.85
Simd Vector 4194304 31 8,539,723.78 ns 939,235.097 ns 8,610,418.75 ns 0.41
Old Scalar 4194304 rand 59,366,704.76 ns 1,862,838.139 ns 58,912,728.57 ns 1.00
Simd Scalar 4194304 rand 17,214,168.34 ns 429,513.626 ns 17,287,740.62 ns 0.29
Old Vector 4194304 rand 61,629,793.10 ns 1,789,614.370 ns 61,389,257.14 ns 1.04
Simd Vector 4194304 rand 11,208,054.05 ns 686,486.722 ns 11,094,960.94 ns 0.19
long[] serialize benchmark report (x2 in .NET 8, x2 in .NET 6)
Method Job Setting Mean StdDev Median Ratio
Old Scalar 0 65.14 ns 0.779 ns 65.05 ns 1.00
Simd Scalar 0 67.13 ns 1.508 ns 67.85 ns 1.03
Old Vector 0 60.80 ns 0.534 ns 60.76 ns 0.93
Simd Vector 0 62.80 ns 2.018 ns 62.30 ns 0.97
Old Scalar 1 127 67.11 ns 1.794 ns 67.05 ns 1.00
Simd Scalar 1 127 71.27 ns 2.219 ns 70.98 ns 1.06
Old Vector 1 127 69.55 ns 4.333 ns 68.52 ns 1.05
Simd Vector 1 127 67.91 ns 2.616 ns 67.65 ns 1.01
Old Scalar 1 rand 74.93 ns 3.566 ns 74.71 ns 1.00
Simd Scalar 1 rand 70.71 ns 1.879 ns 71.02 ns 0.91
Old Vector 1 rand 70.88 ns 1.408 ns 71.30 ns 0.91
Simd Vector 1 rand 69.02 ns 1.387 ns 68.60 ns 0.89
Old Scalar 3 127 80.69 ns 5.101 ns 80.02 ns 1.00
Simd Scalar 3 127 73.55 ns 2.445 ns 73.38 ns 0.89
Old Vector 3 127 79.06 ns 3.026 ns 79.39 ns 0.95
Simd Vector 3 127 73.62 ns 2.682 ns 73.46 ns 0.89
Old Scalar 3 rand 88.70 ns 4.579 ns 87.86 ns 1.00
Simd Scalar 3 rand 74.69 ns 1.044 ns 74.86 ns 0.81
Old Vector 3 rand 90.10 ns 4.138 ns 89.13 ns 1.01
Simd Vector 3 rand 77.39 ns 1.371 ns 76.88 ns 0.85
Old Scalar 8 127 105.12 ns 4.543 ns 105.03 ns 1.00
Simd Scalar 8 127 90.91 ns 4.111 ns 89.69 ns 0.87
Old Vector 8 127 109.71 ns 4.468 ns 108.55 ns 1.04
Simd Vector 8 127 82.23 ns 1.997 ns 82.70 ns 0.77
Old Scalar 8 rand 131.81 ns 4.521 ns 132.14 ns 1.00
Simd Scalar 8 rand 89.71 ns 3.512 ns 88.82 ns 0.69
Old Vector 8 rand 126.74 ns 2.146 ns 126.39 ns 0.97
Simd Vector 8 rand 96.39 ns 3.122 ns 97.07 ns 0.73
Old Scalar 16 -31 119.16 ns 5.211 ns 117.72 ns 1.00
Simd Scalar 16 -31 101.12 ns 4.633 ns 99.75 ns 0.85
Old Vector 16 -31 122.22 ns 2.948 ns 121.70 ns 1.05
Simd Vector 16 -31 93.51 ns 2.921 ns 93.06 ns 0.79
Old Scalar 16 rand 193.53 ns 5.188 ns 191.86 ns 1.00
Simd Scalar 16 rand 111.37 ns 3.949 ns 110.87 ns 0.59
Old Vector 16 rand 185.76 ns 3.899 ns 185.50 ns 0.96
Simd Vector 16 rand 115.84 ns 2.542 ns 116.47 ns 0.60
Old Scalar 31 -31 183.63 ns 4.232 ns 184.79 ns 1.00
Simd Scalar 31 -31 210.11 ns 10.421 ns 209.24 ns 1.18
Old Vector 31 -31 193.79 ns 6.432 ns 194.89 ns 1.06
Simd Vector 31 -31 183.49 ns 1.166 ns 183.49 ns 1.00
Old Scalar 31 rand 358.47 ns 11.899 ns 353.05 ns 1.00
Simd Scalar 31 rand 214.19 ns 4.485 ns 215.47 ns 0.59
Old Vector 31 rand 364.41 ns 9.719 ns 360.97 ns 1.01
Simd Vector 31 rand 211.80 ns 1.099 ns 211.63 ns 0.58
Old Scalar 64 -31 310.49 ns 14.622 ns 309.87 ns 1.00
Simd Scalar 64 -31 293.84 ns 6.228 ns 292.89 ns 0.93
Old Vector 64 -31 324.64 ns 12.090 ns 323.19 ns 1.03
Simd Vector 64 -31 249.23 ns 8.075 ns 249.50 ns 0.79
Old Scalar 64 rand 726.27 ns 31.170 ns 718.68 ns 1.00
Simd Scalar 64 rand 329.11 ns 16.282 ns 325.52 ns 0.46
Old Vector 64 rand 754.85 ns 28.949 ns 754.73 ns 1.04
Simd Vector 64 rand 348.46 ns 24.205 ns 341.49 ns 0.48
Old Scalar 4096 -16 15,945.08 ns 530.427 ns 16,127.85 ns 1.00
Simd Scalar 4096 -16 10,063.09 ns 196.491 ns 9,986.64 ns 0.62
Old Vector 4096 -16 16,260.18 ns 88.755 ns 16,286.49 ns 1.02
Simd Vector 4096 -16 8,473.19 ns 36.542 ns 8,471.42 ns 0.53
Old Scalar 4096 rand 58,967.96 ns 1,895.688 ns 59,177.31 ns 1.00
Simd Scalar 4096 rand 24,257.35 ns 444.915 ns 24,158.14 ns 0.41
Old Vector 4096 rand 58,805.08 ns 1,639.041 ns 58,303.19 ns 1.00
Simd Vector 4096 rand 25,285.50 ns 537.606 ns 25,331.67 ns 0.43
Old Scalar 4194304 9 26,224,884.01 ns 622,862.646 ns 26,163,120.31 ns 1.00
Simd Scalar 4194304 9 15,105,698.04 ns 490,680.861 ns 15,231,606.25 ns 0.57
Old Vector 4194304 9 29,538,393.29 ns 822,473.356 ns 29,384,978.12 ns 1.13
Simd Vector 4194304 9 9,891,331.49 ns 122,261.393 ns 9,894,168.75 ns 0.37
Old Scalar 4194304 rand 72,679,997.35 ns 1,981,812.668 ns 72,161,114.29 ns 1.00
Simd Scalar 4194304 rand 30,349,787.27 ns 1,022,468.046 ns 29,938,098.44 ns 0.42
Old Vector 4194304 rand 74,368,988.78 ns 1,237,589.060 ns 74,064,357.14 ns 1.01
Simd Vector 4194304 rand 39,092,093.75 ns 1,107,437.984 ns 38,719,569.23 ns 0.54
float[] serialize benchmark report (x5 in .NET 8, x2.5 in .NET 6)
Method Job Size Mean StdDev Median Ratio
Old Scalar 0 65.29 ns 1.106 ns 65.01 ns 1.00
Simd Scalar 0 67.84 ns 1.488 ns 67.91 ns 1.04
Old Vector 0 61.63 ns 0.807 ns 61.62 ns 0.94
Simd Vector 0 62.74 ns 3.124 ns 61.61 ns 1.01
Old Scalar 1 67.32 ns 1.429 ns 67.74 ns 1.00
Simd Scalar 1 66.95 ns 1.394 ns 66.79 ns 1.00
Old Vector 1 68.80 ns 2.337 ns 69.06 ns 1.05
Simd Vector 1 68.21 ns 1.656 ns 68.60 ns 1.02
Old Scalar 3 76.84 ns 1.506 ns 77.11 ns 1.00
Simd Scalar 3 67.47 ns 1.463 ns 66.64 ns 0.88
Old Vector 3 80.08 ns 1.519 ns 79.82 ns 1.04
Simd Vector 3 67.39 ns 1.354 ns 66.80 ns 0.88
Old Scalar 4 79.94 ns 1.249 ns 79.72 ns 1.00
Simd Scalar 4 77.36 ns 1.578 ns 77.16 ns 0.97
Old Vector 4 82.58 ns 3.076 ns 82.64 ns 1.06
Simd Vector 4 73.17 ns 1.569 ns 73.62 ns 0.91
Old Scalar 8 100.65 ns 4.886 ns 99.61 ns 1.00
Simd Scalar 8 83.09 ns 7.510 ns 82.33 ns 0.84
Old Vector 8 99.78 ns 3.235 ns 100.27 ns 0.97
Simd Vector 8 72.96 ns 2.312 ns 72.49 ns 0.71
Old Scalar 64 407.14 ns 9.051 ns 412.23 ns 1.00
Simd Scalar 64 237.34 ns 4.365 ns 238.57 ns 0.58
Old Vector 64 401.02 ns 14.873 ns 401.86 ns 1.00
Simd Vector 64 161.22 ns 6.532 ns 159.18 ns 0.41
Old Scalar 1024 5,354.95 ns 136.013 ns 5,275.91 ns 1.00
Simd Scalar 1024 2,063.34 ns 32.061 ns 2,061.02 ns 0.38
Old Vector 1024 5,418.25 ns 151.703 ns 5,321.56 ns 1.01
Simd Vector 1024 982.35 ns 15.404 ns 981.75 ns 0.18
Old Scalar 16777216 107,249,637.78 ns 2,142,287.091 ns 106,914,080.00 ns 1.00
Simd Scalar 16777216 42,108,634.62 ns 389,497.133 ns 42,118,542.31 ns 0.39
Old Vector 16777216 108,293,109.33 ns 1,545,091.859 ns 108,131,660.00 ns 1.01
Simd Vector 16777216 20,572,228.12 ns 856,621.349 ns 20,202,537.50 ns 0.20
double[] serialize benchmark report (x5 in .NET 8, x1.5 in .NET 6)
Method Job Size Mean StdDev Median Ratio
Old Scalar 0 63.02 ns 0.783 ns 62.88 ns 1.00
Simd Scalar 0 64.00 ns 3.081 ns 62.90 ns 1.00
Old Vector 0 68.16 ns 3.732 ns 68.14 ns 1.08
Simd Vector 0 65.33 ns 3.648 ns 63.77 ns 1.10
Old Scalar 1 71.29 ns 0.942 ns 71.25 ns 1.00
Simd Scalar 1 71.02 ns 2.641 ns 70.12 ns 1.00
Old Vector 1 70.21 ns 3.544 ns 70.66 ns 0.99
Simd Vector 1 79.41 ns 1.521 ns 78.95 ns 1.11
Old Scalar 3 84.85 ns 1.732 ns 84.93 ns 1.00
Simd Scalar 3 77.41 ns 1.028 ns 77.49 ns 0.91
Old Vector 3 83.16 ns 3.105 ns 83.12 ns 0.96
Simd Vector 3 79.81 ns 4.371 ns 78.93 ns 0.99
Old Scalar 8 104.12 ns 3.483 ns 104.40 ns 1.00
Simd Scalar 8 91.48 ns 2.383 ns 91.26 ns 0.88
Old Vector 8 104.90 ns 5.165 ns 104.05 ns 1.04
Simd Vector 8 74.33 ns 1.724 ns 74.09 ns 0.72
Old Scalar 31 294.80 ns 7.931 ns 291.04 ns 1.00
Simd Scalar 31 217.60 ns 6.786 ns 217.12 ns 0.74
Old Vector 31 297.14 ns 10.987 ns 296.20 ns 1.02
Simd Vector 31 161.44 ns 11.067 ns 158.88 ns 0.55
Old Scalar 64 549.67 ns 15.437 ns 541.92 ns 1.00
Simd Scalar 64 320.96 ns 3.973 ns 318.95 ns 0.58
Old Vector 64 518.35 ns 1.928 ns 517.71 ns 0.94
Simd Vector 64 193.36 ns 14.968 ns 189.87 ns 0.36
Old Scalar 1024 7,452.45 ns 72.863 ns 7,422.81 ns 1.00
Simd Scalar 1024 3,573.68 ns 144.283 ns 3,498.03 ns 0.50
Old Vector 1024 6,989.63 ns 329.541 ns 6,970.11 ns 0.98
Simd Vector 1024 1,355.27 ns 60.143 ns 1,344.75 ns 0.18
Old Scalar 16777216 145,684,971.79 ns 1,701,417.374 ns 144,984,033.33 ns 1.00
Simd Scalar 16777216 64,169,903.06 ns 567,589.707 ns 64,013,157.14 ns 0.44
Old Vector 16777216 143,560,861.67 ns 592,202.425 ns 143,690,375.00 ns 0.99
Simd Vector 16777216 31,999,434.67 ns 718,275.640 ns 31,886,793.33 ns 0.22
bool[] deserialize benchmark report (x16~33 in .NET 8, x1.4 in .NET 6)
Method Job Setting Mean StdDev Median Ratio
Old Scalar 0 25.75 ns 0.079 ns 25.75 ns 1.00
Simd Scalar 0 27.06 ns 0.139 ns 27.03 ns 1.05
Old Vector 0 22.55 ns 0.151 ns 22.53 ns 0.87
Simd Vector 0 24.22 ns 0.283 ns 24.20 ns 0.94
Old Scalar 1 false 34.80 ns 1.171 ns 34.51 ns 1.00
Simd Scalar 1 false 89.14 ns 0.996 ns 88.75 ns 2.52
Old Vector 1 false 32.52 ns 0.697 ns 32.28 ns 0.92
Simd Vector 1 false 87.52 ns 1.975 ns 86.17 ns 2.49
Old Scalar 1 rand 33.22 ns 0.523 ns 33.03 ns 1.00
Simd Scalar 1 rand 88.22 ns 1.864 ns 88.64 ns 2.65
Old Vector 1 rand 33.02 ns 0.516 ns 32.98 ns 0.99
Simd Vector 1 rand 92.95 ns 1.808 ns 92.62 ns 2.80
Old Scalar 1 true 32.80 ns 0.154 ns 32.80 ns 1.00
Simd Scalar 1 true 90.92 ns 1.536 ns 90.92 ns 2.78
Old Vector 1 true 33.37 ns 0.867 ns 32.98 ns 1.03
Simd Vector 1 true 87.48 ns 1.394 ns 86.68 ns 2.67
Old Scalar 3 false 39.15 ns 0.395 ns 39.08 ns 1.00
Simd Scalar 3 false 89.26 ns 1.706 ns 88.51 ns 2.28
Old Vector 3 false 38.24 ns 0.819 ns 37.79 ns 0.97
Simd Vector 3 false 87.62 ns 0.393 ns 87.61 ns 2.24
Old Scalar 3 rand 40.47 ns 0.984 ns 40.66 ns 1.00
Simd Scalar 3 rand 89.56 ns 2.105 ns 88.76 ns 2.22
Old Vector 3 rand 38.41 ns 0.196 ns 38.38 ns 0.96
Simd Vector 3 rand 88.60 ns 1.938 ns 87.92 ns 2.20
Old Scalar 3 true 39.53 ns 0.406 ns 39.44 ns 1.00
Simd Scalar 3 true 88.69 ns 0.628 ns 88.49 ns 2.24
Old Vector 3 true 38.65 ns 1.042 ns 38.07 ns 0.99
Simd Vector 3 true 88.44 ns 0.763 ns 88.33 ns 2.24
Old Scalar 8 rand 53.93 ns 0.743 ns 53.65 ns 1.00
Simd Scalar 8 rand 93.95 ns 0.410 ns 93.90 ns 1.74
Old Vector 8 rand 54.66 ns 1.374 ns 54.17 ns 1.02
Simd Vector 8 rand 93.83 ns 0.992 ns 93.60 ns 1.74
Old Scalar 16 rand 78.94 ns 2.150 ns 77.75 ns 1.00
Simd Scalar 16 rand 108.15 ns 2.663 ns 108.91 ns 1.37
Old Vector 16 rand 84.49 ns 1.136 ns 84.53 ns 1.07
Simd Vector 16 rand 93.82 ns 1.916 ns 94.86 ns 1.19
Old Scalar 31 rand 118.80 ns 0.553 ns 118.86 ns 1.00
Simd Scalar 31 rand 123.83 ns 3.370 ns 122.92 ns 1.03
Old Vector 31 rand 125.06 ns 3.864 ns 123.73 ns 1.05
Simd Vector 31 rand 107.22 ns 2.077 ns 106.96 ns 0.90
Old Scalar 64 rand 209.22 ns 1.002 ns 208.96 ns 1.00
Simd Scalar 64 rand 156.62 ns 0.726 ns 156.72 ns 0.75
Old Vector 64 rand 215.79 ns 5.461 ns 212.97 ns 1.03
Simd Vector 64 rand 94.86 ns 0.498 ns 94.83 ns 0.45
Old Scalar 4096 rand 23,111.92 ns 281.315 ns 23,146.40 ns 1.00
Simd Scalar 4096 rand 15,517.09 ns 61.212 ns 15,505.02 ns 0.67
Old Vector 4096 rand 27,138.43 ns 289.677 ns 27,133.66 ns 1.17
Simd Vector 4096 rand 609.14 ns 25.750 ns 600.60 ns 0.03
Old Scalar 4194304 rand 27,760,274.52 ns 168,809.463 ns 27,764,009.38 ns 1.00
Simd Scalar 4194304 rand 17,409,503.79 ns 124,534.159 ns 17,380,539.06 ns 0.63
Old Vector 4194304 rand 27,986,859.38 ns 237,050.588 ns 27,966,932.81 ns 1.01
Simd Vector 4194304 rand 1,640,623.92 ns 11,880.757 ns 1,638,826.56 ns 0.06
bool[] serialize benchmark report (x10~50 in .NET 8, x1.5 in .NET 6)
Method Job Setting Mean StdDev Ratio
Old Scalar 0 66.09 ns 1.211 ns 1.00
Simd Scalar 0 65.50 ns 1.037 ns 0.99
Old Vector 0 67.17 ns 0.834 ns 1.02
Simd Vector 0 61.21 ns 0.749 ns 0.93
Old Scalar 1 false 65.33 ns 1.510 ns 1.00
Simd Scalar 1 false 66.50 ns 0.837 ns 1.01
Old Vector 1 false 67.33 ns 0.958 ns 1.03
Simd Vector 1 false 65.64 ns 0.744 ns 1.00
Old Scalar 1 rand 67.14 ns 1.156 ns 1.00
Simd Scalar 1 rand 65.45 ns 0.275 ns 0.97
Old Vector 1 rand 64.25 ns 0.766 ns 0.96
Simd Vector 1 rand 68.19 ns 1.162 ns 1.02
Old Scalar 1 true 64.37 ns 1.679 ns 1.00
Simd Scalar 1 true 63.17 ns 0.795 ns 0.98
Old Vector 1 true 64.01 ns 0.415 ns 1.00
Simd Vector 1 true 63.61 ns 1.138 ns 0.99
Old Scalar 3 false 72.83 ns 0.815 ns 1.00
Simd Scalar 3 false 65.13 ns 1.238 ns 0.89
Old Vector 3 false 74.07 ns 2.977 ns 1.01
Simd Vector 3 false 66.03 ns 2.226 ns 0.89
Old Scalar 3 rand 70.91 ns 1.365 ns 1.00
Simd Scalar 3 rand 67.78 ns 2.794 ns 0.98
Old Vector 3 rand 76.65 ns 2.856 ns 1.09
Simd Vector 3 rand 66.17 ns 1.503 ns 0.93
Old Scalar 3 true 72.37 ns 1.572 ns 1.00
Simd Scalar 3 true 68.74 ns 1.494 ns 0.95
Old Vector 3 true 71.01 ns 1.309 ns 0.98
Simd Vector 3 true 64.31 ns 0.835 ns 0.89
Old Scalar 8 rand 91.16 ns 3.229 ns 1.00
Simd Scalar 8 rand 67.25 ns 0.322 ns 0.72
Old Vector 8 rand 91.80 ns 1.789 ns 0.98
Simd Vector 8 rand 77.76 ns 1.369 ns 0.83
Old Scalar 16 rand 112.32 ns 1.439 ns 1.00
Simd Scalar 16 rand 80.90 ns 0.899 ns 0.72
Old Vector 16 rand 110.83 ns 0.413 ns 0.99
Simd Vector 16 rand 66.99 ns 0.629 ns 0.60
Old Scalar 31 rand 156.33 ns 1.040 ns 1.00
Simd Scalar 31 rand 103.41 ns 1.167 ns 0.66
Old Vector 31 rand 155.98 ns 3.542 ns 0.99
Simd Vector 31 rand 83.68 ns 1.755 ns 0.53
Old Scalar 64 rand 264.60 ns 6.322 ns 1.00
Simd Scalar 64 rand 141.86 ns 2.149 ns 0.54
Old Vector 64 rand 246.02 ns 1.221 ns 0.94
Simd Vector 64 rand 73.41 ns 1.285 ns 0.28
Old Scalar 4096 rand 25,193.17 ns 363.356 ns 1.00
Simd Scalar 4096 rand 16,983.91 ns 331.965 ns 0.67
Old Vector 4096 rand 26,103.29 ns 352.613 ns 1.04
Simd Vector 4096 rand 515.27 ns 4.805 ns 0.02
Old Scalar 4194304 rand 28,274,955.71 ns 333,932.626 ns 1.00
Simd Scalar 4194304 rand 17,967,366.74 ns 69,513.126 ns 0.64
Old Vector 4194304 rand 29,510,686.78 ns 218,391.697 ns 1.04
Simd Vector 4194304 rand 1,882,470.01 ns 34,520.150 ns 0.07
short[] serialize benchmark report (x1.5~10 in .NET 8, x2 in .NET 6)
Method Job Setting Mean StdDev Median Ratio
Old Scalar 0 64.61 ns 1.384 ns 64.13 ns 1.00
Simd Scalar 0 64.61 ns 2.545 ns 64.15 ns 1.02
Old Vector 0 70.11 ns 8.139 ns 67.58 ns 1.16
Simd Vector 0 65.30 ns 1.888 ns 65.41 ns 1.02
Old Scalar 1 -135 66.73 ns 1.463 ns 66.46 ns 1.00
Simd Scalar 1 -135 68.50 ns 1.420 ns 68.59 ns 1.03
Old Vector 1 -135 67.82 ns 2.208 ns 67.15 ns 1.02
Simd Vector 1 -135 69.37 ns 3.218 ns 69.09 ns 1.04
Old Scalar 1 127 70.01 ns 3.687 ns 69.53 ns 1.00
Simd Scalar 1 127 66.38 ns 2.133 ns 65.78 ns 0.94
Old Vector 1 127 68.21 ns 3.628 ns 68.24 ns 0.98
Simd Vector 1 127 71.52 ns 2.750 ns 71.79 ns 1.02
Old Scalar 1 rand 68.94 ns 2.990 ns 68.58 ns 1.00
Simd Scalar 1 rand 64.38 ns 0.520 ns 64.24 ns 0.89
Old Vector 1 rand 67.15 ns 2.866 ns 65.36 ns 0.98
Simd Vector 1 rand 66.85 ns 1.006 ns 67.00 ns 0.92
Old Scalar 16 -135 133.87 ns 1.937 ns 134.42 ns 1.00
Simd Scalar 16 -135 95.07 ns 3.287 ns 94.19 ns 0.73
Old Vector 16 -135 142.97 ns 5.428 ns 139.88 ns 1.09
Simd Vector 16 -135 100.11 ns 3.995 ns 99.47 ns 0.78
Old Scalar 16 127 138.99 ns 4.343 ns 137.01 ns 1.00
Simd Scalar 16 127 112.29 ns 2.009 ns 112.56 ns 0.81
Old Vector 16 127 134.17 ns 3.990 ns 135.04 ns 0.96
Simd Vector 16 127 73.96 ns 1.730 ns 74.09 ns 0.54
Old Scalar 16 rand 132.50 ns 4.174 ns 130.98 ns 1.00
Simd Scalar 16 rand 100.34 ns 3.669 ns 99.47 ns 0.77
Old Vector 16 rand 145.16 ns 3.897 ns 146.23 ns 1.09
Simd Vector 16 rand 100.83 ns 1.811 ns 101.21 ns 0.76
Old Scalar 3 -135 75.55 ns 2.722 ns 75.44 ns 1.00
Simd Scalar 3 -135 73.09 ns 4.271 ns 72.35 ns 1.00
Old Vector 3 -135 77.18 ns 2.631 ns 77.14 ns 1.02
Simd Vector 3 -135 71.32 ns 2.325 ns 71.04 ns 0.95
Old Scalar 3 127 78.22 ns 4.721 ns 77.05 ns 1.00
Simd Scalar 3 127 75.36 ns 1.798 ns 75.42 ns 0.94
Old Vector 3 127 79.17 ns 1.680 ns 79.16 ns 1.00
Simd Vector 3 127 72.47 ns 1.114 ns 72.33 ns 0.93
Old Scalar 3 rand 74.88 ns 1.746 ns 74.20 ns 1.00
Simd Scalar 3 rand 69.90 ns 2.844 ns 69.62 ns 0.95
Old Vector 3 rand 79.11 ns 1.322 ns 78.64 ns 1.06
Simd Vector 3 rand 71.01 ns 1.411 ns 71.15 ns 0.95
Old Scalar 31 -135 193.71 ns 4.543 ns 195.36 ns 1.00
Simd Scalar 31 -135 120.04 ns 2.101 ns 119.40 ns 0.62
Old Vector 31 -135 205.53 ns 4.701 ns 203.45 ns 1.06
Simd Vector 31 -135 123.31 ns 0.545 ns 123.18 ns 0.63
Old Scalar 8 -135 90.47 ns 1.790 ns 89.38 ns 1.00
Simd Scalar 8 -135 78.54 ns 1.401 ns 78.03 ns 0.87
Old Vector 8 -135 96.83 ns 2.601 ns 96.91 ns 1.06
Simd Vector 8 -135 79.13 ns 1.795 ns 77.98 ns 0.88
Old Scalar 8 127 101.69 ns 6.660 ns 99.46 ns 1.00
Simd Scalar 8 127 96.09 ns 6.373 ns 94.53 ns 0.95
Old Vector 8 127 102.99 ns 3.771 ns 102.01 ns 1.02
Simd Vector 8 127 73.59 ns 2.666 ns 73.31 ns 0.73
Old Scalar 8 rand 103.97 ns 3.321 ns 104.48 ns 1.00
Simd Scalar 8 rand 82.86 ns 1.501 ns 82.59 ns 0.80
Old Vector 8 rand 108.49 ns 3.400 ns 108.75 ns 1.05
Simd Vector 8 rand 85.37 ns 3.038 ns 85.61 ns 0.82
Old Scalar 31 127 210.06 ns 4.930 ns 213.30 ns 1.00
Simd Scalar 31 127 148.37 ns 0.948 ns 148.15 ns 0.71
Old Vector 31 127 211.06 ns 4.719 ns 209.58 ns 1.01
Simd Vector 31 127 85.09 ns 1.971 ns 85.97 ns 0.41
Old Scalar 31 rand 201.06 ns 6.534 ns 200.35 ns 1.00
Simd Scalar 31 rand 133.03 ns 3.310 ns 132.84 ns 0.66
Old Vector 31 rand 208.15 ns 1.368 ns 208.51 ns 1.04
Simd Vector 31 rand 124.65 ns 2.458 ns 125.64 ns 0.62
Old Scalar 64 -135 330.62 ns 8.218 ns 335.51 ns 1.00
Simd Scalar 64 -135 176.57 ns 0.642 ns 176.71 ns 0.54
Old Vector 64 -135 352.52 ns 8.173 ns 348.11 ns 1.07
Simd Vector 64 -135 204.00 ns 4.785 ns 207.07 ns 0.62
Old Scalar 64 127 360.84 ns 1.471 ns 361.17 ns 1.00
Simd Scalar 64 127 229.14 ns 0.951 ns 229.16 ns 0.64
Old Vector 64 127 338.87 ns 2.188 ns 339.29 ns 0.94
Simd Vector 64 127 80.83 ns 1.626 ns 81.53 ns 0.22
Old Scalar 64 rand 392.69 ns 10.484 ns 395.07 ns 1.00
Simd Scalar 64 rand 196.38 ns 2.057 ns 196.30 ns 0.50
Old Vector 64 rand 365.91 ns 2.169 ns 365.65 ns 0.94
Simd Vector 64 rand 176.67 ns 2.038 ns 176.35 ns 0.45
Old Scalar 4096 -135 20,294.82 ns 139.198 ns 20,258.78 ns 1.00
Simd Scalar 4096 -135 8,079.11 ns 538.255 ns 7,962.02 ns 0.38
Old Vector 4096 -135 21,663.18 ns 822.590 ns 21,590.36 ns 1.07
Simd Vector 4096 -135 9,771.83 ns 333.494 ns 9,747.51 ns 0.48
Old Scalar 4096 127 21,108.22 ns 451.334 ns 21,167.26 ns 1.00
Simd Scalar 4096 127 12,481.15 ns 937.653 ns 12,099.64 ns 0.62
Old Vector 4096 127 20,041.76 ns 429.274 ns 19,846.46 ns 0.95
Simd Vector 4096 127 1,636.11 ns 11.117 ns 1,637.51 ns 0.08
Old Scalar 4096 rand 37,403.94 ns 898.612 ns 37,031.23 ns 1.00
Simd Scalar 4096 rand 19,484.87 ns 217.592 ns 19,387.52 ns 0.52
Old Vector 4096 rand 37,422.98 ns 727.632 ns 37,567.51 ns 0.99
Simd Vector 4096 rand 18,162.88 ns 166.710 ns 18,162.37 ns 0.48
Old Scalar 4194304 -135 26,359,079.91 ns 161,380.166 ns 26,360,412.50 ns 1.00
Simd Scalar 4194304 -135 8,687,431.90 ns 247,095.263 ns 8,571,582.81 ns 0.33
Old Vector 4194304 -135 24,146,727.68 ns 150,344.737 ns 24,142,065.62 ns 0.92
Simd Vector 4194304 -135 10,578,086.76 ns 214,830.668 ns 10,618,515.62 ns 0.40
Old Scalar 4194304 127 22,197,904.02 ns 97,907.211 ns 22,207,804.69 ns 1.00
Simd Scalar 4194304 127 12,562,917.63 ns 55,109.149 ns 12,575,878.12 ns 0.57
Old Vector 4194304 127 20,752,435.94 ns 90,285.401 ns 20,746,096.88 ns 0.93
Simd Vector 4194304 127 2,432,535.46 ns 25,094.285 ns 2,435,466.80 ns 0.11
Old Scalar 4194304 rand 47,390,139.61 ns 253,250.702 ns 47,412,031.82 ns 1.00
Simd Scalar 4194304 rand 23,302,391.46 ns 102,415.604 ns 23,294,990.62 ns 0.49
Old Vector 4194304 rand 43,004,355.56 ns 129,764.309 ns 43,021,175.00 ns 0.91
Simd Vector 4194304 rand 31,204,950.83 ns 98,126.218 ns 31,191,275.00 ns 0.66
ushort[] serialize benchmark report (x3~4 in .NET 8, x2.5 in .NET 6)
Method Job Setting Mean StdDev Median Ratio
Old Scalar 0 72.25 ns 6.676 ns 69.45 ns 1.00
Simd Scalar 0 67.48 ns 1.321 ns 67.91 ns 0.86
Old Vector 0 68.35 ns 2.507 ns 67.92 ns 0.92
Simd Vector 0 67.72 ns 2.586 ns 67.20 ns 0.92
Old Scalar 1 127 72.67 ns 2.579 ns 72.78 ns 1.00
Simd Scalar 1 127 70.72 ns 2.366 ns 70.33 ns 0.98
Old Vector 1 127 75.06 ns 4.516 ns 74.69 ns 1.03
Simd Vector 1 127 73.75 ns 9.467 ns 70.01 ns 1.03
Old Scalar 1 rand 66.57 ns 0.274 ns 66.62 ns 1.00
Simd Scalar 1 rand 67.67 ns 2.395 ns 67.18 ns 1.01
Old Vector 1 rand 68.47 ns 2.058 ns 68.32 ns 1.02
Simd Vector 1 rand 78.82 ns 12.966 ns 72.96 ns 1.27
Old Scalar 3 127 74.12 ns 1.733 ns 73.47 ns 1.00
Simd Scalar 3 127 66.34 ns 1.916 ns 66.00 ns 0.90
Old Vector 3 127 73.32 ns 2.788 ns 72.28 ns 0.98
Simd Vector 3 127 71.19 ns 3.357 ns 71.47 ns 1.00
Old Scalar 3 rand 76.64 ns 1.334 ns 76.84 ns 1.00
Simd Scalar 3 rand 72.47 ns 1.001 ns 72.53 ns 0.95
Old Vector 3 rand 77.11 ns 0.947 ns 77.31 ns 1.01
Simd Vector 3 rand 66.31 ns 0.342 ns 66.33 ns 0.87
Old Scalar 8 127 90.91 ns 0.783 ns 91.18 ns 1.00
Simd Scalar 8 127 78.09 ns 6.350 ns 76.05 ns 0.84
Old Vector 8 127 93.89 ns 2.579 ns 94.00 ns 1.03
Simd Vector 8 127 65.78 ns 0.799 ns 65.45 ns 0.72
Old Scalar 8 rand 93.53 ns 3.722 ns 91.41 ns 1.00
Simd Scalar 8 rand 79.16 ns 1.150 ns 79.60 ns 0.81
Old Vector 8 rand 96.96 ns 3.195 ns 96.54 ns 1.03
Simd Vector 8 rand 77.28 ns 1.772 ns 76.41 ns 0.80
Old Scalar 16 127 118.20 ns 5.406 ns 116.01 ns 1.00
Simd Scalar 16 127 86.47 ns 3.351 ns 86.39 ns 0.73
Old Vector 16 127 113.57 ns 4.334 ns 112.24 ns 0.96
Simd Vector 16 127 70.08 ns 3.060 ns 69.81 ns 0.59
Old Scalar 16 rand 122.55 ns 0.340 ns 122.58 ns 1.00
Simd Scalar 16 rand 99.52 ns 1.963 ns 100.42 ns 0.81
Old Vector 16 rand 124.45 ns 1.813 ns 124.06 ns 1.02
Simd Vector 16 rand 86.30 ns 2.890 ns 86.16 ns 0.71
Old Scalar 31 127 167.72 ns 0.519 ns 167.72 ns 1.00
Simd Scalar 31 127 99.77 ns 3.243 ns 98.32 ns 0.60
Old Vector 31 127 162.53 ns 1.596 ns 162.27 ns 0.97
Simd Vector 31 127 85.32 ns 3.011 ns 84.15 ns 0.52
Old Scalar 31 rand 195.94 ns 1.191 ns 195.90 ns 1.00
Simd Scalar 31 rand 131.22 ns 2.449 ns 131.46 ns 0.67
Old Vector 31 rand 190.65 ns 4.240 ns 188.80 ns 0.98
Simd Vector 31 rand 105.07 ns 1.833 ns 104.49 ns 0.54
Old Scalar 64 127 282.00 ns 12.438 ns 281.36 ns 1.00
Simd Scalar 64 127 147.68 ns 4.057 ns 149.42 ns 0.52
Old Vector 64 127 258.36 ns 0.853 ns 258.02 ns 0.90
Simd Vector 64 127 84.26 ns 3.025 ns 83.92 ns 0.29
Old Scalar 64 rand 337.68 ns 23.001 ns 331.88 ns 1.00
Simd Scalar 64 rand 198.89 ns 3.668 ns 199.08 ns 0.60
Old Vector 64 rand 334.42 ns 14.122 ns 333.65 ns 0.99
Simd Vector 64 rand 163.36 ns 4.242 ns 162.51 ns 0.48
Old Scalar 4096 127 13,461.26 ns 304.231 ns 13,342.98 ns 1.00
Simd Scalar 4096 127 5,066.30 ns 140.879 ns 5,067.29 ns 0.38
Old Vector 4096 127 14,508.56 ns 413.951 ns 14,357.88 ns 1.09
Simd Vector 4096 127 1,595.70 ns 15.924 ns 1,594.44 ns 0.12
Old Scalar 4096 rand 18,181.25 ns 119.197 ns 18,167.88 ns 1.00
Simd Scalar 4096 rand 7,862.37 ns 90.874 ns 7,878.42 ns 0.43
Old Vector 4096 rand 17,981.42 ns 501.713 ns 17,740.45 ns 0.99
Simd Vector 4096 rand 6,857.95 ns 220.758 ns 6,888.23 ns 0.38
Old Scalar 4194304 127 14,398,654.57 ns 221,532.617 ns 14,320,076.56 ns 1.00
Simd Scalar 4194304 127 5,391,375.36 ns 76,973.979 ns 5,402,529.69 ns 0.37
Old Vector 4194304 127 14,705,850.22 ns 216,029.752 ns 14,642,123.44 ns 1.02
Simd Vector 4194304 127 2,343,825.33 ns 33,388.728 ns 2,338,436.52 ns 0.16
Old Scalar 4194304 rand 23,578,324.17 ns 154,616.629 ns 23,612,262.50 ns 1.00
Simd Scalar 4194304 rand 9,899,772.66 ns 223,303.114 ns 9,960,557.03 ns 0.42
Old Vector 4194304 rand 24,234,066.15 ns 140,033.602 ns 24,220,614.06 ns 1.03
Simd Vector 4194304 rand 8,106,887.99 ns 250,208.136 ns 8,089,319.53 ns 0.35
int[] serialize benchmark report (x2~4 in .NET 8, x1.2~2 in .NET 6)
Method Job Setting Mean StdDev Median Ratio
Old Scalar 0 72.38 ns 5.435 ns 72.92 ns 1.00
Simd Scalar 0 64.82 ns 4.431 ns 63.62 ns 0.90
Old Vector 0 62.19 ns 1.080 ns 61.59 ns 0.84
Simd Vector 0 64.69 ns 1.430 ns 65.14 ns 0.86
Old Scalar 1 0 68.79 ns 2.756 ns 68.55 ns 1.00
Simd Scalar 1 0 68.44 ns 2.526 ns 68.32 ns 1.00
Old Vector 1 0 69.85 ns 1.832 ns 70.53 ns 1.00
Simd Vector 1 0 70.45 ns 1.247 ns 70.11 ns 1.02
Old Scalar 1 rand 72.74 ns 1.546 ns 72.35 ns 1.00
Simd Scalar 1 rand 68.27 ns 2.461 ns 68.33 ns 0.95
Old Vector 1 rand 67.67 ns 0.419 ns 67.60 ns 0.93
Simd Vector 1 rand 71.02 ns 1.608 ns 71.97 ns 0.98
Old Scalar 3 0 72.58 ns 1.955 ns 73.53 ns 1.00
Simd Scalar 3 0 72.20 ns 1.387 ns 72.52 ns 1.01
Old Vector 3 0 74.33 ns 1.811 ns 74.53 ns 1.03
Simd Vector 3 0 72.71 ns 0.652 ns 72.76 ns 1.02
Old Scalar 3 rand 79.74 ns 1.254 ns 80.09 ns 1.00
Simd Scalar 3 rand 70.37 ns 2.447 ns 69.56 ns 0.91
Old Vector 3 rand 81.71 ns 2.313 ns 81.60 ns 1.03
Simd Vector 3 rand 74.47 ns 2.084 ns 73.76 ns 0.94
Old Scalar 8 0 88.73 ns 1.398 ns 88.19 ns 1.00
Simd Scalar 8 0 87.15 ns 0.853 ns 87.21 ns 0.98
Old Vector 8 0 92.26 ns 2.062 ns 93.42 ns 1.04
Simd Vector 8 0 73.18 ns 1.596 ns 73.97 ns 0.82
Old Scalar 8 rand 108.76 ns 1.834 ns 107.92 ns 1.00
Simd Scalar 8 rand 79.58 ns 0.816 ns 79.77 ns 0.73
Old Vector 8 rand 109.50 ns 2.334 ns 110.45 ns 1.01
Simd Vector 8 rand 79.48 ns 2.630 ns 79.30 ns 0.75
Old Scalar 16 0 123.26 ns 12.312 ns 119.60 ns 1.00
Simd Scalar 16 0 123.52 ns 9.134 ns 120.87 ns 1.01
Old Vector 16 0 118.29 ns 3.384 ns 118.66 ns 0.96
Simd Vector 16 0 80.31 ns 4.121 ns 78.75 ns 0.66
Old Scalar 16 rand 159.60 ns 3.899 ns 161.12 ns 1.00
Simd Scalar 16 rand 99.43 ns 2.889 ns 98.46 ns 0.62
Old Vector 16 rand 158.18 ns 3.358 ns 158.52 ns 0.99
Simd Vector 16 rand 99.26 ns 3.292 ns 97.76 ns 0.63
Old Scalar 31 0 174.78 ns 3.879 ns 176.60 ns 1.00
Simd Scalar 31 0 159.41 ns 3.227 ns 160.91 ns 0.91
Old Vector 31 0 170.26 ns 4.875 ns 166.97 ns 0.98
Simd Vector 31 0 92.87 ns 2.644 ns 93.10 ns 0.53
Old Scalar 31 rand 240.31 ns 0.475 ns 240.46 ns 1.00
Simd Scalar 31 rand 132.35 ns 1.504 ns 132.72 ns 0.55
Old Vector 31 rand 236.84 ns 4.784 ns 239.04 ns 0.98
Simd Vector 31 rand 134.48 ns 3.111 ns 135.22 ns 0.56
Old Scalar 64 0 270.40 ns 4.125 ns 269.25 ns 1.00
Simd Scalar 64 0 310.60 ns 3.357 ns 312.37 ns 1.15
Old Vector 64 0 265.18 ns 0.901 ns 265.20 ns 0.98
Simd Vector 64 0 159.55 ns 4.250 ns 161.41 ns 0.58
Old Scalar 64 rand 492.24 ns 9.885 ns 492.93 ns 1.00
Simd Scalar 64 rand 260.77 ns 4.923 ns 258.70 ns 0.53
Old Vector 64 rand 476.86 ns 8.921 ns 475.06 ns 0.97
Simd Vector 64 rand 246.95 ns 1.821 ns 246.90 ns 0.50
Old Scalar 4096 0 14,123.06 ns 198.615 ns 14,205.47 ns 1.00
Simd Scalar 4096 0 12,711.88 ns 109.318 ns 12,722.09 ns 0.90
Old Vector 4096 0 14,325.21 ns 78.716 ns 14,300.04 ns 1.02
Simd Vector 4096 0 3,664.15 ns 207.046 ns 3,650.62 ns 0.26
Old Scalar 4096 rand 43,052.00 ns 349.980 ns 43,013.18 ns 1.00
Simd Scalar 4096 rand 20,884.08 ns 114.496 ns 20,913.20 ns 0.49
Old Vector 4096 rand 43,162.41 ns 419.923 ns 43,254.36 ns 1.00
Simd Vector 4096 rand 19,376.65 ns 229.895 ns 19,338.94 ns 0.45
Old Scalar 4194304 0 16,461,541.04 ns 101,479.047 ns 16,462,490.62 ns 1.00
Simd Scalar 4194304 0 13,420,517.41 ns 43,725.764 ns 13,419,034.38 ns 0.82
Old Vector 4194304 0 17,115,307.08 ns 156,568.553 ns 17,096,718.75 ns 1.04
Simd Vector 4194304 0 3,940,153.75 ns 31,870.350 ns 3,928,339.84 ns 0.24
Old Scalar 4194304 rand 51,264,756.67 ns 224,423.250 ns 51,278,150.00 ns 1.00
Simd Scalar 4194304 rand 23,920,772.29 ns 127,033.715 ns 23,891,815.62 ns 0.47
Old Vector 4194304 rand 51,959,047.86 ns 390,922.613 ns 52,011,970.00 ns 1.01
Simd Vector 4194304 rand 26,619,539.58 ns 173,648.010 ns 26,613,143.75 ns 0.52
uint[] serialize benchmark report (x5 in .NET8, x4 in .NET6)
Method Job Size Setting Mean StdDev Median Ratio
Old Scalar 0 71.07 ns 6.917 ns 68.28 ns 1.00
Simd Scalar 0 69.88 ns 5.860 ns 68.77 ns 0.99
Old Vector 0 63.99 ns 4.202 ns 62.11 ns 0.91
Simd Vector 0 60.27 ns 0.482 ns 60.22 ns 0.79
Old Scalar 1 0 66.84 ns 1.444 ns 66.90 ns 1.00
Simd Scalar 1 0 70.91 ns 6.931 ns 67.63 ns 1.00
Old Vector 1 0 63.17 ns 1.176 ns 62.70 ns 0.94
Simd Vector 1 0 68.49 ns 0.542 ns 68.32 ns 1.02
Old Scalar 1 rand 75.20 ns 8.147 ns 72.24 ns 1.00
Simd Scalar 1 rand 73.46 ns 4.012 ns 73.68 ns 0.99
Old Vector 1 rand 71.14 ns 3.823 ns 69.74 ns 0.96
Simd Vector 1 rand 66.40 ns 1.422 ns 67.02 ns 0.94
Old Scalar 3 0 67.45 ns 0.252 ns 67.54 ns 1.00
Simd Scalar 3 0 69.09 ns 3.504 ns 68.27 ns 1.07
Old Vector 3 0 69.45 ns 0.428 ns 69.55 ns 1.03
Simd Vector 3 0 73.13 ns 1.306 ns 73.53 ns 1.08
Old Scalar 3 rand 80.54 ns 2.284 ns 80.12 ns 1.00
Simd Scalar 3 rand 70.81 ns 0.901 ns 70.95 ns 0.88
Old Vector 3 rand 79.26 ns 0.789 ns 79.47 ns 0.98
Simd Vector 3 rand 71.62 ns 1.072 ns 71.16 ns 0.89
Old Scalar 8 0 87.19 ns 2.646 ns 86.66 ns 1.00
Simd Scalar 8 0 78.42 ns 0.720 ns 78.43 ns 0.89
Old Vector 8 0 82.49 ns 0.339 ns 82.60 ns 0.94
Simd Vector 8 0 72.35 ns 1.862 ns 73.20 ns 0.83
Old Scalar 8 rand 101.42 ns 0.315 ns 101.30 ns 1.00
Simd Scalar 8 rand 83.64 ns 1.793 ns 84.24 ns 0.82
Old Vector 8 rand 116.78 ns 6.178 ns 117.01 ns 1.10
Simd Vector 8 rand 80.52 ns 0.969 ns 80.25 ns 0.79
Old Scalar 16 0 107.30 ns 2.669 ns 106.06 ns 1.00
Simd Scalar 16 0 96.36 ns 1.868 ns 97.19 ns 0.90
Old Vector 16 0 116.38 ns 10.049 ns 113.26 ns 1.02
Simd Vector 16 0 86.86 ns 13.831 ns 81.24 ns 0.79
Old Scalar 16 rand 162.54 ns 5.571 ns 162.74 ns 1.00
Simd Scalar 16 rand 98.33 ns 1.316 ns 98.52 ns 0.60
Old Vector 16 rand 159.31 ns 4.767 ns 161.31 ns 0.98
Simd Vector 16 rand 92.31 ns 3.571 ns 91.41 ns 0.57
Old Scalar 31 0 154.96 ns 1.814 ns 155.59 ns 1.00
Simd Scalar 31 0 124.53 ns 7.176 ns 122.50 ns 0.82
Old Vector 31 0 144.03 ns 0.559 ns 144.14 ns 0.93
Simd Vector 31 0 92.44 ns 6.237 ns 90.81 ns 0.60
Old Scalar 31 rand 290.31 ns 35.714 ns 281.16 ns 1.00
Simd Scalar 31 rand 147.18 ns 9.028 ns 147.64 ns 0.51
Old Vector 31 rand 255.14 ns 13.241 ns 249.52 ns 0.87
Simd Vector 31 rand 129.02 ns 9.668 ns 125.85 ns 0.45
Old Scalar 64 0 238.75 ns 0.787 ns 238.80 ns 1.00
Simd Scalar 64 0 244.79 ns 0.884 ns 244.86 ns 1.02
Old Vector 64 0 255.55 ns 13.088 ns 249.74 ns 1.06
Simd Vector 64 0 153.95 ns 1.030 ns 154.02 ns 0.64
Old Scalar 64 rand 623.41 ns 147.914 ns 555.81 ns 1.00
Simd Scalar 64 rand 261.52 ns 13.738 ns 257.17 ns 0.45
Old Vector 64 rand 475.65 ns 2.768 ns 476.25 ns 0.64
Simd Vector 64 rand 235.19 ns 5.529 ns 237.97 ns 0.36
Old Scalar 4096 0 12,902.83 ns 550.797 ns 12,852.14 ns 1.00
Simd Scalar 4096 0 8,910.17 ns 377.737 ns 8,839.02 ns 0.69
Old Vector 4096 0 13,878.94 ns 423.457 ns 13,852.32 ns 1.07
Simd Vector 4096 0 3,206.65 ns 201.270 ns 3,136.82 ns 0.25
Old Scalar 4096 rand 26,801.26 ns 500.940 ns 26,534.97 ns 1.00
Simd Scalar 4096 rand 9,005.84 ns 81.284 ns 9,001.29 ns 0.34
Old Vector 4096 rand 26,883.85 ns 679.605 ns 27,118.19 ns 1.00
Simd Vector 4096 rand 8,033.06 ns 332.967 ns 7,966.49 ns 0.30
Old Scalar 4194304 0 17,217,498.96 ns 439,483.873 ns 17,256,873.44 ns 1.00
Simd Scalar 4194304 0 8,864,271.80 ns 209,480.653 ns 8,814,285.16 ns 0.52
Old Vector 4194304 0 16,600,032.33 ns 187,641.176 ns 16,600,425.00 ns 0.96
Simd Vector 4194304 0 3,669,334.41 ns 168,573.082 ns 3,610,645.70 ns 0.22
Old Scalar 4194304 rand 38,645,037.10 ns 2,248,631.327 ns 37,898,330.00 ns 1.00
Simd Scalar 4194304 rand 10,557,864.84 ns 419,185.002 ns 10,363,848.44 ns 0.27
Old Vector 4194304 rand 36,893,078.57 ns 304,238.707 ns 36,926,864.29 ns 0.93
Simd Vector 4194304 rand 8,403,770.52 ns 131,016.996 ns 8,446,523.44 ns 0.21

All benchmarks are done.
Except for long[], all serialize methods show very good performance.

@pCYSl5EDgo pCYSl5EDgo changed the title from ".NET 8 Update:" to ".NET 8 Update: Hardware Intrinsics" Jan 24, 2024
@pCYSl5EDgo pCYSl5EDgo mentioned this pull request Jan 24, 2024
Collaborator

@AArnott AArnott left a comment

> I will prepare a clean commit log Pull Request when you give the go-ahead.

I'll wait for this.

@pCYSl5EDgo
Contributor Author

pCYSl5EDgo commented Jan 25, 2024

I've rewritten the formatters in C# 9 and moved them into the main MessagePack directory, because this PR aims to improve the default ArrayFormatters.

Unsafe.AsRef was silently changed from ref T AsRef<T> (scoped in T source) to ref T AsRef<T> (scoped ref readonly T source). In .NET 8, it requires C# 12.

I found that this PR depends on #1734.

SIMD does not support integer division or 64-bit integer multiplication,
so DateTime serialization performs very poorly
@pCYSl5EDgo
Contributor Author

This comment is written for later developers and describes why DateTimeArrayFormatter does not use SIMD.

DateTimeArrayFormatter has now been added, but it does not use any SIMD functions at all, because SIMD has no integer division or modulo operations.
In addition, before AVX-512 (which is available on Zen 4 machines), there is no API for 64-bit integer multiplication.
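
To make the limitation concrete, here is a minimal sketch of the per-element arithmetic that a MessagePack timestamp needs, assuming the usual split into seconds since the Unix epoch plus nanoseconds. `TimestampSketch` and `Split` are hypothetical names used only for illustration and are not the code in this PR.

```csharp
using System;

// Hypothetical helper for illustration; not part of the MessagePack API.
static class TimestampSketch
{
    // Splits a DateTime into whole seconds since the Unix epoch plus the
    // leftover nanoseconds, the two fields a timestamp carries.
    public static (long Seconds, uint Nanoseconds) Split(DateTime value)
    {
        // Local values are converted to UTC first; other kinds are used as-is.
        if (value.Kind == DateTimeKind.Local)
        {
            value = value.ToUniversalTime();
        }

        long ticks = value.Ticks - DateTime.UnixEpoch.Ticks;

        // 64-bit integer division and remainder: the step that has to stay scalar.
        long seconds = Math.DivRem(ticks, TimeSpan.TicksPerSecond, out long remainder);
        if (remainder < 0)
        {
            // Keep the nanosecond part non-negative for dates before 1970.
            seconds--;
            remainder += TimeSpan.TicksPerSecond;
        }

        // One tick is 100 ns.
        return (seconds, (uint)(remainder * 100));
    }
}
```

Every element needs one 64-bit division and one remainder, so the loop stays scalar regardless of which vector instruction set is available.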

Reference SharpLab assembly code.

DateTime serialize without SIMD benchmark report (x2 in both)
Method Job Size Setting Mean StdDev Median Ratio
Old Scalar 0 75.78 ns 7.245 ns 74.02 ns 1.00
Simd Scalar 0 67.42 ns 2.760 ns 67.27 ns 0.88
Old Vector 0 66.37 ns 4.844 ns 64.94 ns 0.88
Simd Vector 0 63.57 ns 1.288 ns 63.37 ns 0.80
Old Scalar 1 rand 75.78 ns 2.390 ns 75.85 ns 1.00
Simd Scalar 1 rand 73.43 ns 3.087 ns 72.82 ns 0.97
Old Vector 1 rand 75.10 ns 1.925 ns 74.83 ns 0.98
Simd Vector 1 rand 95.68 ns 2.165 ns 95.50 ns 1.25
Old Scalar 1 utc 74.75 ns 1.509 ns 75.14 ns 1.00
Simd Scalar 1 utc 74.45 ns 1.426 ns 74.37 ns 1.00
Old Vector 1 utc 70.76 ns 1.946 ns 69.86 ns 0.95
Simd Vector 1 utc 71.09 ns 1.399 ns 70.95 ns 0.95
Old Scalar 3 rand 141.65 ns 19.168 ns 140.26 ns 1.00
Simd Scalar 3 rand 174.33 ns 11.602 ns 171.55 ns 1.24
Old Vector 3 rand 139.78 ns 4.395 ns 139.68 ns 1.01
Simd Vector 3 rand 148.41 ns 12.839 ns 144.56 ns 1.06
Old Scalar 3 utc 102.32 ns 11.200 ns 97.79 ns 1.00
Simd Scalar 3 utc 85.21 ns 5.066 ns 83.90 ns 0.84
Old Vector 3 utc 95.13 ns 6.573 ns 93.21 ns 0.94
Simd Vector 3 utc 81.06 ns 2.696 ns 81.77 ns 0.75
Old Scalar 8 rand 264.54 ns 13.817 ns 259.70 ns 1.00
Simd Scalar 8 rand 255.52 ns 9.032 ns 256.24 ns 0.95
Old Vector 8 rand 258.59 ns 8.122 ns 259.58 ns 0.95
Simd Vector 8 rand 182.73 ns 1.917 ns 182.52 ns 0.68
Old Scalar 8 utc 131.87 ns 2.304 ns 131.94 ns 1.00
Simd Scalar 8 utc 107.83 ns 2.043 ns 108.59 ns 0.82
Old Vector 8 utc 146.33 ns 3.718 ns 147.20 ns 1.10
Simd Vector 8 utc 106.37 ns 0.322 ns 106.37 ns 0.81
Old Scalar 16 rand 379.41 ns 7.893 ns 379.08 ns 1.00
Simd Scalar 16 rand 402.33 ns 14.135 ns 401.98 ns 1.06
Old Vector 16 rand 452.44 ns 11.477 ns 448.50 ns 1.19
Simd Vector 16 rand 336.69 ns 11.269 ns 334.30 ns 0.89
Old Scalar 16 utc 214.99 ns 6.397 ns 212.61 ns 1.00
Simd Scalar 16 utc 151.15 ns 4.020 ns 152.62 ns 0.70
Old Vector 16 utc 270.33 ns 36.246 ns 256.22 ns 1.25
Simd Vector 16 utc 157.51 ns 3.501 ns 156.49 ns 0.73
Old Scalar 31 rand 719.14 ns 3.350 ns 718.38 ns 1.00
Simd Scalar 31 rand 682.40 ns 6.484 ns 681.68 ns 0.95
Old Vector 31 rand 804.97 ns 3.056 ns 804.61 ns 1.12
Simd Vector 31 rand 778.17 ns 57.820 ns 765.90 ns 1.11
Old Scalar 31 utc 419.78 ns 6.257 ns 417.69 ns 1.00
Simd Scalar 31 utc 300.25 ns 14.616 ns 297.95 ns 0.76
Old Vector 31 utc 449.02 ns 12.661 ns 450.35 ns 1.07
Simd Vector 31 utc 307.18 ns 8.844 ns 304.63 ns 0.74
Old Scalar 64 rand 1,705.32 ns 59.621 ns 1,681.02 ns 1.00
Simd Scalar 64 rand 1,398.57 ns 29.381 ns 1,397.54 ns 0.81
Old Vector 64 rand 1,696.00 ns 34.492 ns 1,683.47 ns 0.98
Simd Vector 64 rand 1,077.99 ns 4.281 ns 1,078.17 ns 0.63
Old Scalar 64 utc 809.06 ns 18.949 ns 815.92 ns 1.00
Simd Scalar 64 utc 498.54 ns 21.903 ns 495.30 ns 0.63
Old Vector 64 utc 886.79 ns 12.981 ns 888.45 ns 1.09
Simd Vector 64 utc 564.98 ns 82.444 ns 533.36 ns 0.65
Old Scalar 4096 rand 125,547.24 ns 1,213.114 ns 125,271.29 ns 1.00
Simd Scalar 4096 rand 99,558.42 ns 1,035.725 ns 99,378.20 ns 0.79
Old Vector 4096 rand 122,202.52 ns 631.395 ns 122,173.18 ns 0.97
Simd Vector 4096 rand 100,926.67 ns 1,528.048 ns 100,785.58 ns 0.80
Old Scalar 4096 utc 62,545.58 ns 721.018 ns 62,293.51 ns 1.00
Simd Scalar 4096 utc 38,460.85 ns 1,240.379 ns 37,988.86 ns 0.63
Old Vector 4096 utc 65,894.73 ns 795.250 ns 65,698.10 ns 1.05
Simd Vector 4096 utc 35,719.59 ns 445.582 ns 35,522.50 ns 0.57
Old Scalar 4194304 rand 141,690,505.77 ns 1,478,730.033 ns 141,738,825.00 ns 1.00
Simd Scalar 4194304 rand 112,130,300.00 ns 1,334,293.852 ns 112,089,200.00 ns 0.79
Old Vector 4194304 rand 141,831,608.33 ns 381,558.216 ns 141,931,612.50 ns 1.00
Simd Vector 4194304 rand 113,438,212.00 ns 1,037,760.445 ns 113,063,760.00 ns 0.80
Old Scalar 4194304 utc 86,847,380.92 ns 4,397,553.959 ns 84,739,441.67 ns 1.00
Simd Scalar 4194304 utc 43,334,727.78 ns 867,568.775 ns 42,976,520.83 ns 0.50
Old Vector 4194304 utc 87,281,693.71 ns 2,851,344.404 ns 86,920,320.00 ns 0.99
Simd Vector 4194304 utc 44,852,673.57 ns 371,590.583 ns 44,774,720.00 ns 0.52

@AArnott
Collaborator

AArnott commented Mar 31, 2024

@pCYSl5EDgo I haven't merged this as it's still marked Draft. I'm curious what your intention is for this PR going forward.

@pCYSl5EDgo
Contributor Author

pCYSl5EDgo commented Apr 1, 2024

@AArnott
I'm closing this draft PR and will open another PR after #1734.
The main purpose of this draft PR was to show the performance improvements.

@pCYSl5EDgo pCYSl5EDgo closed this Apr 1, 2024