Uniform sampling: use Canon's method #1287

dhardy · 2023-02-17T15:52:19Z

Closes #570, #1145, #1154, #1196, #1286. See also #1172 (TODO: SIMD), #494 (here we add "unbiased" feature flag).

Also implements PartialEq for all our Uniform impls and Eq for all but FP. See #1217.
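For context on why `Eq` is left out for floating-point, here is a minimal sketch. `IntRange` and `FloatRange` are hypothetical stand-ins invented for illustration, not the crate's actual types or fields. The point is only that `f64` implements `PartialEq` but not `Eq` (NaN is not equal to itself), so a struct holding floats can derive the former but not the latter.

```rust
/// Hypothetical stand-in for an integer range sampler (not the crate's real type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IntRange {
    low: i32,
    range: u32,
}

/// Hypothetical stand-in for a floating-point range sampler.
/// Only `PartialEq` is derived: `f64` does not implement `Eq` because NaN != NaN.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FloatRange {
    low: f64,
    scale: f64,
}

impl IntRange {
    /// Builds the half-open range `low..high`.
    pub fn new(low: i32, high: i32) -> Self {
        assert!(low < high);
        // Wrapping subtraction followed by the cast yields the true width as u32.
        IntRange { low, range: high.wrapping_sub(low) as u32 }
    }
}

impl FloatRange {
    /// Builds the range `low..high` without validating the inputs.
    pub fn new(low: f64, high: f64) -> Self {
        FloatRange { low, scale: high - low }
    }
}
```

With both traits available, tests on the integer type can compare two samplers directly with `assert_eq!`, which is presumably what "allows simpler tests" refers to.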

Yet another PR to finally update Uniform integer sampling (maybe):

- Uses Canon's method (up to two RNG samples) for distribution and single sampling
- Adds an "unbiased" feature flag, which instead uses Lemire's method for distributions and Canon's method with unlimited samples for single-sampling
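As a rough illustration of the "up to two RNG samples" variant, the following is a sketch under stated assumptions, not the code in this PR. `SplitMix64`, `wmul` and `canon_two_samples` are names made up here; the generator is only present so the example is self-contained. The idea: take the high half of a widening multiply as the result, and only when the low half could still receive a carry, draw one more sample to decide that carry.

```rust
/// Minimal SplitMix64 generator, included only to keep the sketch self-contained.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Widening multiply: returns the (high, low) 64-bit halves of `a * b`.
fn wmul(a: u64, b: u64) -> (u64, u64) {
    let p = (a as u128) * (b as u128);
    ((p >> 64) as u64, p as u64)
}

/// Samples from `0..range` using at most two RNG outputs.
/// A small bias remains because the correction stops after one extra sample.
pub fn canon_two_samples(rng: &mut SplitMix64, range: u64) -> u64 {
    assert!(range > 0);
    let (mut result, lo) = wmul(rng.next_u64(), range);
    // A carry into `result` is only possible if `lo` exceeds 2^64 - range.
    if lo > range.wrapping_neg() {
        let (next_hi, _) = wmul(rng.next_u64(), range);
        // If adding the next term's high half overflows, the carry is 1.
        if lo.checked_add(next_hi).is_none() {
            result += 1;
        }
    }
    result
}
```

Most calls consume a single RNG output; the second draw only happens when `lo` falls in the narrow window where a carry is possible, which is why this variant suits single sampling.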


dhardy · 2023-02-17T16:04:13Z

Baseline results (new benchmark on master)


samplei8/SmallRng/single

time:   [1.9184 ns 1.9187 ns 1.9190 ns]

Found 20223 outliers among 100000 measurements (20.22%)

1018 (1.02%) low severe

16 (0.02%) low mild

158 (0.16%) high mild

19031 (19.03%) high severe

Benchmarking samplei8/SmallRng/distr: Warming up for 1.0000 s

Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.4s, enable flat sampling, or reduce sample count to 52520.

samplei8/SmallRng/distr time:   [1.1100 ns 1.1103 ns 1.1107 ns]

Found 13145 outliers among 100000 measurements (13.14%)

5635 (5.63%) low severe

423 (0.42%) low mild

2051 (2.05%) high mild

5036 (5.04%) high severe

samplei8/ChaCha8Rng/single

time:   [2.3286 ns 2.3305 ns 2.3324 ns]

Found 5234 outliers among 100000 measurements (5.23%)

1877 (1.88%) high mild

3357 (3.36%) high severe

samplei8/ChaCha8Rng/distr

time:   [1.7106 ns 1.7107 ns 1.7109 ns]

Found 4087 outliers among 100000 measurements (4.09%)

444 (0.44%) low mild

2617 (2.62%) high mild

1026 (1.03%) high severe

samplei8/Pcg32/single   time:   [1.6857 ns 1.6865 ns 1.6873 ns]

Found 6777 outliers among 100000 measurements (6.78%)

510 (0.51%) high mild

6267 (6.27%) high severe

samplei8/Pcg32/distr    time:   [1.2538 ns 1.2539 ns 1.2540 ns]

Found 27845 outliers among 100000 measurements (27.84%)

22197 (22.20%) low severe

47 (0.05%) low mild

7 (0.01%) high mild

5594 (5.59%) high severe

samplei8/Pcg64/single   time:   [2.1280 ns 2.1290 ns 2.1301 ns]

Found 22971 outliers among 100000 measurements (22.97%)

13429 (13.43%) low severe

5441 (5.44%) low mild

558 (0.56%) high mild

3543 (3.54%) high severe

samplei8/Pcg64/distr    time:   [1.4348 ns 1.4349 ns 1.4350 ns]

Found 40705 outliers among 100000 measurements (40.70%)

16283 (16.28%) low severe

24422 (24.42%) high severe

samplei16/SmallRng/single

time:   [1.9058 ns 1.9059 ns 1.9060 ns]

Found 2986 outliers among 100000 measurements (2.99%)

2986 (2.99%) high severe

Benchmarking samplei16/SmallRng/distr: Warming up for 1.0000 s

Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.4s, enable flat sampling, or reduce sample count to 52870.

samplei16/SmallRng/distr

time:   [1.0703 ns 1.0706 ns 1.0709 ns]

Found 986 outliers among 100000 measurements (0.99%)

444 (0.44%) high mild

542 (0.54%) high severe

samplei16/ChaCha8Rng/single

time:   [2.0399 ns 2.0404 ns 2.0409 ns]

Found 4574 outliers among 100000 measurements (4.57%)

138 (0.14%) low mild

65 (0.07%) high mild

4371 (4.37%) high severe

samplei16/ChaCha8Rng/distr

time:   [1.7870 ns 1.7874 ns 1.7877 ns]

Found 4997 outliers among 100000 measurements (5.00%)

1024 (1.02%) low mild

2289 (2.29%) high mild

1684 (1.68%) high severe

samplei16/Pcg32/single  time:   [1.6959 ns 1.6967 ns 1.6975 ns]

Found 2849 outliers among 100000 measurements (2.85%)

554 (0.55%) high mild

2295 (2.29%) high severe

samplei16/Pcg32/distr   time:   [1.2458 ns 1.2460 ns 1.2461 ns]

Found 4853 outliers among 100000 measurements (4.85%)

453 (0.45%) high mild

4400 (4.40%) high severe

samplei16/Pcg64/single  time:   [1.9074 ns 1.9076 ns 1.9078 ns]

Found 3419 outliers among 100000 measurements (3.42%)

14 (0.01%) low mild

1315 (1.31%) high mild

2090 (2.09%) high severe

samplei16/Pcg64/distr   time:   [1.4340 ns 1.4342 ns 1.4345 ns]

Found 34978 outliers among 100000 measurements (34.98%)

22432 (22.43%) low severe

12546 (12.55%) high severe

samplei32/SmallRng/single

time:   [4.9445 ns 4.9550 ns 4.9655 ns]

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high severe

Benchmarking samplei32/SmallRng/distr: Warming up for 1.0000 s

Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 50180.

samplei32/SmallRng/distr

time:   [1.8612 ns 1.8700 ns 1.8791 ns]

Found 8951 outliers among 100000 measurements (8.95%)

5188 (5.19%) high mild

3763 (3.76%) high severe

samplei32/ChaCha8Rng/single

time:   [5.8213 ns 5.8339 ns 5.8463 ns]

samplei32/ChaCha8Rng/distr

time:   [2.3517 ns 2.3605 ns 2.3697 ns]

Found 8845 outliers among 100000 measurements (8.85%)

5780 (5.78%) high mild

3065 (3.06%) high severe

samplei32/Pcg32/single  time:   [4.7795 ns 4.7899 ns 4.8005 ns]

Found 3 outliers among 100000 measurements (0.00%)

3 (0.00%) high severe

samplei32/Pcg32/distr   time:   [2.0956 ns 2.1029 ns 2.1099 ns]

Found 9127 outliers among 100000 measurements (9.13%)

5384 (5.38%) high mild

3743 (3.74%) high severe

samplei32/Pcg64/single  time:   [5.4698 ns 5.4821 ns 5.4942 ns]

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high mild

samplei32/Pcg64/distr   time:   [2.5071 ns 2.5155 ns 2.5239 ns]

Found 8743 outliers among 100000 measurements (8.74%)

5745 (5.75%) high mild

2998 (3.00%) high severe

samplei64/SmallRng/single

time:   [5.9268 ns 5.9361 ns 5.9454 ns]

Found 2 outliers among 100000 measurements (0.00%)

2 (0.00%) high mild

samplei64/SmallRng/distr

time:   [1.7516 ns 1.7579 ns 1.7644 ns]

Found 9262 outliers among 100000 measurements (9.26%)

5356 (5.36%) high mild

3906 (3.91%) high severe

samplei64/ChaCha8Rng/single

time:   [7.8579 ns 7.8709 ns 7.8840 ns]

Found 3 outliers among 100000 measurements (0.00%)

1 (0.00%) high mild

2 (0.00%) high severe

samplei64/ChaCha8Rng/distr

time:   [3.5666 ns 3.5778 ns 3.5892 ns]

Found 8734 outliers among 100000 measurements (8.73%)

6057 (6.06%) high mild

2677 (2.68%) high severe

samplei64/Pcg32/single  time:   [7.1110 ns 7.1241 ns 7.1368 ns]

samplei64/Pcg32/distr   time:   [2.9155 ns 2.9241 ns 2.9327 ns]

Found 9162 outliers among 100000 measurements (9.16%)

5522 (5.52%) high mild

3640 (3.64%) high severe

samplei64/Pcg64/single  time:   [6.6004 ns 6.6123 ns 6.6247 ns]

Found 62 outliers among 100000 measurements (0.06%)

62 (0.06%) high mild

samplei64/Pcg64/distr   time:   [2.5028 ns 2.5110 ns 2.5195 ns]

Found 9183 outliers among 100000 measurements (9.18%)

4994 (4.99%) high mild

4189 (4.19%) high severe

samplei128/SmallRng/single

time:   [11.482 ns 11.496 ns 11.510 ns]

Found 185 outliers among 100000 measurements (0.18%)

185 (0.18%) high mild

samplei128/SmallRng/distr

time:   [5.6678 ns 5.6780 ns 5.6879 ns]

Found 8484 outliers among 100000 measurements (8.48%)

7430 (7.43%) high mild

1054 (1.05%) high severe

samplei128/ChaCha8Rng/single

time:   [14.400 ns 14.419 ns 14.439 ns]

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high mild

samplei128/ChaCha8Rng/distr

time:   [7.8090 ns 7.8233 ns 7.8374 ns]

Found 8322 outliers among 100000 measurements (8.32%)

6595 (6.59%) high mild

1727 (1.73%) high severe

samplei128/Pcg32/single time:   [13.412 ns 13.430 ns 13.448 ns]

Found 15 outliers among 100000 measurements (0.01%)

15 (0.01%) high mild

samplei128/Pcg32/distr  time:   [7.1153 ns 7.1296 ns 7.1429 ns]

Found 8659 outliers among 100000 measurements (8.66%)

6130 (6.13%) high mild

2529 (2.53%) high severe

samplei128/Pcg64/single time:   [12.365 ns 12.382 ns 12.399 ns]

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high mild

samplei128/Pcg64/distr  time:   [6.5060 ns 6.5183 ns 6.5302 ns]

Found 8991 outliers among 100000 measurements (8.99%)

5736 (5.74%) high mild

3255 (3.25%) high severe

New results (compared to baseline)


samplei8/SmallRng/single

time:   [1.4978 ns 1.4982 ns 1.4986 ns]

change: [-21.946% -21.917% -21.892%] (p = 0.00 < 0.05)

Performance has improved.

Found 17805 outliers among 100000 measurements (17.80%)

1817 (1.82%) low severe

18 (0.02%) low mild

530 (0.53%) high mild

15440 (15.44%) high severe

samplei8/SmallRng/distr time:   [1.8966 ns 1.8971 ns 1.8977 ns]

change: [+70.464% +70.566% +70.650%] (p = 0.00 < 0.05)

Performance has regressed.

Found 15393 outliers among 100000 measurements (15.39%)

10589 (10.59%) low severe

33 (0.03%) low mild

6 (0.01%) high mild

4765 (4.76%) high severe

samplei8/ChaCha8Rng/single

time:   [2.0854 ns 2.0858 ns 2.0862 ns]

change: [-10.575% -10.497% -10.423%] (p = 0.00 < 0.05)

Performance has improved.

Found 593 outliers among 100000 measurements (0.59%)

4 (0.00%) low mild

474 (0.47%) high mild

115 (0.12%) high severe

samplei8/ChaCha8Rng/distr

time:   [2.6350 ns 2.6357 ns 2.6364 ns]

change: [+54.015% +54.066% +54.109%] (p = 0.00 < 0.05)

Performance has regressed.

Found 16017 outliers among 100000 measurements (16.02%)

1602 (1.60%) low mild

10054 (10.05%) high mild

4361 (4.36%) high severe

samplei8/Pcg32/single   time:   [1.4994 ns 1.5000 ns 1.5005 ns]

change: [-11.112% -11.060% -11.008%] (p = 0.00 < 0.05)

Performance has improved.

Found 34661 outliers among 100000 measurements (34.66%)

1486 (1.49%) low severe

20299 (20.30%) low mild

68 (0.07%) high mild

12808 (12.81%) high severe

Benchmarking samplei8/Pcg32/distr: Warming up for 1.0000 s

Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 53100.

samplei8/Pcg32/distr    time:   [1.0578 ns 1.0580 ns 1.0582 ns]

change: [-15.545% -15.496% -15.430%] (p = 0.00 < 0.05)

Performance has improved.

Found 25037 outliers among 100000 measurements (25.04%)

5278 (5.28%) low severe

113 (0.11%) low mild

2061 (2.06%) high mild

17585 (17.59%) high severe

samplei8/Pcg64/single   time:   [1.9000 ns 1.9004 ns 1.9009 ns]

change: [-10.787% -10.738% -10.688%] (p = 0.00 < 0.05)

Performance has improved.

Found 4558 outliers among 100000 measurements (4.56%)

343 (0.34%) high mild

4215 (4.21%) high severe

samplei8/Pcg64/distr    time:   [1.4648 ns 1.4649 ns 1.4651 ns]

change: [+2.0833% +2.0954% +2.1074%] (p = 0.00 < 0.05)

Performance has regressed.

Found 12686 outliers among 100000 measurements (12.69%)

7611 (7.61%) low severe

50 (0.05%) low mild

10 (0.01%) high mild

5015 (5.01%) high severe

samplei16/SmallRng/single

time:   [1.6958 ns 1.6961 ns 1.6963 ns]

change: [-11.022% -11.008% -10.994%] (p = 0.00 < 0.05)

Performance has improved.

Found 3109 outliers among 100000 measurements (3.11%)

2 (0.00%) high mild

3107 (3.11%) high severe

samplei16/SmallRng/distr

time:   [1.9426 ns 1.9432 ns 1.9437 ns]

change: [+81.022% +81.160% +81.268%] (p = 0.00 < 0.05)

Performance has regressed.

Found 30150 outliers among 100000 measurements (30.15%)

12940 (12.94%) low severe

17210 (17.21%) high severe

samplei16/ChaCha8Rng/single

time:   [1.8547 ns 1.8553 ns 1.8559 ns]

change: [-9.1088% -9.0724% -9.0349%] (p = 0.00 < 0.05)

Performance has improved.

Found 14926 outliers among 100000 measurements (14.93%)

8618 (8.62%) low mild

2771 (2.77%) high mild

3537 (3.54%) high severe

samplei16/ChaCha8Rng/distr

time:   [2.6924 ns 2.6935 ns 2.6947 ns]

change: [+50.625% +50.699% +50.754%] (p = 0.00 < 0.05)

Performance has regressed.

Found 4189 outliers among 100000 measurements (4.19%)

974 (0.97%) high mild

3215 (3.21%) high severe

samplei16/Pcg32/single  time:   [1.5690 ns 1.5693 ns 1.5696 ns]

change: [-7.5591% -7.5113% -7.4643%] (p = 0.00 < 0.05)

Performance has improved.

Found 10941 outliers among 100000 measurements (10.94%)

1206 (1.21%) low severe

15 (0.01%) low mild

91 (0.09%) high mild

9629 (9.63%) high severe

Benchmarking samplei16/Pcg32/distr: Warming up for 1.0000 s

Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 53730.

samplei16/Pcg32/distr   time:   [1.0367 ns 1.0369 ns 1.0370 ns]

change: [-16.514% -16.471% -16.428%] (p = 0.00 < 0.05)

Performance has improved.

Found 7284 outliers among 100000 measurements (7.28%)

9 (0.01%) low severe

2537 (2.54%) high mild

4738 (4.74%) high severe

samplei16/Pcg64/single  time:   [1.7850 ns 1.7854 ns 1.7857 ns]

change: [-6.4245% -6.4057% -6.3846%] (p = 0.00 < 0.05)

Performance has improved.

Found 5112 outliers among 100000 measurements (5.11%)

38 (0.04%) high mild

5074 (5.07%) high severe

samplei16/Pcg64/distr   time:   [1.4695 ns 1.4695 ns 1.4696 ns]

change: [+2.4432% +2.4613% +2.4796%] (p = 0.00 < 0.05)

Performance has regressed.

Found 3202 outliers among 100000 measurements (3.20%)

12 (0.01%) high mild

3190 (3.19%) high severe

samplei32/SmallRng/single

time:   [2.9643 ns 2.9717 ns 2.9790 ns]

change: [-40.218% -40.027% -39.812%] (p = 0.00 < 0.05)

Performance has improved.

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high mild

samplei32/SmallRng/distr

time:   [2.1418 ns 2.1468 ns 2.1520 ns]

change: [+14.158% +14.625% +15.127%] (p = 0.00 < 0.05)

Performance has regressed.

samplei32/ChaCha8Rng/single

time:   [3.4354 ns 3.4436 ns 3.4519 ns]

change: [-41.166% -40.973% -40.797%] (p = 0.00 < 0.05)

Performance has improved.

samplei32/ChaCha8Rng/distr

time:   [4.3137 ns 4.3205 ns 4.3273 ns]

change: [+82.252% +83.029% +83.739%] (p = 0.00 < 0.05)

Performance has regressed.

Found 17 outliers among 100000 measurements (0.02%)

17 (0.02%) high mild

samplei32/Pcg32/single  time:   [2.8065 ns 2.8139 ns 2.8213 ns]

change: [-41.459% -41.254% -41.074%] (p = 0.00 < 0.05)

Performance has improved.

samplei32/Pcg32/distr   time:   [2.2714 ns 2.2769 ns 2.2826 ns]

change: [+7.8178% +8.2767% +8.7563%] (p = 0.00 < 0.05)

Performance has regressed.

samplei32/Pcg64/single  time:   [3.5373 ns 3.5460 ns 3.5546 ns]

change: [-35.538% -35.317% -35.115%] (p = 0.00 < 0.05)

Performance has improved.

samplei32/Pcg64/distr   time:   [2.8174 ns 2.8238 ns 2.8304 ns]

change: [+11.798% +12.255% +12.692%] (p = 0.00 < 0.05)

Performance has regressed.

samplei64/SmallRng/single

time:   [4.4161 ns 4.4223 ns 4.4285 ns]

change: [-25.650% -25.501% -25.338%] (p = 0.00 < 0.05)

Performance has improved.

Found 8 outliers among 100000 measurements (0.01%)

8 (0.01%) high mild

samplei64/SmallRng/distr

time:   [1.9359 ns 1.9407 ns 1.9454 ns]

change: [+9.9079% +10.397% +10.890%] (p = 0.00 < 0.05)

Performance has regressed.

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high severe

samplei64/ChaCha8Rng/single

time:   [5.7185 ns 5.7264 ns 5.7344 ns]

change: [-27.418% -27.246% -27.100%] (p = 0.00 < 0.05)

Performance has improved.

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high mild

samplei64/ChaCha8Rng/distr

time:   [4.0937 ns 4.1020 ns 4.1103 ns]

change: [+14.257% +14.652% +15.085%] (p = 0.00 < 0.05)

Performance has regressed.

samplei64/Pcg32/single  time:   [4.9361 ns 4.9447 ns 4.9533 ns]

change: [-30.763% -30.592% -30.414%] (p = 0.00 < 0.05)

Performance has improved.

Found 48 outliers among 100000 measurements (0.05%)

44 (0.04%) high mild

4 (0.00%) high severe

samplei64/Pcg32/distr   time:   [3.3706 ns 3.3777 ns 3.3847 ns]

change: [+15.104% +15.511% +15.898%] (p = 0.00 < 0.05)

Performance has regressed.

samplei64/Pcg64/single  time:   [4.7009 ns 4.7079 ns 4.7150 ns]

change: [-28.960% -28.801% -28.651%] (p = 0.00 < 0.05)

Performance has improved.

samplei64/Pcg64/distr   time:   [2.8317 ns 2.8380 ns 2.8442 ns]

change: [+12.584% +13.020% +13.467%] (p = 0.00 < 0.05)

Performance has regressed.

Found 3 outliers among 100000 measurements (0.00%)

3 (0.00%) high severe

samplei128/SmallRng/single

time:   [9.6697 ns 9.6778 ns 9.6860 ns]

change: [-15.933% -15.813% -15.695%] (p = 0.00 < 0.05)

Performance has improved.

Found 20 outliers among 100000 measurements (0.02%)

20 (0.02%) high mild

samplei128/SmallRng/distr

time:   [6.7277 ns 6.7370 ns 6.7460 ns]

change: [+18.371% +18.650% +18.908%] (p = 0.00 < 0.05)

Performance has regressed.

Found 95 outliers among 100000 measurements (0.10%)

95 (0.10%) high mild

samplei128/ChaCha8Rng/single

time:   [12.092 ns 12.107 ns 12.121 ns]

change: [-16.186% -16.036% -15.883%] (p = 0.00 < 0.05)

Performance has improved.

Found 5 outliers among 100000 measurements (0.01%)

3 (0.00%) high mild

2 (0.00%) high severe

samplei128/ChaCha8Rng/distr

time:   [8.8107 ns 8.8237 ns 8.8367 ns]

change: [+12.524% +12.787% +13.051%] (p = 0.00 < 0.05)

Performance has regressed.

samplei128/Pcg32/single time:   [10.177 ns 10.190 ns 10.203 ns]

change: [-24.246% -24.126% -23.979%] (p = 0.00 < 0.05)

Performance has improved.

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high mild

samplei128/Pcg32/distr  time:   [8.3067 ns 8.3188 ns 8.3311 ns]

change: [+16.386% +16.681% +16.965%] (p = 0.00 < 0.05)

Performance has regressed.

Found 100 outliers among 100000 measurements (0.10%)

100 (0.10%) high mild

samplei128/Pcg64/single time:   [10.039 ns 10.047 ns 10.056 ns]

change: [-18.984% -18.858% -18.732%] (p = 0.00 < 0.05)

Performance has improved.

Found 47 outliers among 100000 measurements (0.05%)

47 (0.05%) high mild

samplei128/Pcg64/distr  time:   [7.0425 ns 7.0534 ns 7.0643 ns]

change: [+7.9572% +8.2093% +8.4552%] (p = 0.00 < 0.05)

Performance has regressed.

Found 19 outliers among 100000 measurements (0.02%)

18 (0.02%) high mild

1 (0.00%) high severe

Looks like a decent improvement for single-sampling, but considerably worse for distribution sampling.

New results (unbiased feature)


samplei8/SmallRng/single

time:   [1.9076 ns 1.9082 ns 1.9089 ns]

change: [-0.5836% -0.5448% -0.5079%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 22388 outliers among 100000 measurements (22.39%)

18016 (18.02%) low severe

44 (0.04%) low mild

22 (0.02%) high mild

4306 (4.31%) high severe

Benchmarking samplei8/SmallRng/distr: Warming up for 1.0000 s

Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.6s, enable flat sampling, or reduce sample count to 51640.

samplei8/SmallRng/distr time:   [1.1186 ns 1.1188 ns 1.1189 ns]

change: [+1.0264% +1.1032% +1.1788%] (p = 0.00 < 0.05)

Performance has regressed.

Found 7570 outliers among 100000 measurements (7.57%)

186 (0.19%) low severe

3 (0.00%) low mild

2649 (2.65%) high mild

4732 (4.73%) high severe

samplei8/ChaCha8Rng/single

time:   [2.0471 ns 2.0477 ns 2.0483 ns]

change: [-12.214% -12.133% -12.059%] (p = 0.00 < 0.05)

Performance has improved.

Found 14935 outliers among 100000 measurements (14.94%)

2 (0.00%) low severe

642 (0.64%) low mild

5812 (5.81%) high mild

8479 (8.48%) high severe

samplei8/ChaCha8Rng/distr

time:   [1.7169 ns 1.7173 ns 1.7176 ns]

change: [+0.3602% +0.3809% +0.4034%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 4558 outliers among 100000 measurements (4.56%)

138 (0.14%) low mild

2210 (2.21%) high mild

2210 (2.21%) high severe

samplei8/Pcg32/single   time:   [1.7169 ns 1.7175 ns 1.7181 ns]

change: [+1.7780% +1.8401% +1.8967%] (p = 0.00 < 0.05)

Performance has regressed.

Found 30316 outliers among 100000 measurements (30.32%)

6836 (6.84%) low mild

920 (0.92%) high mild

22560 (22.56%) high severe

samplei8/Pcg32/distr    time:   [1.2623 ns 1.2624 ns 1.2626 ns]

change: [+0.6649% +0.6766% +0.6904%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 27891 outliers among 100000 measurements (27.89%)

19384 (19.38%) low severe

75 (0.07%) low mild

7 (0.01%) high mild

8425 (8.43%) high severe

samplei8/Pcg64/single   time:   [2.0078 ns 2.0088 ns 2.0097 ns]

change: [-5.7091% -5.6495% -5.5883%] (p = 0.00 < 0.05)

Performance has improved.

Found 16243 outliers among 100000 measurements (16.24%)

8554 (8.55%) high mild

7689 (7.69%) high severe

samplei8/Pcg64/distr    time:   [1.4221 ns 1.4222 ns 1.4224 ns]

change: [-0.8919% -0.8796% -0.8670%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 6857 outliers among 100000 measurements (6.86%)

17 (0.02%) low severe

3 (0.00%) high mild

6837 (6.84%) high severe

samplei16/SmallRng/single

time:   [1.7045 ns 1.7048 ns 1.7052 ns]

change: [-10.573% -10.550% -10.531%] (p = 0.00 < 0.05)

Performance has improved.

Found 45291 outliers among 100000 measurements (45.29%)

20430 (20.43%) low severe

24861 (24.86%) high severe

Benchmarking samplei16/SmallRng/distr: Warming up for 1.0000 s

Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 52320.

samplei16/SmallRng/distr

time:   [1.0812 ns 1.0815 ns 1.0819 ns]

change: [+0.5067% +0.6014% +0.6705%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 1152 outliers among 100000 measurements (1.15%)

290 (0.29%) high mild

862 (0.86%) high severe

samplei16/ChaCha8Rng/single

time:   [1.9882 ns 1.9886 ns 1.9890 ns]

change: [-2.5715% -2.5408% -2.5091%] (p = 0.00 < 0.05)

Performance has improved.

Found 8702 outliers among 100000 measurements (8.70%)

301 (0.30%) low mild

4494 (4.49%) high mild

3907 (3.91%) high severe

samplei16/ChaCha8Rng/distr

time:   [1.7356 ns 1.7360 ns 1.7363 ns]

change: [-2.8998% -2.8740% -2.8467%] (p = 0.00 < 0.05)

Performance has improved.

Found 13632 outliers among 100000 measurements (13.63%)

2 (0.00%) low mild

12538 (12.54%) high mild

1092 (1.09%) high severe

samplei16/Pcg32/single  time:   [1.5160 ns 1.5162 ns 1.5164 ns]

change: [-10.682% -10.638% -10.594%] (p = 0.00 < 0.05)

Performance has improved.

Found 26466 outliers among 100000 measurements (26.47%)

13076 (13.08%) low severe

217 (0.22%) low mild

229 (0.23%) high mild

12944 (12.94%) high severe

samplei16/Pcg32/distr   time:   [1.2681 ns 1.2684 ns 1.2686 ns]

change: [+1.7735% +1.7986% +1.8233%] (p = 0.00 < 0.05)

Performance has regressed.

Found 23148 outliers among 100000 measurements (23.15%)

208 (0.21%) low severe

1 (0.00%) low mild

84 (0.08%) high mild

22855 (22.86%) high severe

samplei16/Pcg64/single  time:   [1.9071 ns 1.9077 ns 1.9083 ns]

change: [-0.0228% +0.0049% +0.0417%] (p = 0.74 > 0.05)

No change in performance detected.

Found 36106 outliers among 100000 measurements (36.11%)

18848 (18.85%) low severe

1174 (1.17%) low mild

88 (0.09%) high mild

15996 (16.00%) high severe

samplei16/Pcg64/distr   time:   [1.4509 ns 1.4511 ns 1.4514 ns]

change: [+1.1517% +1.1772% +1.2000%] (p = 0.00 < 0.05)

Performance has regressed.

Found 18304 outliers among 100000 measurements (18.30%)

5057 (5.06%) low severe

13 (0.01%) low mild

21 (0.02%) high mild

13213 (13.21%) high severe

samplei32/SmallRng/single

time:   [3.6718 ns 3.6817 ns 3.6919 ns]

change: [-25.935% -25.697% -25.453%] (p = 0.00 < 0.05)

Performance has improved.

samplei32/SmallRng/distr

time:   [1.8915 ns 1.8981 ns 1.9048 ns]

change: [+0.8519% +1.3457% +1.7827%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 8712 outliers among 100000 measurements (8.71%)

5349 (5.35%) high mild

3363 (3.36%) high severe

samplei32/ChaCha8Rng/single

time:   [4.3550 ns 4.3663 ns 4.3772 ns]

change: [-25.412% -25.156% -24.906%] (p = 0.00 < 0.05)

Performance has improved.

Found 4 outliers among 100000 measurements (0.00%)

4 (0.00%) high mild

samplei32/ChaCha8Rng/distr

time:   [2.7254 ns 2.7344 ns 2.7433 ns]

change: [+15.251% +15.837% +16.393%] (p = 0.00 < 0.05)

Performance has regressed.

Found 8503 outliers among 100000 measurements (8.50%)

6055 (6.05%) high mild

2448 (2.45%) high severe

samplei32/Pcg32/single  time:   [3.6329 ns 3.6435 ns 3.6541 ns]

change: [-24.192% -23.935% -23.661%] (p = 0.00 < 0.05)

Performance has improved.

samplei32/Pcg32/distr   time:   [2.1075 ns 2.1145 ns 2.1215 ns]

change: [+0.0819% +0.5507% +1.0484%] (p = 0.02 < 0.05)

Change within noise threshold.

Found 9153 outliers among 100000 measurements (9.15%)

5330 (5.33%) high mild

3823 (3.82%) high severe

samplei32/Pcg64/single  time:   [4.2954 ns 4.3080 ns 4.3205 ns]

change: [-21.711% -21.416% -21.098%] (p = 0.00 < 0.05)

Performance has improved.

samplei32/Pcg64/distr   time:   [2.4684 ns 2.4771 ns 2.4858 ns]

change: [-1.9894% -1.5294% -1.0481%] (p = 0.00 < 0.05)

Performance has improved.

Found 9636 outliers among 100000 measurements (9.64%)

5199 (5.20%) high mild

4437 (4.44%) high severe

samplei64/SmallRng/single

time:   [5.2532 ns 5.2629 ns 5.2723 ns]

change: [-11.567% -11.342% -11.128%] (p = 0.00 < 0.05)

Performance has improved.

samplei64/SmallRng/distr

time:   [1.7670 ns 1.7737 ns 1.7806 ns]

change: [+0.3709% +0.8994% +1.5000%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 9659 outliers among 100000 measurements (9.66%)

5357 (5.36%) high mild

4302 (4.30%) high severe

samplei64/ChaCha8Rng/single

time:   [6.7406 ns 6.7520 ns 6.7634 ns]

change: [-14.394% -14.216% -14.015%] (p = 0.00 < 0.05)

Performance has improved.

samplei64/ChaCha8Rng/distr

time:   [3.5487 ns 3.5596 ns 3.5705 ns]

change: [-0.9413% -0.5079% -0.0649%] (p = 0.02 < 0.05)

Change within noise threshold.

Found 8781 outliers among 100000 measurements (8.78%)

5848 (5.85%) high mild

2933 (2.93%) high severe

samplei64/Pcg32/single  time:   [5.7866 ns 5.7992 ns 5.8118 ns]

change: [-18.870% -18.598% -18.380%] (p = 0.00 < 0.05)

Performance has improved.

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high mild

samplei64/Pcg32/distr   time:   [3.2564 ns 3.2654 ns 3.2744 ns]

change: [+11.243% +11.670% +12.117%] (p = 0.00 < 0.05)

Performance has regressed.

Found 8838 outliers among 100000 measurements (8.84%)

5754 (5.75%) high mild

3084 (3.08%) high severe

samplei64/Pcg64/single  time:   [5.6715 ns 5.6825 ns 5.6933 ns]

change: [-14.281% -14.061% -13.872%] (p = 0.00 < 0.05)

Performance has improved.

Found 1 outliers among 100000 measurements (0.00%)

1 (0.00%) high severe

samplei64/Pcg64/distr   time:   [2.5092 ns 2.5177 ns 2.5262 ns]

change: [-0.1602% +0.2676% +0.7783%] (p = 0.26 > 0.05)

No change in performance detected.

Found 9574 outliers among 100000 measurements (9.57%)

5084 (5.08%) high mild

4490 (4.49%) high severe

samplei128/SmallRng/single

time:   [10.410 ns 10.423 ns 10.436 ns]

change: [-9.4864% -9.3321% -9.1484%] (p = 0.00 < 0.05)

Performance has improved.

Found 10 outliers among 100000 measurements (0.01%)

10 (0.01%) high mild

samplei128/SmallRng/distr

time:   [5.6088 ns 5.6180 ns 5.6275 ns]

change: [-1.2694% -1.0568% -0.8129%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 8739 outliers among 100000 measurements (8.74%)

5882 (5.88%) high mild

2857 (2.86%) high severe

samplei128/ChaCha8Rng/single

time:   [12.332 ns 12.349 ns 12.366 ns]

change: [-14.509% -14.355% -14.176%] (p = 0.00 < 0.05)

Performance has improved.

Found 13 outliers among 100000 measurements (0.01%)

13 (0.01%) high mild

samplei128/ChaCha8Rng/distr

time:   [7.9089 ns 7.9233 ns 7.9378 ns]

change: [+0.9915% +1.2787% +1.5530%] (p = 0.00 < 0.05)

Change within noise threshold.

Found 8295 outliers among 100000 measurements (8.29%)

6697 (6.70%) high mild

1598 (1.60%) high severe

samplei128/Pcg32/single time:   [11.405 ns 11.422 ns 11.440 ns]

change: [-15.114% -14.949% -14.783%] (p = 0.00 < 0.05)

Performance has improved.

samplei128/Pcg32/distr  time:   [7.4301 ns 7.4443 ns 7.4582 ns]

change: [+4.1310% +4.4150% +4.7024%] (p = 0.00 < 0.05)

Performance has regressed.

Found 8602 outliers among 100000 measurements (8.60%)

6161 (6.16%) high mild

2441 (2.44%) high severe

samplei128/Pcg64/single time:   [10.496 ns 10.511 ns 10.526 ns]

change: [-15.273% -15.113% -14.953%] (p = 0.00 < 0.05)

Performance has improved.

samplei128/Pcg64/distr  time:   [6.3761 ns 6.3885 ns 6.4010 ns]

change: [-2.2301% -1.9907% -1.7150%] (p = 0.00 < 0.05)

Performance has improved.

Found 8753 outliers among 100000 measurements (8.75%)

5805 (5.80%) high mild

2948 (2.95%) high severe

These look not-quite-as-good for single-sampling (but still an improvement), and significantly better for distribution sampling...

...I hate micro-benchmarking (see results in #1286). Looks like we should just use Lemire's method for distribution sampling in all cases.

vks · 2023-02-17T16:59:20Z

> Looks like we should just use Lemire's method for distribution sampling in all cases.

Agreed, especially if it is less biased.

dhardy · 2023-02-18T10:53:02Z

Bench re-runs (lower clock speed, better formatted): results.ods

Highlights:

benchmark	RNG	mode	biased vs base	unbiased vs base	unbiased vs biased
samplei8	ChaCha8Rng	distr	57.00%	0.10%	-36.30%
samplei8	Pcg32	distr	-16.70%	0.50%	20.60%
samplei8	Pcg64	distr	2.10%	0.10%	-2.00%
samplei8	SmallRng	distr	70.60%	1.60%	-40.50%
samplei16	ChaCha8Rng	distr	49.00%	-0.20%	-33.00%
samplei16	Pcg32	distr	-16.60%	0.00%	20.00%
samplei16	Pcg64	distr	2.40%	0.30%	-2.10%
samplei16	SmallRng	distr	74.20%	-2.90%	-44.30%
samplei32	ChaCha8Rng	distr	80.80%	13.50%	-37.20%
samplei32	Pcg32	distr	8.20%	0.30%	-7.30%
samplei32	Pcg64	distr	11.60%	-0.80%	-11.10%
samplei32	SmallRng	distr	12.50%	-0.10%	-11.20%
samplei64	ChaCha8Rng	distr	13.30%	-0.50%	-12.20%
samplei64	Pcg32	distr	14.60%	11.30%	-2.90%
samplei64	Pcg64	distr	12.40%	-0.40%	-11.40%
samplei64	SmallRng	distr	11.70%	0.40%	-10.10%
samplei128	ChaCha8Rng	distr	12.50%	-1.40%	-12.30%
samplei128	Pcg32	distr	14.40%	2.00%	-10.90%
samplei128	Pcg64	distr	9.30%	-2.10%	-10.40%
samplei128	SmallRng	distr	20.00%	-0.80%	-17.30%
samplei8	ChaCha8Rng	single	-9.70%	-11.10%	-1.60%
samplei8	Pcg32	single	-12.80%	-0.10%	14.60%
samplei8	Pcg64	single	-8.80%	-5.60%	3.60%
samplei8	SmallRng	single	-21.50%	0.00%	27.40%
samplei16	ChaCha8Rng	single	-9.40%	-0.80%	9.50%
samplei16	Pcg32	single	-7.30%	-12.10%	-5.20%
samplei16	Pcg64	single	-7.80%	-1.80%	6.50%
samplei16	SmallRng	single	-11.00%	-11.10%	-0.10%
samplei32	ChaCha8Rng	single	-41.70%	-24.90%	28.90%
samplei32	Pcg32	single	-42.20%	-25.30%	29.20%
samplei32	Pcg64	single	-36.50%	-22.20%	22.60%
samplei32	SmallRng	single	-40.20%	-25.10%	25.30%
samplei64	ChaCha8Rng	single	-26.70%	-12.70%	19.00%
samplei64	Pcg32	single	-29.50%	-18.00%	16.30%
samplei64	Pcg64	single	-28.80%	-15.60%	18.60%
samplei64	SmallRng	single	-26.20%	-12.40%	18.80%
samplei128	ChaCha8Rng	single	-17.50%	-14.80%	3.20%
samplei128	Pcg32	single	-25.50%	-16.60%	11.90%
samplei128	Pcg64	single	-20.40%	-16.00%	5.60%
samplei128	SmallRng	single	-14.60%	-9.90%	5.50%
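The table does not say how the third column was obtained. The figures do appear consistent with combining the first two columns as a ratio of run times, which the following sketch shows. `relative_change` is a hypothetical helper written for this note, and whether the column was computed this way or measured directly is an assumption.

```rust
/// Given the percentage change of A vs. base and of B vs. base,
/// returns the percentage change of B vs. A.
/// Hypothetical helper: assumes both percentages share the same baseline time.
pub fn relative_change(a_vs_base: f64, b_vs_base: f64) -> f64 {
    let a = 1.0 + a_vs_base / 100.0; // time of A relative to base
    let b = 1.0 + b_vs_base / 100.0; // time of B relative to base
    (b / a - 1.0) * 100.0
}
```

For example, the `samplei8 SmallRng distr` row gives `relative_change(70.6, 1.6)`, which lands within rounding of the listed -40.50%.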

So, yes, this supports the idea that we should always use Lemire's method for distribution sampling.
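For readers unfamiliar with it, Lemire's method for a pre-built distribution can be sketched roughly as below. This is an illustration under assumptions, not the code merged here: `LemireRange`, its fields and the `SplitMix64` helper are invented for the example. The rejection threshold is computed once at construction, so each sample costs one widening multiply and one comparison in the common case.

```rust
/// Minimal SplitMix64 generator, included only to keep the sketch self-contained.
pub struct SplitMix64(pub u64);

impl SplitMix64 {
    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical distribution over the half-open range `low..high`.
pub struct LemireRange {
    low: u64,
    range: u64,
    thresh: u64,
}

impl LemireRange {
    pub fn new(low: u64, high: u64) -> Self {
        assert!(low < high);
        let range = high - low;
        // 2^64 mod range, computed without 128-bit arithmetic.
        let thresh = range.wrapping_neg() % range;
        LemireRange { low, range, thresh }
    }

    /// Unbiased sample: reject products whose low half falls below the threshold.
    pub fn sample(&self, rng: &mut SplitMix64) -> u64 {
        loop {
            let m = (rng.next_u64() as u128) * (self.range as u128);
            if (m as u64) >= self.thresh {
                return self.low + (m >> 64) as u64;
            }
        }
    }
}
```

Because `thresh` is always smaller than `range`, the rejection probability is below `range / 2^64`, which for small ranges means the loop almost never repeats.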

dhardy · 2023-02-18T10:58:33Z

Remaining question: whether to keep both biased and unbiased options for single-sampling (using a feature flag). See #494. I am inclined to keep this under the following conditions:

- Biased is the default (otherwise it is an optimisation that will likely get little use, so why bother).
- Only the default option is tested by value-stability tests. (Currently achieved by only build-testing with "unbiased" enabled.)

There is no strong rationale for this, however; we could reduce to just one implementation (either).
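To make the "biased is the default" condition concrete, here is a small sketch of compile-time selection on a Cargo feature. The feature name `unbiased` comes from this PR; everything else (`MAX_CORRECTION_SAMPLES`, `sampler_description`, and the assumption that the feature is declared as an empty feature in `Cargo.toml`) is hypothetical and only illustrates the mechanism.

```rust
/// Hypothetical constant: how many correction samples single-sampling may draw
/// after the first one. `None` means "no limit".
/// Default build (feature off): one correction, so at most two RNG samples.
#[cfg(not(feature = "unbiased"))]
pub const MAX_CORRECTION_SAMPLES: Option<u32> = Some(1);

/// With the `unbiased` feature enabled, corrections are unbounded.
#[cfg(feature = "unbiased")]
pub const MAX_CORRECTION_SAMPLES: Option<u32> = None;

/// Reports which variant this build selected.
pub fn sampler_description() -> &'static str {
    match MAX_CORRECTION_SAMPLES {
        Some(_) => "biased (bounded samples)",
        None => "unbiased (unbounded samples)",
    }
}
```

Since the two variants can produce different output streams for the same seed, value-stability tests can only pin one of them, which is why the second condition restricts those tests to the default build.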

dhardy · 2023-02-20T10:20:58Z

I'm inclined to merge this as-is. Review please, maybe @TheIronBorn or @vks?


dhardy · 2023-02-21T09:43:27Z

Thanks @TheIronBorn. Updated.

dhardy · 2023-03-23T10:46:05Z

I'd like to merge this but am still waiting for a reviewer to approve (policy requires that the review not be by the author). @TheIronBorn you last reviewed this; would you mind revisiting?

dhardy added 8 commits February 17, 2023 11:58

Add uniform distribution benchmarks

8cf5972

Based on canon-uniform-benches branch, revised

Uniform: use sampling methods from canon-uniform-benches branch

c12bddb

Also: add "unbiased" feature flag

Fix feature simd_support

9d2c5fa

Update value stability tests

09df9da

Add line to CHANGELOG

cb953e4

Uniform SIMD sampling: use Lemire's method

8dc7f79

This is a small tweak unsupported by evidence, but brings SIMD in line with unbiased integer range sampling.

Uniform: impl PartialEq, Eq where possible

202d840

Allows simpler tests

CI: benches now require small_rng; build-test unbiased

777d9e7

Note: unbiased does pass current value-stability tests, but could fail extra ones in the future.

dhardy mentioned this pull request Feb 18, 2023

Implement an exact Bernoulli distribution #1193

Open

Always use Lemire's method for distribution sampling

6d5f123

dhardy marked this pull request as ready for review February 20, 2023 10:07

TheIronBorn reviewed Feb 20, 2023

src/distributions/uniform.rs (outdated)

src/distributions/uniform.rs

CHANGELOG.md (outdated)

Address review

e2148da

dhardy mentioned this pull request Feb 21, 2023

Uniform float improvements #1289

Merged

GUIpsp mentioned this pull request Mar 8, 2023

Uniform Generator hangs for certain limits. #1299

Open

TheIronBorn approved these changes Mar 23, 2023


dhardy merged commit 22d0756 into rust-random:master Mar 24, 2023

dhardy mentioned this pull request Mar 14, 2024

Sampling benches in rand* 0.9.0-alpha.0 #1409

Closed
