Sampling benches in rand* 0.9.0-alpha.0 #1409

Closed
Thell opened this issue Mar 13, 2024 · 7 comments

@Thell

Thell commented Mar 13, 2024

Hello! I was pretty excited to see the alpha release because of all the hype regarding Canon's method, but either I did something wrong or, ..., well 🤷

The first bench is from dba696e on Feb 15 and the second is from the alpha release.

The command used was

rustup run nightly cargo criterion --bench uniform --features small_rng

samplei32

| Alpha | Rng | single | distr |
| --- | --- | --- | --- |
| | SmallRng | 3.30 ns (✅ 1.00x) | 2.01 ns (✅ 1.00x) |
| => | SmallRng | 3.31 ns (✅ 1.00x) | 2.07 ns (✅ 1.00x) |
| | ChaCha8Rng | 3.59 ns (✅ 1.09x slower) | 2.48 ns (❌ 1.23x slower) |
| => | ChaCha8Rng | 3.62 ns (✅ 1.09x slower) | 2.66 ns (❌ 1.29x slower) |
| | Pcg32 | 2.85 ns (✅ 1.16x faster) | 2.25 ns (❌ 1.12x slower) |
| => | Pcg32 | 2.93 ns (✅ 1.13x faster) | 2.28 ns (✅ 1.10x slower) |
| | Pcg64 | 3.88 ns (❌ 1.18x slower) | 2.66 ns (❌ 1.32x slower) |
| => | Pcg64 | 3.86 ns (❌ 1.17x slower) | 2.72 ns (❌ 1.32x slower) |

samplei64

| Alpha | Rng | single | distr |
| --- | --- | --- | --- |
| | SmallRng | 4.63 ns (✅ 1.00x) | 1.90 ns (✅ 1.00x) |
| => | SmallRng | 4.58 ns (✅ 1.00x) | 1.91 ns (✅ 1.00x) |
| | ChaCha8Rng | 5.97 ns (❌ 1.29x slower) | 3.60 ns (❌ 1.89x slower) |
| => | ChaCha8Rng | 5.90 ns (❌ 1.29x slower) | 3.63 ns (❌ 1.90x slower) |
| | Pcg32 | 5.29 ns (❌ 1.14x slower) | 3.12 ns (❌ 1.64x slower) |
| => | Pcg32 | 5.25 ns (❌ 1.15x slower) | 3.12 ns (❌ 1.63x slower) |
| | Pcg64 | 5.00 ns (✅ 1.08x slower) | 2.68 ns (❌ 1.40x slower) |
| => | Pcg64 | 5.05 ns (✅ 1.10x slower) | 2.70 ns (❌ 1.41x slower) |

As you can see, no gains. At least things didn't get worse. I also tested with the unbiased feature and didn't see anything change, but if I understand correctly that shouldn't alter single samples.

So did I miss something?

@TheIronBorn
Collaborator

TheIronBorn commented Mar 14, 2024

It would be useful to know what CPU you ran this on, RUSTFLAGS/etc.

Note also, if you hadn't already, that Canon's method is only used for single.
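
For readers unfamiliar with it, the idea behind Canon's method can be sketched roughly as below. This is a minimal illustration based on my understanding of the technique, not rand's actual implementation: `SplitMix64`, `wmul`, and `sample_below` are hypothetical names, and the stand-in generator exists only to keep the example self-contained.

```rust
/// Hypothetical stand-in generator (SplitMix64), used only so the
/// sketch needs no external crates. It is not part of rand.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }
}

/// Widening multiply: returns (high 64 bits, low 64 bits) of a * b.
fn wmul(a: u64, b: u64) -> (u64, u64) {
    let p = (a as u128) * (b as u128);
    ((p >> 64) as u64, p as u64)
}

/// Sample from 0..range (assumption: range > 0) with one widening
/// multiply, drawing a second word only when the low half of the
/// product is large enough that a carry could change the result.
fn sample_below(rng: &mut SplitMix64, range: u64) -> u64 {
    assert!(range > 0);
    let (mut result, lo) = wmul(rng.next_u64(), range);
    // Only the top `range - 1` values of `lo` can be pushed over by a carry.
    if lo > range.wrapping_neg() {
        let (new_hi, _) = wmul(rng.next_u64(), range);
        // Add the carry out of `lo + new_hi`, if there is one.
        result += lo.checked_add(new_hi).is_none() as u64;
    }
    result
}
```

The common case costs one random word and one multiplication, with no rejection loop; the second draw is taken only in the rare branch. This reduces the bias rather than eliminating it.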

@dhardy
Member

dhardy commented Mar 14, 2024

I did a lot of benchmarking on this, but it was a while back. From what I remember, in many cases there was not a single best option. Also, as @TheIronBorn says, results are likely dependent on your CPU architecture.

See my merge PR with some benchmarks and links to others: #1287

@Thell
Author

Thell commented Mar 14, 2024

$ rustc --print target-cpus | head -n2
Available CPUs for this target:
    native                  - Select the CPU of the current host (currently znver3).

$ rustup show -v
Default host: x86_64-unknown-linux-gnu
rustup home:  /home/thell/.rustup

installed toolchains
--------------------

stable-x86_64-pc-windows-gnu
(rustc does not exist)

stable-x86_64-unknown-linux-gnu (default)
rustc 1.76.0 (07dca489a 2024-02-04)

nightly-x86_64-unknown-linux-gnu
rustc 1.78.0-nightly (9c3ad802d 2024-03-07)


active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.76.0 (07dca489a 2024-02-04)
$ git pull
$ git checkout dba696e9 -b pre-alpha
$ git reset --hard && git clean -fdx
$ RUSTFLAGS='-C target-cpu=native' rustup run nightly cargo bench --bench uniform --features small_rng -- --save-baseline pre-alpha
$ git reset --hard
$ git checkout -b alpha 0.9.0-alpha.0
$ ls ./target/c*
samplei128  samplei16  samplei32  samplei64  samplei8
$ RUSTFLAGS='-C target-cpu=native' rustup run nightly cargo bench --bench uniform --features small_rng -- --save-baseline alpha
$ RUSTFLAGS='-C target-cpu=native' rustup run nightly cargo bench --bench uniform --features small_rng -- --load-baseline pre-alpha --baseline alpha

I wasn't sure how to show the console output with color, so I just put it all in here. Almost all of the i32 and i64 tests are within the default noise threshold, the exception being ChaCha8, which regressed.

Perhaps something was changed at some previous point and I should re-test using the current stable release tag?

Results
Running benches/uniform.rs (target/release/deps/uniform-1dbdf69be6706036)
samplei8/SmallRng/single
                        time:   [1.9008 ns 1.9021 ns 1.9037 ns]
                        change: [+0.2405% +0.3151% +0.4014%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6317 outliers among 100000 measurements (6.32%)
  3300 (3.30%) high mild
  3017 (3.02%) high severe
samplei8/SmallRng/distr time:   [1.0950 ns 1.0956 ns 1.0964 ns]
                        change: [-0.0325% +0.1498% +0.3289%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 16083 outliers among 100000 measurements (16.08%)
  3394 (3.39%) high mild
  12689 (12.69%) high severe
samplei8/ChaCha8Rng/single
                        time:   [2.0377 ns 2.0382 ns 2.0386 ns]
                        change: [-1.4719% -1.4391% -1.4090%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3413 outliers among 100000 measurements (3.41%)
  2265 (2.27%) high mild
  1148 (1.15%) high severe
samplei8/ChaCha8Rng/distr
                        time:   [1.7296 ns 1.7299 ns 1.7302 ns]
                        change: [-1.5488% -1.5209% -1.4961%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1695 outliers among 100000 measurements (1.70%)
  22 (0.02%) low mild
  380 (0.38%) high mild
  1293 (1.29%) high severe
samplei8/Pcg32/single   time:   [1.6675 ns 1.6686 ns 1.6704 ns]
                        change: [+9.3987% +9.4737% +9.6010%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1686 outliers among 100000 measurements (1.69%)
  201 (0.20%) high mild
  1485 (1.49%) high severe
samplei8/Pcg32/distr    time:   [1.0939 ns 1.0941 ns 1.0942 ns]
                        change: [+0.1939% +0.3701% +0.5310%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3666 outliers among 100000 measurements (3.67%)
  998 (1.00%) high mild
  2668 (2.67%) high severe
samplei8/Pcg64/single   time:   [1.9474 ns 1.9489 ns 1.9506 ns]
                        change: [+1.4925% +1.5704% +1.6890%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8212 outliers among 100000 measurements (8.21%)
  6587 (6.59%) high mild
  1625 (1.62%) high severe
samplei8/Pcg64/distr    time:   [1.5320 ns 1.5324 ns 1.5328 ns]
                        change: [+0.2939% +0.3313% +0.3705%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6196 outliers among 100000 measurements (6.20%)
  3 (0.00%) high mild
  6193 (6.19%) high severe

samplei16/SmallRng/single
                        time:   [1.6374 ns 1.6379 ns 1.6384 ns]
                        change: [-11.384% -11.323% -11.275%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1746 outliers among 100000 measurements (1.75%)
  136 (0.14%) high mild
  1610 (1.61%) high severe
samplei16/SmallRng/distr
                        time:   [1.1045 ns 1.1047 ns 1.1050 ns]
                        change: [+0.3058% +0.4961% +0.6822%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4429 outliers among 100000 measurements (4.43%)
  1074 (1.07%) high mild
  3355 (3.35%) high severe
samplei16/ChaCha8Rng/single
                        time:   [1.8959 ns 1.8963 ns 1.8966 ns]
                        change: [+1.1191% +1.1439% +1.1704%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 19919 outliers among 100000 measurements (19.92%)
  18434 (18.43%) high mild
  1485 (1.49%) high severe
samplei16/ChaCha8Rng/distr
                        time:   [1.7354 ns 1.7367 ns 1.7386 ns]
                        change: [+0.1916% +0.2717% +0.3957%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2411 outliers among 100000 measurements (2.41%)
  15 (0.01%) low mild
  685 (0.69%) high mild
  1711 (1.71%) high severe
samplei16/Pcg32/single  time:   [1.3995 ns 1.3997 ns 1.4000 ns]
                        change: [-3.1237% -3.0960% -3.0672%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24995 outliers among 100000 measurements (25.00%)
  623 (0.62%) low severe
  247 (0.25%) low mild
  2613 (2.61%) high mild
  21512 (21.51%) high severe
samplei16/Pcg32/distr   time:   [1.0966 ns 1.0968 ns 1.0970 ns]
                        change: [+0.0429% +0.3030% +0.6548%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 4469 outliers among 100000 measurements (4.47%)
  1438 (1.44%) high mild
  3031 (3.03%) high severe
samplei16/Pcg64/single  time:   [1.9126 ns 1.9130 ns 1.9135 ns]
                        change: [-0.4449% -0.4158% -0.3866%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1734 outliers among 100000 measurements (1.73%)
  202 (0.20%) high mild
  1532 (1.53%) high severe
samplei16/Pcg64/distr   time:   [1.5354 ns 1.5359 ns 1.5364 ns]
                        change: [+0.9211% +0.9628% +1.0018%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10652 outliers among 100000 measurements (10.65%)
  1393 (1.39%) high mild
  9259 (9.26%) high severe

samplei32/SmallRng/single
                        time:   [3.9972 ns 4.0033 ns 4.0097 ns]
                        change: [-0.5468% -0.3320% -0.1152%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100000 measurements (0.01%)
  15 (0.01%) high mild
samplei32/SmallRng/distr
                        time:   [2.0955 ns 2.1032 ns 2.1107 ns]
                        change: [+0.1780% +0.6765% +1.1834%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 8459 outliers among 100000 measurements (8.46%)
  6355 (6.36%) high mild
  2104 (2.10%) high severe
samplei32/ChaCha8Rng/single
                        time:   [3.6114 ns 3.6196 ns 3.6277 ns]
                        change: [+0.5737% +0.8999% +1.2095%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100000 measurements (0.01%)
  7 (0.01%) high mild
samplei32/ChaCha8Rng/distr
                        time:   [2.6196 ns 2.6296 ns 2.6392 ns]
                        change: [+6.6041% +7.1811% +7.7045%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8818 outliers among 100000 measurements (8.82%)
  5672 (5.67%) high mild
  3146 (3.15%) high severe
samplei32/Pcg32/single  time:   [3.0424 ns 3.0501 ns 3.0582 ns]
                        change: [-1.3766% -0.9890% -0.6601%] (p = 0.00 < 0.05)
                        Change within noise threshold.
samplei32/Pcg32/distr   time:   [2.1075 ns 2.1153 ns 2.1231 ns]
                        change: [-1.4113% -0.8950% -0.4133%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8456 outliers among 100000 measurements (8.46%)
  6794 (6.79%) high mild
  1662 (1.66%) high severe
samplei32/Pcg64/single  time:   [4.3433 ns 4.3531 ns 4.3632 ns]
                        change: [-0.1027% +0.2315% +0.5439%] (p = 0.15 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100000 measurements (0.01%)
  3 (0.00%) high mild
  3 (0.00%) high severe
samplei32/Pcg64/distr   time:   [2.8218 ns 2.8318 ns 2.8423 ns]
                        change: [-0.1475% +0.3556% +0.8927%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 8683 outliers among 100000 measurements (8.68%)
  5879 (5.88%) high mild
  2804 (2.80%) high severe

samplei64/SmallRng/single
                        time:   [4.7586 ns 4.7656 ns 4.7724 ns]
                        change: [-0.0581% +0.1328% +0.3318%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 22 outliers among 100000 measurements (0.02%)
  22 (0.02%) high mild
samplei64/SmallRng/distr
                        time:   [1.9820 ns 1.9894 ns 1.9969 ns]
                        change: [-0.2791% +0.2478% +0.8057%] (p = 0.36 > 0.05)
                        No change in performance detected.
Found 7729 outliers among 100000 measurements (7.73%)
  7696 (7.70%) high mild
  33 (0.03%) high severe
samplei64/ChaCha8Rng/single
                        time:   [6.1432 ns 6.1521 ns 6.1609 ns]
                        change: [+2.1822% +2.3995% +2.5999%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 91 outliers among 100000 measurements (0.09%)
  82 (0.08%) high mild
  9 (0.01%) high severe
samplei64/ChaCha8Rng/distr
                        time:   [3.3616 ns 3.3734 ns 3.3852 ns]
                        change: [+0.5329% +0.9489% +1.4765%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8176 outliers among 100000 measurements (8.18%)
  6897 (6.90%) high mild
  1279 (1.28%) high severe
samplei64/Pcg32/single  time:   [5.1936 ns 5.2022 ns 5.2108 ns]
                        change: [-0.5869% -0.3398% -0.0981%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 37 outliers among 100000 measurements (0.04%)
  37 (0.04%) high mild
samplei64/Pcg32/distr   time:   [3.1294 ns 3.1391 ns 3.1494 ns]
                        change: [-0.1468% +0.2478% +0.7038%] (p = 0.27 > 0.05)
                        No change in performance detected.
Found 8510 outliers among 100000 measurements (8.51%)
  6389 (6.39%) high mild
  2121 (2.12%) high severe
samplei64/Pcg64/single  time:   [5.2153 ns 5.2234 ns 5.2318 ns]
                        change: [+0.0131% +0.2179% +0.4687%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100000 measurements (0.01%)
  14 (0.01%) high mild
samplei64/Pcg64/distr   time:   [2.9215 ns 2.9317 ns 2.9420 ns]
                        change: [+0.1886% +0.7063% +1.2397%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 8034 outliers among 100000 measurements (8.03%)
  6741 (6.74%) high mild
  1293 (1.29%) high severe

samplei128/SmallRng/single
                        time:   [9.3967 ns 9.4054 ns 9.4141 ns]
                        change: [+0.5760% +0.7111% +0.8483%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 60 outliers among 100000 measurements (0.06%)
  52 (0.05%) high mild
  8 (0.01%) high severe
samplei128/SmallRng/distr
                        time:   [4.0873 ns 4.0978 ns 4.1082 ns]
                        change: [-0.2938% +0.0917% +0.4264%] (p = 0.62 > 0.05)
                        No change in performance detected.
Found 8806 outliers among 100000 measurements (8.81%)
  6068 (6.07%) high mild
  2738 (2.74%) high severe
samplei128/ChaCha8Rng/single
                        time:   [11.655 ns 11.669 ns 11.683 ns]
                        change: [-2.8671% -2.6973% -2.5594%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 90 outliers among 100000 measurements (0.09%)
  89 (0.09%) high mild
  1 (0.00%) high severe
samplei128/ChaCha8Rng/distr
                        time:   [6.3019 ns 6.3170 ns 6.3323 ns]
                        change: [-0.8811% -0.5615% -0.2195%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7936 outliers among 100000 measurements (7.94%)
  7379 (7.38%) high mild
  557 (0.56%) high severe
samplei128/Pcg32/single time:   [9.9469 ns 9.9607 ns 9.9746 ns]
                        change: [-1.5890% -1.3931% -1.1947%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100000 measurements (0.01%)
  9 (0.01%) high mild
  1 (0.00%) high severe
samplei128/Pcg32/distr  time:   [6.3234 ns 6.3372 ns 6.3508 ns]
                        change: [-0.4388% -0.1246% +0.1837%] (p = 0.43 > 0.05)
                        No change in performance detected.
Found 8658 outliers among 100000 measurements (8.66%)
  5817 (5.82%) high mild
  2841 (2.84%) high severe
samplei128/Pcg64/single time:   [10.006 ns 10.017 ns 10.029 ns]
                        change: [-2.0306% -1.8609% -1.6901%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 44 outliers among 100000 measurements (0.04%)
  39 (0.04%) high mild
  5 (0.01%) high severe
samplei128/Pcg64/distr  time:   [5.4523 ns 5.4664 ns 5.4804 ns]
                        change: [-0.0032% +0.3598% +0.7320%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 8410 outliers among 100000 measurements (8.41%)
  6442 (6.44%) high mild
  1968 (1.97%) high severe
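
For anyone reading the verdict lines in the output above, they can be reproduced roughly as below. This is a sketch under assumptions, not Criterion's own code: it assumes the default 5% significance level and 1% noise threshold, and `Verdict` and `classify` are hypothetical names. Inputs are fractions, so +1.5% is 0.015.

```rust
/// Hypothetical names; a rough model of how the verdict lines are chosen.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// `lower` and `upper` are the bounds of the reported change interval.
fn classify(lower: f64, upper: f64, p_value: f64) -> Verdict {
    const SIGNIFICANCE: f64 = 0.05; // assumed default significance level
    const NOISE: f64 = 0.01; // assumed default noise threshold (1%)

    if p_value >= SIGNIFICANCE {
        // The difference is not statistically significant.
        Verdict::NoChange
    } else if upper < -NOISE {
        // The whole interval is faster by more than the threshold.
        Verdict::Improved
    } else if lower > NOISE {
        // The whole interval is slower by more than the threshold.
        Verdict::Regressed
    } else {
        // Significant, but too small to be considered meaningful.
        Verdict::WithinNoise
    }
}
```

For example, the samplei16/Pcg64/distr interval of [+0.9211%, +1.0018%] is statistically significant but straddles the 1% threshold, so it is reported as within noise rather than as a regression.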

@Thell
Author

Thell commented Mar 14, 2024

I did a lot of benchmarking on this, but it was a while back. From what I remember, in many cases there was not a single best option. Also, as @TheIronBorn says, results are likely dependent on your CPU architecture.

See my merge PR with some benchmarks and links to others: #1287

Yeah, those discussions and benches are why I was excited to give it a try with the uniform sample.

@dhardy
Member

dhardy commented Mar 14, 2024

Sorry, I should have said CPU micro-architecture. Not that I've seen enough data to draw any real conclusions about how the various methods perform for each. Try: cat /proc/cpuinfo | head

Also, that's a narrow range of commits you picked — there don't appear to be any code changes.

Yes, micro-benchmarks can be this inconsistent, unfortunately.

@Thell
Author

Thell commented Mar 14, 2024

cat /proc/cpuinfo | head

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 25
model           : 80
model name      : AMD Ryzen 7 5700G with Radeon Graphics
stepping        : 0
microcode       : 0xffffffff
cpu MHz         : 3792.776
cache size      : 512 KB
physical id     : 0

Regarding the range of commits... oh boy, thank you for pointing that out. I bet that's what I did wrong: I was thinking the prepare alpha was the merge (that's what I get for not looking closer). I'll find where uniform.rs was changed on master and bench from before that.

@Thell
Author

Thell commented Mar 14, 2024

Also, that's a narrow range of commits you picked — there don't appear to be any code changes.

Thanks again. That's more like what I was hoping to see.

| Bench (Distributions) | Pre-Canon | Canon |
| --- | --- | --- |
| uniform_u32x1_6_single | 41 ns/iter (+/- 0) | 17 ns/iter (+/- 0) |
| uniform_u64x1_6_single | 41 ns/iter (+/- 0) | 15 ns/iter (+/- 0) |
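
To put the figures above in the same "Nx" terms as the earlier comparison, the ratio is simply time before divided by time after. A tiny sketch, where `speedup` is a hypothetical helper name:

```rust
/// Speedup factor: how many times faster the "after" timing is.
/// Both arguments are per-iteration times in the same unit.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}
```

With the numbers from the table, `speedup(41.0, 17.0)` is roughly 2.4 and `speedup(41.0, 15.0)` is roughly 2.7.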

Since there isn't a uniform bench from prior to the 'canon' commit I'll just assume it got better too. 😄

Now I'm really looking forward to 0.9.0.

I'll close this as user error and, again, let me express my appreciation for your efforts. 👍

Thell closed this as completed Mar 14, 2024