WIP: Investigate Canon's uniform-integer sampling method #1196

dhardy · 2021-10-20T17:00:05Z

See #1172. Also includes #1154 as a comparator.

Some of this work is @TheIronBorn's; I have cleaned up and tweaked his impls a bit, and written new benchmarks using Criterion.

Algorithms to compare:

"old": uses rejection sampling (essentially Lemire's method) for repeated sampling from a distribution; for sample_single we use a bit-mask method for i32 and larger
"oneill": uses O'Neill's suggested optimisations from switch sample_single int to O'neill's modulo method & update gen_rang… #1154 for sample_single; for repeated sampling this is just Lemire's method again (I may not have optimised properly however)
"canon": the subject of Look into this alleged optimal algorithm for bounded random integers #1172
"canon-lemire": a slight tweak, using the guard condition from Lemire's method; this requires an extra modulus operation (except with repeated sampling from a distribution) and generally appears the same or worse in benchmarks
"bitmask": a bitmask method, implemented by @TheIronBorn

Of these, Canon's method definitely appears the best all-round method. I'm not done testing yet and may use some tweaks for i128 and some SIMD cases.

Cases:

repeated sampling from a distribution, using ranges with low and high probabilities of rejection (picked by @TheIronBorn)
single sampling, using the same low and high ranges, and using varying ranges (the upper bound is incremented before each sample)

Research questions

Is sample faster than sample_single?
Yes, definitely (sample_single is approx 50% slower). It's clear so I won't bother with evidence here.

Is it better to use u32 over u64 random numbers when sampling small integers?
For Canon-Lemire, maybe for i8/i16. For Canon, no. For i32, no. Evidence.
(Note: several benches vary by 2.5%, one by 9%, so take with a pinch of salt.)

Note further: the answer may be different on 32-bit CPUs. Tested on 5800X.

Is it better to use u64 instead of u128 samples for 128-bit integers?
Yes (~15% faster for non-SIMD distribution sampling). Evidence.

Which is the best algorithm for XX?
Each case will get a new post below.

CC @ctsrc @stephentyrone

dhardy · 2021-10-20T17:02:49Z

Which algorithm to use for repeated sampling from a distribution? (non-SIMD)

Canon and Canon-Lemire are basically the same, and clear winners for i32 and
i64 (same at i128 except that Bitmask also does similarly well). At i8/i16 and
for low-reject cases, all algorithms are about the same.

For i8 and i32, Canon appears slightly better than Canon-Lemire, though I'm not
convinced the results are significant. In any case, it makes sense to choose
Canon when the two are the same.

Evidence

dhardy · 2021-10-22T11:02:00Z

How much bias is acceptable for samples?
This is a slightly trickier question: is using a single u64 sample for i8 / i16 results acceptable (leaving bias of 1-in-2^56 / 1-in-2^48)?
In practice the performance advantage is tiny for discrete samples (from a fixed distribution: 941μs vs 991μs or 5% faster for i8, 6% for i16), but for SIMD this may be larger.

Results (see Canon vs Canon64)

uniform_dist_int_i8/Old/high_reject                                                                             
                        time:   [1.0550 ns 1.0564 ns 1.0579 ns]
                        change: [-4.8870% -4.7632% -4.6061%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
uniform_dist_int_i8/Old/low_reject                                                                             
                        time:   [1.0613 ns 1.0631 ns 1.0655 ns]
                        change: [-4.2396% -4.0519% -3.7987%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
uniform_dist_int_i8/Lemire/high_reject                                                                             
                        time:   [1.2753 ns 1.2757 ns 1.2762 ns]
                        change: [+0.0946% +0.1332% +0.1760%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  9 (9.00%) high severe
uniform_dist_int_i8/Lemire/low_reject                                                                             
                        time:   [1.2861 ns 1.2864 ns 1.2869 ns]
                        change: [-0.4160% -0.3649% -0.3147%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe
uniform_dist_int_i8/Canon/high_reject                                                                             
                        time:   [990.75 ps 990.89 ps 991.05 ps]
                        change: [-5.7790% -5.7415% -5.7063%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
uniform_dist_int_i8/Canon/low_reject                                                                             
                        time:   [982.39 ps 982.87 ps 983.55 ps]
                        change: [-6.6996% -6.6568% -6.6131%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
uniform_dist_int_i8/Canon64/high_reject                                                                             
                        time:   [940.71 ps 941.12 ps 941.56 ps]
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  8 (8.00%) high mild
  1 (1.00%) high severe
uniform_dist_int_i8/Canon64/low_reject                                                                             
                        time:   [941.74 ps 942.07 ps 942.43 ps]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  4 (4.00%) low mild
  3 (3.00%) high mild
uniform_dist_int_i8/Canon-Lemire/high_reject                                                                             
                        time:   [992.74 ps 994.00 ps 996.01 ps]
                        change: [-5.4800% -5.4102% -5.3175%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
uniform_dist_int_i8/Canon-Lemire/low_reject                                                                             
                        time:   [998.76 ps 999.10 ps 999.47 ps]
                        change: [-4.3281% -4.2604% -4.1969%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
uniform_dist_int_i8/Bitmask/high_reject                                                                             
                        time:   [1.0390 ns 1.0397 ns 1.0409 ns]
                        change: [-3.4370% -3.1355% -2.6074%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
uniform_dist_int_i8/Bitmask/low_reject                                                                             
                        time:   [835.01 ps 835.25 ps 835.54 ps]
                        change: [-0.3614% -0.2779% -0.2112%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
uniform_dist_int_i16/Old/high_reject                                                                             
                        time:   [1.1080 ns 1.1084 ns 1.1088 ns]
                        change: [+5.1971% +5.3899% +5.6302%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
uniform_dist_int_i16/Old/low_reject                                                                             
                        time:   [1.0954 ns 1.0957 ns 1.0960 ns]
                        change: [+2.0607% +2.2329% +2.3994%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
  4 (4.00%) low severe
  7 (7.00%) high mild
  5 (5.00%) high severe
uniform_dist_int_i16/Lemire/high_reject                                                                             
                        time:   [1.0487 ns 1.0490 ns 1.0492 ns]
                        change: [+0.3726% +0.6616% +0.9021%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
uniform_dist_int_i16/Lemire/low_reject                                                                             
                        time:   [1.0490 ns 1.0494 ns 1.0498 ns]
                        change: [-0.4838% -0.4226% -0.3644%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
uniform_dist_int_i16/Canon/high_reject                                                                             
                        time:   [998.31 ps 998.53 ps 998.78 ps]
                        change: [+0.0661% +0.1725% +0.2543%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
uniform_dist_int_i16/Canon/low_reject                                                                             
                        time:   [991.89 ps 992.23 ps 992.65 ps]
                        change: [-0.2821% -0.2012% -0.1351%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
uniform_dist_int_i16/Canon64/high_reject                                                                             
                        time:   [938.18 ps 938.39 ps 938.62 ps]
Found 10 outliers among 100 measurements (10.00%)
  8 (8.00%) high mild
  2 (2.00%) high severe
uniform_dist_int_i16/Canon64/low_reject                                                                             
                        time:   [937.62 ps 937.98 ps 938.37 ps]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
uniform_dist_int_i16/Canon-Lemire/high_reject                                                                             
                        time:   [983.96 ps 985.47 ps 987.44 ps]
                        change: [-5.7130% -5.5611% -5.3530%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
uniform_dist_int_i16/Canon-Lemire/low_reject                                                                             
                        time:   [999.00 ps 999.33 ps 999.73 ps]
                        change: [-5.0901% -5.0338% -4.9807%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
uniform_dist_int_i16/Bitmask/high_reject                                                                             
                        time:   [864.94 ps 865.76 ps 866.93 ps]
                        change: [+0.4297% +0.7857% +1.2749%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe
uniform_dist_int_i16/Bitmask/low_reject                                                                             
                        time:   [845.80 ps 851.63 ps 862.39 ps]
                        change: [+1.6534% +2.0227% +2.5567%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

@TheIronBorn

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

@TheIronBorn

This is from @TheIronBorn (see #1172)

This beats other uniform_int_i128 results

dhardy · 2022-03-02T11:00:51Z

So, the first round of analysis (non-SIMD integer types only) is complete.

Read the report here
Raw results (distr_random and single_random benches are most useful)

There may be a second round after a few tweaks to these algorithms. However, most important for now is to answer a few questions regarding bias and use of multiple different algorithms (see Final Thoughts).

TheIronBorn · 2022-03-02T19:57:52Z

This is a very comprehensive report, thank you

dhardy · 2022-12-06T16:35:41Z

I didn't finish this (changing the uniform sampler) yet because from the report it's not really clear which sampler to use.

I have also run benchmarks on my laptop CPU (Intel 11th gen) to compare. I apologise if it's not very clear which commit to use to run these benches yourself; I think it's probably 9d0c88a (i.e. this branch before the two temp commits).

It would also be nice if we had some method of determining at compile time the type of RNG used, thus allowing algorithm selection based on the RNG. Unfortunately this isn't really feasible with Rust's current feature set (probably we need specialization and/or generic_const_exprs).

TheIronBorn · 2022-12-06T20:59:09Z

Is it worth picking some good-enough algorithm which solves the #1145 issue?

The over-engineered solution is add the algorithm selection to the RngCore trait and allow RNGs to specify which to use. That's rather nasty though.

dhardy · 2023-01-14T16:38:12Z

I uploaded the Intel 1145 benchmarks here.

Benches should be run from this branch. If you would like to benchmark on a different CPU:

run: cargo bench --bench uniform
copy the results from target/criterion to a new folder in the root and add to git
make a PR against that branch

dhardy · 2023-02-01T12:49:46Z

Notes on review of distr_random_* and single_random_* benches across both CPUs:

distr_random_i8:
Decent: all. Differences on Pcg64 and SmallRng are barely significant.
Best is probably Canon32. Lemire and Old are decent on Intel but less reliably.

distr_random_i16:
Decent: all. Again, differences are only really significant on two RNGs.
Best: Canon32. Good: Lemire, Old, maybe Canon-reduced.

distr_random_i32:
Decent (low var): Canon, Canon-Lemire, Unbiased, Canon-reduced. In one case (AMD ChaCha8) Canon-reduced has a strong win, otherwise differences are not significant.
High var: Canon32, Lemire, Old. Averages are similar, variance is much MUCH higher. Canon32 is worst.

distr_random_i64:
All: high variance
Poor: Canon-Unbiased
Best: Canon-Lemire.

distr_random_i128:
All: high variance
Best on average, but with long tails: Lemire, Old. Also decent: Canon-reduced, Canon-Lemire.
Worst: Canon-Unbiased

single_random_i8:
Good: all. Best: Canon32. Also great in some tests: Canon-reduced, ONeill.
Worst: Canon-Lemire, Unbiased.

single_random_i16:
Good: all. Best: Canon32. Also great in some tests: Canon-reduced, ONeill.
Worst: Canon-Lemire, Unbiased.

single_random_i32:
Best: Canon-reduced. Next best: Canon. Also decent, Unbiased, Canon-Lemire.
Terrible: everything else. Worst: Old.

single_random_i64:
All: high variance
Worst: Old, Unbiased. Poor: ONeill, Canon-Lemire (sometimes).

single_random_i128:
All high variance.
Best: Canon-reduced. Decent: Canon, Canon32.
Bad: Old, Canon-Lemire. Poor: ONeill.

Best for i8: Canon32. Decent: Canon-reduced. Both have max bias 1-in-2^56.
Best for i16: Canon32. Decent: Canon-reduced. Both have max bias 1-in-2^48.
Best for i32: Canon-reduced. But bias is 1-in-2^32. Next best: Canon.
Best for i64: Unclear.
Best for i128: Canon-reduced.

These are all variations of the Canon algorithm (usually with increased max bias):

Canon: initially use max(64, size) bits, optionally another max(64, size) bits
Canon32: initially use max(32, size) bits, optionally another max(32, size) bits
Canon-reduced, i8-i32: use 64 bits once only [should be named "Biased64"]
Canon-reduced, i64: use 64 bits initially, optionally another 32 bits
Canon-reduced, i128: use 128 bits initially, optionally another 128 bits

Additional variants may be worth exploring:

Use 64-bit initial sample, optionally another 32-bits (i.e. Canon-reduced applied properly to smaller sizes)
Use max(32, size) bits initial sample with up to two 32-bit reduction steps

dhardy · 2023-02-01T17:34:02Z

Note: rand/src/seq/mod.rs has a little function to force sampling at a lower bit-width where possible since (a) this improves platform independence of results (for usize) and (b) it is noted to be faster.

fn gen_index<R: Rng + ?Sized>(rng: &mut R, ubound: usize) -> usize {
    if ubound <= (core::u32::MAX as usize) {
        rng.gen_range(0..ubound as u32) as usize
    } else {
        rng.gen_range(0..ubound)
    }
}

Maybe uniform samplers for 64 and 128-bit sizes should test the range and potentially use the algorithm for a smaller size?

There are a few issues however: (1) RNG bits used varies by inputs (though this is already non-constant with most algorithms), (2) more complex code with more ifs, (3) may not integrate well with distribution samplers.

dhardy · 2023-02-02T11:44:38Z

I added a new algorithm:

Canon32-2: max(32, size) bit sample, optionally followed by one or two 32-bit bias reduction steps

New results:

i8, i16: Canon32-2 is slightly slower than Canon32 (but with less bias). Mostly well ahead of Canon (same bias), but loses in some tests. Therefore Canon32-2 may be the best choice here if we want less bias than 1-in-2^48.

i32: high variance, slower than most, easily beaten by Unbiased. These benches give all 64-bit samplers very low variance and all 32-bit samplers very high variance. At a guess, this is because 64-bit samplers (almost) never need a bias-reduction step therefore the branch predictor is highly effective? Best option remains Canon.

i64: Canon32-2 is 10-20% behind Canon32, sometimes ahead of Unbiased but behind Canon and Canon-Lemire. Looks like Canon-Lemire is the best option here (so the extra cost of the improved bias check is worth it here, but not for i32 output with 64-bit RNG output).

i128: Canon32-2 is well behind Canon-reduced which remains the best choice.

dhardy · 2023-02-02T12:20:04Z

I previously wrote (in the report):

It would be nice if we could select an algorithm at compile-time based on whether we are using a 32-bit or 64-bit or block RNG.

Looking again at the results: ~~Pcg64~~ Canon beats ~~Pcg32~~ Canon32 when using algorithms sampling 64-bit values and ~~Pcg32~~ Canon32 wins when sampling 32-bit values, but the differences are usually not large enough to influence the choice of algorithm. Exception: for i8/i16 output, Canon-reduced beats Canon32 on 64-bit RNGs while Canon32 wins on 32-bit RNGs (including ChaCha, which consumes 32-bit words from a block).

Bias: according to my benchmarks, one uniform distribution sample takes (very vaguely) 100ns, allowing about 3e14 samples per CPU-core per year. Bias of 1-in-2^48 samples implies approximately one biased sample per CPU-year. This only affects i8 / u8 output, which I think is acceptable, especially considering this type is unlikely to be used where this type of bias really matters. Possibly we should also have a low/no bias feature flag, but that can be a different issue (i.e. for later/never).

(Alternatively, Canon32-2 reduces the bias with 8-16-bit output at little cost.)

Therefore I propose we use the algorithms mentioned above: Canon32 for 8-16 bit output, Canon for 32-bit, Canon-Lemire for 64-bit and Canon-reduced for 128-bit. Yes, having so many variants is a little weird, but the differences are mostly small, and not everything reduces well to single a generic implementation anyway.

Regarding SIMD types, I propose to leave these for later.

vks · 2023-02-02T16:42:35Z

Therefore I propose we use the algorithms mentioned above: Canon32 for 8-16 bit output, Canon for 32-bit, Canon-Lemire for 64-bit and Canon-reduced for 128-bit. Yes, having so many variants is a little weird, but the differences are mostly small, and not everything reduces well to single a generic implementation anyway.

How would this affect code size compared to the status quo?

dhardy · 2023-02-02T16:54:58Z

LOC? Probably a bit worse. I may be able to make impls partially share via macros. Conceptually the differences between the algorithms are small.

Compiled? Generic code is monomorphised at compile time anyway.

dhardy · 2023-02-06T11:40:37Z

Note: this branch is behind canon-uniform-benches (which also includes benchmark results). Neither is appropriate for merging.

My analysis above misses something: the Canon-Lemire optimisation is essentially an extra cost up-front (% op) for less rejection from the initial sample. With distributions, this cost is only paid once, therefore probably always worthwhile (though it does use an extra field in the struct). With single-sampling, the benches say it is only useful for 64-bit sampling. Further, this should be applicable to all variants of the Canon method; current benchmarks only include it on the base method.

dhardy · 2023-02-06T16:50:36Z

Originally I assumed the Canon-Lemire variant was a mistake, but it was mentioned on Twitter.

The Canon method is based off multiplying a theoretically infinite-precision random value in the range [0, 1) by the range, taking the integer part (i.e. rounding down). In practice, a finite number of random bits are sampled, and, if the fractional part of the result is close enough to 1 that further bits could influence the last integer digit, then more digits must be sampled to avoid bias.

The Lemire method is effectively the familiar sample % range method, rejecting samples which could cause bias.

It just so happens that the code for the two, up to checking the first threshold, is identical, but with Lemire's method using a tighter threshold. Presumably this is where the idea for Canon-Lemire came from.

Unfortunately, it doesn't work. Lemire rejects some samples to avoid bias; Canon's method does not reject any and assigns exactly 1/range of the theoretical infinite-precision random number to each output. This example demonstrates a whole 2^-64 of that range which Lemire's method assigns to 0 and Canon's method assigns to 1; therefore the Canon-Lemire method is biased.

(A waste of time — shows why I should never have accepted a hand-wavey argument without proof in the first place!)

vks · 2023-02-06T18:15:46Z

Sorry you found out about the bias so late, that's really frustating. Seems like Canon-Lemire was not sufficiently tested?

TheIronBorn · 2023-02-06T21:42:28Z

Ah sorry, I should have been more rigorous before including it

dhardy · 2023-02-07T09:48:13Z

was not sufficiently tested?

Testing for bias is only really practical with small samplers, and all of these samplers are at least 32-bit. Yes I could build an artificially smaller version, but I didn't get around to it.

dhardy · 2023-02-07T16:15:16Z

Some more benchmark results, reduced to five algorithms at each size, chosen from:

Lemire: widening multiply with rejection test
ONeill: O'Neill's proposed method: https://www.pcg-random.org/posts/bounded-rands.html
Canon: widening multiply using max(64, ty-bits) sample, followed by one bias-reduction step at the same sample size where required
Canon32: as above, but using max(32, ty-bits) sample sizes
Canon32-2: as above, but with up to two bias-reduction steps
Canon-Un[biased]: as Canon, but with an unlimited number of bias-reduction steps
Canon-Red[uced]: Canon's method with max(64, ty-bits) initial sample size and half with for the bias reduction step steps
Canon-Red-Un: as Canon-Red, but with an unlimited number of bias reduction steps

Canon32-unbiased and Canon-reduced-unbiased are new to this round of testing.

Conclusions

Best for i8, i16: Canon32, or if less bias is wanted, Canon32-2 or Canon32-Un.

Best for i32: Canon or Canon-Un (Biased64 is fastest, but too biased).

Best for i64: Canon, Lemire, ONeill or Canon-Un.

Best for i128: Probably Lemire (distributions only). Otherwise, hard to choose.

Thoughts? I can't think of any more variants; we should probably pick from the above.

We should consider bias: Canon32 samples up to 64 bits and Canon32-2 up to 96, Canon up to 128 (double for i128 output); subtract the output size to get worst-case bias. Probably any of these (except Biased64 for i32) is fine.

dhardy added B-value Breakage: changes output values X-experimental Type: experimental change labels Oct 20, 2021

dhardy changed the title ~~Investigate Canon's uniform-integer sampling method~~ WIP: Investigate Canon's uniform-integer sampling method Oct 20, 2021

vks mentioned this pull request Feb 22, 2022

switch sample_single int to O'neill's modulo method & update gen_rang… #1154

Closed

Apply rustfmt to rand::distributions::uniform module

8bc957c

dhardy force-pushed the work4 branch from 7e56891 to 82971a3 Compare February 24, 2022 10:33

dhardy added 16 commits February 24, 2022 15:30

Move UniformChar, UniformDuration to new submodule

8449015

Move UniformFloat to a new submodule

255ff71

Impl ONeill, Canon, Canon-Lemire and Bitmask methods for integer types

00869d7

This is based on @TheIronBorn's work (#1154, #1172), with some changes.

Add Criterion benchmark for uniform int distributions

8753100

Add uniform SIMD impls for new algs

aada17d

This is from @TheIronBorn (see #1172)

Add uniform benches for SIMD types

1cfba72

Uniform: dedup

7cfa6f9

Move UniformInt to a new submodule

4518f4d

Add algorithm impls for repeated sampling

1a56b3e

Extra SIMD i8 benches

faa141c

Add UniformInt<i128>::sample_canon_64 (15% faster)

b7f9c49

Additional Canon64 variants

5326db3

Add sample_single_inclusive_canon_64

a97199b

This beats other uniform_int_i128 results

dedup SIMD sample_single_inclusive_bitmask

9d0c88a

temp

b912a09

temp

a9d9a88

dhardy force-pushed the work4 branch from 82971a3 to a9d9a88 Compare February 24, 2022 15:34

dhardy mentioned this pull request Apr 18, 2022

SIMD distributions aren't well documented #1227

Open

dhardy mentioned this pull request Jan 5, 2023

Performance improvements for shuffle and partial_shuffle #1272

Merged

dhardy mentioned this pull request Feb 3, 2023

Rng::gen_range() is slowest with power-of-2 input sizes? #1145

Closed

This was referenced Feb 16, 2023

Canon uniform benches #1286

Closed

Compromising accuracy and speed [distributions] #494

Open

Uniform sampling: use Canon's method #1287

Merged

dhardy closed this Mar 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Investigate Canon's uniform-integer sampling method #1196

WIP: Investigate Canon's uniform-integer sampling method #1196

dhardy commented Oct 20, 2021 •

edited

dhardy commented Oct 20, 2021 •

edited

dhardy commented Oct 22, 2021

dhardy commented Mar 2, 2022

TheIronBorn commented Mar 2, 2022

dhardy commented Dec 6, 2022 •

edited

TheIronBorn commented Dec 6, 2022

dhardy commented Jan 14, 2023

dhardy commented Feb 1, 2023

dhardy commented Feb 1, 2023

dhardy commented Feb 2, 2023

dhardy commented Feb 2, 2023 •

edited

vks commented Feb 2, 2023

dhardy commented Feb 2, 2023 •

edited

dhardy commented Feb 6, 2023

dhardy commented Feb 6, 2023

vks commented Feb 6, 2023

TheIronBorn commented Feb 6, 2023

dhardy commented Feb 7, 2023

dhardy commented Feb 7, 2023

WIP: Investigate Canon's uniform-integer sampling method #1196

WIP: Investigate Canon's uniform-integer sampling method #1196

Conversation

dhardy commented Oct 20, 2021 • edited

Research questions

dhardy commented Oct 20, 2021 • edited

dhardy commented Oct 22, 2021

dhardy commented Mar 2, 2022

TheIronBorn commented Mar 2, 2022

dhardy commented Dec 6, 2022 • edited

TheIronBorn commented Dec 6, 2022

dhardy commented Jan 14, 2023

dhardy commented Feb 1, 2023

dhardy commented Feb 1, 2023

dhardy commented Feb 2, 2023

dhardy commented Feb 2, 2023 • edited

vks commented Feb 2, 2023

dhardy commented Feb 2, 2023 • edited

dhardy commented Feb 6, 2023

dhardy commented Feb 6, 2023

vks commented Feb 6, 2023

TheIronBorn commented Feb 6, 2023

dhardy commented Feb 7, 2023

dhardy commented Feb 7, 2023

Conclusions

dhardy commented Oct 20, 2021 •

edited

dhardy commented Oct 20, 2021 •

edited

dhardy commented Dec 6, 2022 •

edited

dhardy commented Feb 2, 2023 •

edited

dhardy commented Feb 2, 2023 •

edited