Binomial: Faster sampling for n * p < 10 #735
Conversation
Performance looks decent (< 1ms for all `n` tested).
This is a clear improvement over the existing approach regarding run-time. There may be accuracy concerns for large `n` with small `p`; ideally we should check or at least bound `n` for now.
As noted, it would be nice to look into other approaches also, but that can be a future PR.
We could also consider moving some of the set-up to `new()`.
Summary: sampling now uses the inverse transform method when the expected result is small; otherwise it is the same as before.
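For reference, the inverse-transform (BINV) approach for small `n * p` can be sketched as below. This is a minimal illustration, not the actual rand internals: `binomial_inv` is a hypothetical name, and the uniform draw `u` is supplied by the caller rather than drawn from an RNG.

```rust
// Sketch of inverse-transform (BINV) binomial sampling for small n * p.
// Illustrative only; names and structure are assumptions, not rand's code.
fn binomial_inv(n: u64, p: f64, mut u: f64) -> u64 {
    let q = 1.0 - p;
    let s = p / q;
    let a = ((n + 1) as f64) * s;
    // Probability of zero successes: q^n.
    let mut r = q.powf(n as f64);
    let mut x = 0u64;
    // Walk the CDF: subtract P(X = x) from u until it is exhausted, using the
    // recurrence P(X = x) = P(X = x - 1) * (a / x - s).
    while u > r && x < n {
        u -= r;
        x += 1;
        r *= a / (x as f64) - s;
    }
    x
}

fn main() {
    // A tiny uniform draw lands at the bottom of the support.
    println!("{}", binomial_inv(10, 0.5, 0.0)); // prints 0
}
```

The expected cost of the loop is roughly `n * p` iterations, which is why the method is only used when the expected result is small.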
src/distributions/binomial.rs
Outdated
```rust
let a = ((self.n + 1) as f64) * s;
let mut r = q.powf(self.n as f64);
// FIXME: Using integer exponentiation should be faster, but it is not available for
// `u64` exponents, neither in `std` or `num`.
```
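The "roll our own" option discussed below could look like the following exponentiation-by-squaring sketch. `powi_u64` is a hypothetical helper name, not an existing rand or `num` API; std's `f64::powi` only accepts `i32` exponents.

```rust
// Exponentiation by squaring for an f64 base and a u64 exponent.
// Hypothetical helper; not part of rand, std, or num.
fn powi_u64(mut base: f64, mut exp: u64) -> f64 {
    let mut acc = 1.0;
    while exp > 0 {
        // Multiply in the current square whenever the low bit is set.
        if exp & 1 == 1 {
            acc *= base;
        }
        base *= base;
        exp >>= 1;
    }
    acc
}

fn main() {
    println!("{}", powi_u64(2.0, 10)); // prints 1024
}
```

This takes O(log `exp`) multiplications, though as noted later in the thread, each rounding step in the squaring chain can also accumulate error for very large exponents.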
We could trap with an `if`: most uses will have a small enough `n`, and GSL only supports 32-bit `n` anyway. But I don't know whether it matters enough to bother, or up to what `n` it's faster anyway.

The article notes issues with underflow and round-off for large `n`, so it may in any case make sense not to use this method for large `n`.
That would be a nice fix for the integer exponentiation issue. However, it requires some more thought to determine an appropriate cutoff. Maybe looking at other implementations helps with that.
It seems like GSL does not have an `n` cutoff for the BINV algorithm. However, it does use a cutoff for the BINV loop: if it is exceeded, GSL samples a new random value and starts over.
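That restart pattern can be sketched as follows. This is an illustration of the described behaviour, not the GSL source: `binv_with_cutoff` is a hypothetical name, and `uniforms` stands in for a stream of U(0,1) draws from an RNG.

```rust
// Illustrative sketch of a BINV loop with an iteration cutoff: if the walk
// exceeds `cutoff` steps, discard the current uniform and restart with a
// fresh one. Assumes cutoff <= n so the pmf recurrence stays in range.
fn binv_with_cutoff(
    n: u64,
    p: f64,
    cutoff: u64,
    uniforms: &mut dyn Iterator<Item = f64>,
) -> Option<u64> {
    let q = 1.0 - p;
    let s = p / q;
    let a = ((n + 1) as f64) * s;
    let r0 = q.powf(n as f64);
    for mut u in uniforms {
        let mut x = 0u64;
        let mut r = r0;
        while u > r {
            u -= r;
            x += 1;
            if x > cutoff {
                break; // too many iterations: resample and restart
            }
            r *= a / (x as f64) - s;
        }
        if x <= cutoff {
            return Some(x);
        }
    }
    None // uniform stream exhausted without an accepted sample
}

fn main() {
    // The first draw (0.9) blows past a cutoff of 0; the second (0.0) succeeds.
    let mut draws = vec![0.9, 0.0].into_iter();
    println!("{:?}", binv_with_cutoff(10, 0.5, 0, &mut draws));
}
```

Restarting keeps the worst-case cost of a single attempt bounded at the price of occasionally consuming extra random numbers.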
I'm not sure whether a cutoff for large `n` is a good idea. After all, the issue triggering this PR (#734) was about performance problems for large `n` and very small `p`.

> GSL only supports 32-bit `n` anyway.

Maybe we should do that as well? This would be a breaking change though.
@geq1t would you like to comment?
I don't know (without analysis), but the paper mentioned use of 64-bit types in one impl to avoid rounding errors, implying the internals might need higher precision than the input type.
The reason I suggested the cut-off is because it sounds like it avoids this problem, though of course it might be far more restrictive than necessary. Perhaps we should simply use this as-is for now but flag it as needing review for accuracy under certain conditions.
I "implemented" the recommended algorithm for large expected values in the paper, but it does not quite work yet. It is a bit of a pain that the algorithm is specified with gotos. I agree that this should be another PR, however.
By using integer exponentiation, performance improves significantly (before/after benchmarks are in the commit message below). Is it worth it to ship our own integer exponentiation implementation?
I took a look, and this patch certainly fixes the problem I was having. It also matches the GSL code, apart from GSL having a limit on the number of iterations. (I have so far been unable to get hold of the original paper, as it's behind a paywall, so I can't compare with that.)

With regards to the exponentiation issue: note that the original cast to f64 also possibly loses a few bits of precision (though at the point where it makes a difference, I'm not sure it really matters anymore). My personal preference would be to roll our own integer exponentiation for `u64` exponents, but since a) I won't be the one doing the work, and b) I'm not strong enough in numerical analysis to make any predictions on what is better, it's better if I don't make any recommendations on that. For my current purposes, using an `f64` exponent is fast enough, and 32 bits is large enough.
Thank you both for your hard work on this issue!
Our previous implementation was very slow (rust-random#734). The benchmarks make sure we won't have such regressions in the future.
This increases performance significantly, see below. However, it requires shipping our own integer exponentiation implementation, because `u64` exponents are not supported by `std` or `num`.

Before:
```
test distr_binomial       ... bench:  88,365 ns/iter (+/- 2,215) = 90 MB/s
test distr_binomial_small ... bench:  33,775 ns/iter (+/- 5,494) = 236 MB/s
test misc_binomial_1      ... bench:  13 ns/iter (+/- 2)
test misc_binomial_10     ... bench:  72 ns/iter (+/- 1)
test misc_binomial_100    ... bench:  71 ns/iter (+/- 11)
test misc_binomial_1000   ... bench:  624 ns/iter (+/- 19)
test misc_binomial_1e12   ... bench:  681 ns/iter (+/- 18)
```

After:
```
test distr_binomial       ... bench:  44,354 ns/iter (+/- 3,518) = 180 MB/s
test distr_binomial_small ... bench:  22,736 ns/iter (+/- 514) = 351 MB/s
test misc_binomial_1      ... bench:  10 ns/iter (+/- 0)
test misc_binomial_10     ... bench:  23 ns/iter (+/- 0)
test misc_binomial_100    ... bench:  27 ns/iter (+/- 4)
test misc_binomial_1000   ... bench:  621 ns/iter (+/- 15)
test misc_binomial_1e12   ... bench:  686 ns/iter (+/- 20)
```
Hope I didn't merge this too soon; this test appears to have hung: https://travis-ci.org/rust-random/rand/jobs/498834909
Weird that this failure only happens on MIPS!
Fixes #734.