
Binomial: Faster sampling for n * p < 10 #735

Merged
merged 3 commits on Feb 27, 2019

Conversation

@vks (Collaborator) commented Feb 19, 2019

Fixes #734.

@dhardy (Member) commented Feb 20, 2019

Performance looks decent (< 1ms for all n tested):

bench-binomial$ cargo run  
    Finished dev [unoptimized + debuginfo] target(s) in 0.35s
     Running `target/debug/bench-binomial`
1.00e0  Rand: 1000000 in 0.000s Statrs: 1000000 in 0.388s       GSL: 1000000 in 0.002s
1.00e-1 Rand: 99845 in 0.000s   Statrs: 100194 in 0.388s        GSL: 99915 in 0.000s
1.00e-2 Rand: 9889 in 0.000s    Statrs: 10081 in 0.404s GSL: 10129 in 0.000s
1.00e-3 Rand: 1006 in 0.000s    Statrs: 950 in 0.402s   GSL: 1024 in 0.000s
1.00e-4 Rand: 109 in 0.000s     Statrs: 96 in 0.423s    GSL: 108 in 0.000s
1.00e-5 Rand: 14 in 0.000s      Statrs: 10 in 0.401s    GSL: 9 in 0.000s
1.00e-6 Rand: 1 in 0.000s       Statrs: 1 in 0.416s     GSL: 0 in 0.000s
1.00e-7 Rand: 0 in 0.000s       Statrs: 1 in 0.393s     GSL: 1 in 0.000s
1.00e-8 Rand: 0 in 0.000s       Statrs: 0 in 0.409s     GSL: 0 in 0.000s
1.00e-9 Rand: 0 in 0.000s       Statrs: 0 in 0.386s     GSL: 0 in 0.000s
...
1.00e-31        Rand: 0 in 0.000s       Statrs: 0 in 0.385s     GSL: 0 in 0.000s
1.00e-32        Rand: 0 in 0.000s       Statrs: 0 in 0.384s     GSL: 0 in 0.000s
1.00e-33        Rand: 0 in 0.000s       Statrs: 0 in 0.384s     GSL: 0 in 0.000s
1.00e-34        Rand: 0 in 0.000s       Statrs: 0 in 0.386s     GSL: 0 in 0.000s
1.00e-35        Rand: 0 in 0.000s       Statrs: 0 in 0.384s     GSL: 0 in 0.000s
1.00e-36        Rand: 0 in 0.000s       Statrs: 0 in 0.387s     GSL: 0 in 0.000s
...
1.00e-296       Rand: 0 in 0.000s       Statrs: 0 in 0.400s     GSL: 0 in 0.000s
1.00e-297       Rand: 0 in 0.000s       Statrs: 0 in 0.401s     GSL: 0 in 0.000s
1.00e-298       Rand: 0 in 0.000s       Statrs: 0 in 0.414s     GSL: 0 in 0.000s
1.00e-299       Rand: 0 in 0.000s       Statrs: 0 in 0.416s     GSL: 0 in 0.000s
1.00e-300       Rand: 0 in 0.000s       Statrs: 0 in 0.403s     GSL: 0 in 0.000s
1.00e-301       Rand: 0 in 0.000s       Statrs: 0 in 0.400s     GSL: 0 in 0.000s
1.00e-302       Rand: 0 in 0.000s       Statrs: 0 in 0.408s     GSL: 0 in 0.000s
1.00e-303       Rand: 0 in 0.000s       Statrs: 0 in 0.416s     GSL: 0 in 0.000s
1.00e-304       Rand: 0 in 0.000s       Statrs: 0 in 0.406s     GSL: 0 in 0.000s
1.00e-305       Rand: 0 in 0.000s       Statrs: 0 in 0.406s     GSL: 0 in 0.000s
1.00e-306       Rand: 0 in 0.000s       Statrs: 0 in 0.398s     GSL: 0 in 0.000s
1.00e-307       Rand: 0 in 0.000s       Statrs: 0 in 0.401s     GSL: 0 in 0.000s
1.00e-308       Rand: 0 in 0.000s       Statrs: 0 in 0.401s     GSL: 0 in 0.000s
1.00e-309       Rand: 0 in 0.000s       Statrs: 0 in 0.400s     GSL: 0 in 0.000s

@dhardy (Member) left a comment
This is a clear improvement over the existing approach regarding run-time. There may be accuracy concerns for large n with small p; ideally we should check or at least bound n for now.

As noted, it would be nice to look into other approaches also, but that can be a future PR.

We could also consider moving some of the set-up to new().

Summary: sampling now uses the inverse transform method when the expected value is small; otherwise it is unchanged.
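The inverse-transform step can be sketched as follows. This is a hedged illustration of the BINV idea with illustrative names (`binv`, `u`), not the PR's exact code: it walks the binomial PMF via its recurrence, starting from P(X = 0) = q^n.

```rust
/// Sketch of inverse-transform (BINV) sampling for small n * p.
/// `u` is a uniform draw from [0, 1); the result is binomially distributed.
fn binv(n: u64, p: f64, mut u: f64) -> u64 {
    let q = 1.0 - p;
    let s = p / q;
    let a = ((n + 1) as f64) * s;
    // P(X = 0) = q^n, cf. `q.powf(self.n as f64)` in the diff.
    let mut r = q.powf(n as f64);
    let mut x = 0u64;
    // Accumulate PMF terms until they exceed the uniform draw.
    while u > r {
        u -= r;
        x += 1;
        // Recurrence: P(X = x) = P(X = x - 1) * ((n + 1 - x) / x) * (p / q)
        //                      = P(X = x - 1) * (a / x - s)
        r *= a / (x as f64) - s;
    }
    x
}
```

For example, with n = 3 and p = 0.5, P(X = 0) = 0.125, so any `u` below 0.125 yields 0 without entering the loop; the loop runs roughly n * p + a few standard deviations of iterations on average, which is why the method suits small expected values.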

let a = ((self.n + 1) as f64) * s;
let mut r = q.powf(self.n as f64);
// FIXME: Using integer exponentiation should be faster, but it is not available for
// `u64` exponents in either `std` or `num`.
@dhardy (Member):
We could trap this with an if: most uses will have a small enough n, and GSL only supports 32-bit n anyway. But I don't know whether it matters enough to bother, or up to what n it is actually faster.

The article notes issues with underflow and round-off for large n, so it may in any case make sense not to use this method for large n.

@vks (Collaborator Author):

That would be a nice fix for the integer exponentiation issue. However, it requires some more thought to determine an appropriate cutoff; looking at other implementations might help with that.

@vks (Collaborator Author):

It seems GSL does not have an n cutoff for the BINV algorithm. However, it does bound the number of BINV loop iterations: if the bound is exceeded, it samples a new random value and starts over.
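That restart-on-cutoff idea could look roughly like this; the structure and the cutoff value are assumptions for illustration, not GSL's actual code.

```rust
/// Illustrative sketch of bounding the BINV loop: if the loop runs past a
/// cutoff, discard the draw and restart with a fresh uniform value.
/// The cutoff of 110 iterations is a hypothetical constant, not GSL's.
fn binv_bounded<F: FnMut() -> f64>(n: u64, p: f64, mut uniform: F) -> u64 {
    const CUTOFF: u64 = 110; // hypothetical bound on loop iterations
    let q = 1.0 - p;
    let s = p / q;
    let a = ((n + 1) as f64) * s;
    let r0 = q.powf(n as f64); // P(X = 0), computed once
    'restart: loop {
        let mut u = uniform();
        let mut r = r0;
        let mut x = 0u64;
        while u > r {
            u -= r;
            x += 1;
            if x > CUTOFF {
                // Bound exceeded: resample `u` and start over.
                continue 'restart;
            }
            r *= a / (x as f64) - s;
        }
        return x;
    }
}
```

Bounding the loop this way caps the worst-case work per sample without an explicit cutoff on n, at the cost of a (rare) extra uniform draw when the tail is hit.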

@vks (Collaborator Author) commented Feb 25, 2019

I'm not sure whether a cutoff for large n is a good idea. After all, the issue triggering this PR (#734) was about performance problems for large n and very small p.

GSL only supports 32-bit n anyway.

Maybe we should do that as well? This would be a breaking change though.

@dhardy (Member):

@geq1t would you like to comment?

I don't know (without analysis), but the paper mentioned use of 64-bit types in one impl to avoid rounding errors, implying the internals might need higher precision than the input type.

The reason I suggested the cut-off is that it sounds like it avoids this problem, though of course it might be far more restrictive than necessary. Perhaps we should simply use this as-is for now, but flag it as needing review for accuracy under certain conditions.

@vks (Collaborator Author) commented Feb 21, 2019

As noted, it would be nice to look into other approaches also, but that can be a future PR.

I "implemented" the recommended algorithm for large expected values from the paper, but it does not quite work yet. It is a bit of a pain that the algorithm is specified with gotos.

I agree that this should be another PR however.

@vks (Collaborator Author) commented Feb 25, 2019

By using integer exponentiation (the `num` implementation, modified to work with `u64` exponents), I was able to get a significant speedup:

Before:

test distr_binomial       ... bench:      88,365 ns/iter (+/- 2,215) = 90 MB/s
test distr_binomial_small ... bench:      33,775 ns/iter (+/- 5,494) = 236 MB/s
test misc_binomial_1      ... bench:          13 ns/iter (+/- 2)
test misc_binomial_10     ... bench:          72 ns/iter (+/- 1)
test misc_binomial_100    ... bench:          71 ns/iter (+/- 11)
test misc_binomial_1000   ... bench:         624 ns/iter (+/- 19)
test misc_binomial_1e12   ... bench:         681 ns/iter (+/- 18)

After:

test distr_binomial       ... bench:      44,354 ns/iter (+/- 3,518) = 180 MB/s
test distr_binomial_small ... bench:      22,736 ns/iter (+/- 514) = 351 MB/s
test misc_binomial_1      ... bench:          10 ns/iter (+/- 0)
test misc_binomial_10     ... bench:          23 ns/iter (+/- 0)
test misc_binomial_100    ... bench:          27 ns/iter (+/- 4)
test misc_binomial_1000   ... bench:         621 ns/iter (+/- 15)
test misc_binomial_1e12   ... bench:         686 ns/iter (+/- 20)

Is it worth shipping our own powi implementation? It's only 14 lines of code.
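For reference, the core of such a `powi` is exponentiation by squaring for a `u64` exponent. This is a sketch of the idea, not necessarily the PR's exact code:

```rust
/// Exponentiation by squaring: computes base^exp in O(log exp)
/// multiplications, for an `f64` base and a `u64` exponent.
fn powi(mut base: f64, mut exp: u64) -> f64 {
    let mut acc = 1.0;
    while exp > 0 {
        if exp & 1 == 1 {
            acc *= base; // this bit of the exponent is set
        }
        exp >>= 1;
        base *= base; // square the base for the next bit
    }
    acc
}
```

Compared to `powf`, which goes through `exp`/`ln`, this does at most 64 squarings and multiplies, and it avoids the precision loss from casting a large `u64` exponent to `f64`.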

@geq1t left a comment

I took a look, and this patch certainly fixes the problem I was having. It also matches the GSL code, apart from GSL having a limit on the number of iterations. (I have so far been unable to get hold of the original paper, as it's behind a paywall, so I can't compare with that.)

With regard to the exponentiation issue: note that the original cast to f64 also possibly loses a few bits of precision (though at the point where it makes a difference, I'm not sure it really matters anymore). My personal preference would be to roll our own integer exponentiation for u64 exponents, but since (a) I won't be the one doing the work, and (b) I'm not strong enough in numerical analysis to make any predictions about what is better, it's better if I don't make any recommendations on that. For my current purposes, using an f64 exponent is fast enough, and 32 bits is large enough.

Thank you both for your hard work on this issue!

Our previous implementation was very slow (rust-random#734). The benchmarks make sure we won't have such regressions in the future.
This increases performance significantly, see below. However, it requires shipping our own integer exponentiation implementation, because `u64` exponents are supported by neither `std` nor `num`.

Before:
```
test distr_binomial       ... bench:  88,365 ns/iter (+/- 2,215) = 90 MB/s
test distr_binomial_small ... bench:  33,775 ns/iter (+/- 5,494) = 236 MB/s
test misc_binomial_1      ... bench:      13 ns/iter (+/- 2)
test misc_binomial_10     ... bench:      72 ns/iter (+/- 1)
test misc_binomial_100    ... bench:      71 ns/iter (+/- 11)
test misc_binomial_1000   ... bench:     624 ns/iter (+/- 19)
test misc_binomial_1e12   ... bench:     681 ns/iter (+/- 18)
```

After:
```
test distr_binomial       ... bench:  44,354 ns/iter (+/- 3,518) = 180 MB/s
test distr_binomial_small ... bench:  22,736 ns/iter (+/- 514) = 351 MB/s
test misc_binomial_1      ... bench:      10 ns/iter (+/- 0)
test misc_binomial_10     ... bench:      23 ns/iter (+/- 0)
test misc_binomial_100    ... bench:      27 ns/iter (+/- 4)
test misc_binomial_1000   ... bench:     621 ns/iter (+/- 15)
test misc_binomial_1e12   ... bench:     686 ns/iter (+/- 20)
```
@dhardy dhardy merged commit a9f6b8f into rust-random:master Feb 27, 2019
@dhardy (Member) commented Feb 27, 2019

Hope I didn't merge this too soon; this test appears to have hung: https://travis-ci.org/rust-random/rand/jobs/498834909

@vks vks deleted the faster-binomial branch February 28, 2019 11:29
@vks (Collaborator Author) commented Feb 28, 2019

Weird that this failure only happens on MIPS!
