Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an implementation of alias method for weighted indices #692

Merged
merged 10 commits into from Mar 27, 2019
5 changes: 5 additions & 0 deletions benches/distributions.rs
Expand Up @@ -187,6 +187,11 @@ distr_int!(distr_weighted_u32, usize, WeightedIndex::new(&[1u32, 2, 3, 4, 12, 0,
distr_int!(distr_weighted_f64, usize, WeightedIndex::new(&[1.0f64, 0.001, 1.0/3.0, 4.01, 0.0, 3.3, 22.0, 0.001]).unwrap());
distr_int!(distr_weighted_large_set, usize, WeightedIndex::new((0..10000).rev().chain(1..10001)).unwrap());

distr_int!(distr_weighted_alias_method_i8, usize, AliasMethodWeightedIndex::new(vec![1i8, 2, 3, 4, 12, 0, 2, 1]).unwrap());
distr_int!(distr_weighted_alias_method_u32, usize, AliasMethodWeightedIndex::new(vec![1u32, 2, 3, 4, 12, 0, 2, 1]).unwrap());
distr_int!(distr_weighted_alias_method_f64, usize, AliasMethodWeightedIndex::new(vec![1.0f64, 0.001, 1.0/3.0, 4.01, 0.0, 3.3, 22.0, 0.001]).unwrap());
distr_int!(distr_weighted_alias_method_large_set, usize, AliasMethodWeightedIndex::new((0..10000).rev().chain(1..10001).collect()).unwrap());

// construct and sample from a range
macro_rules! gen_range_int {
($fnn:ident, $ty:ident, $low:expr, $high:expr) => {
Expand Down
6 changes: 5 additions & 1 deletion src/distributions/mod.rs
Expand Up @@ -181,7 +181,11 @@ pub use self::other::Alphanumeric;
#[doc(inline)] pub use self::uniform::Uniform;
pub use self::float::{OpenClosed01, Open01};
pub use self::bernoulli::Bernoulli;
#[cfg(feature="alloc")] pub use self::weighted::{WeightedIndex, WeightedError};
#[cfg(feature = "alloc")]
pub use self::weighted::{
AliasMethodWeight, AliasMethodWeightedIndex, AliasMethodWeightedIndexError, WeightedError,
WeightedIndex,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we could make the module public and name these weighted::{AliasMethodWeight, AliasMethod, Error, BinarySearch}. Thoughts?

(obviously a breaking change)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be better and more like it is done in the standard library (std::io::Error). Again, let me know if you want to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe the module must be renamed because with the proposed naming the 'index' part is lost.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure we need to keep Index in the name anyway; it's the only type of weighted sampling we have. It's not the best idea to break this stuff again, but still better to get it right than leave a mess IMO.

But lets wait for @vks to comment.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned above, I think we might get rid of our current implementation and only have the alias method. Then there wouldn't be naming issues.

Alternatively, if we decide to keep both, I would prefer to drop the common AliasMethod prefix and instead have an alias_method module. This would make it much easier for users to switch between the two implementations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current benchmarks show the alias method to always be in the lead or only slightly behind, however if you move the set-up time into the measurement loop, then the binary-search method can be significantly faster (three times faster on the large_set bench with 1000 samples; nine times faster with 100 samples, and a little faster on the smaller sets).

Memory usage will be a little higher with the Alias method due to the extra Vec<usize>; mostly this is unimportant I think (unless memory constrained and having a large set of weights in a small type).

The Alias method has some extra requirements on the type, notably Copy. Should we use Clone instead?


I think there is room for both implementations, though the current presentation and documentation is not ideal. So what do you think about the following structure?

distributions::{
    weighted::{
        alias::{WeightedIndex, Weight},
        WeightedIndex, Error
    },
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Alias method has some extra requirements on the type, notably Copy. Should we use Clone instead?

I missed that. In that case we should probably have both, and working with Clone would be nicer.

So what do you think about the following structure?

This is what I would suggest as well.

Copy link
Member

@dhardy dhardy Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. @zroug would you make these changes please?

The module documentation should give advice something like the following:

If you will sample from the WeightedIndex distribution only a few times, then the binary search method will be fastest, however, if you require many samples (thousands) then the Alias method may be faster. Both methods have O(n) set-up, however the cost factor for the Alias method is significantly higher enabling O(1) sampling vs O(log(n)) for the binary-search method. For small n, sampling may also be faster with the binary-search method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will make these changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did these changes in 5b29341 but I used alias_method instead of alias as module name. Just alias didn't sound right to me and the word alias has a much broader meaning. Are you okay with this?

I wasn't sure if I should keep the reexports for WeightedIndex. I have kept them.

};
#[cfg(feature="std")] pub use self::unit_sphere::UnitSphereSurface;
#[cfg(feature="std")] pub use self::unit_circle::UnitCircle;
#[cfg(feature="std")] pub use self::gamma::{Gamma, ChiSquared, FisherF,
Expand Down