Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement various choose() methods #490

Closed
wants to merge 15 commits into from

Conversation

sicking
Copy link
Contributor

@sicking sicking commented Jun 3, 2018

On top of the existing Rng.choose() and Rng.choose_mut(), this implements Rng.choose_from_iterator(), Rng.choose_weighted(), Rng.choose_weighted_with_total(), SliceRandom.choose(), SliceRandom.choose_mut(), and IteratorRandom.choose().

Regarding the vec.shuffle(rng) vs. rng.shuffle(vec) issue, my strategy for where to stick functions was basically:

  • Stick all functions on the Rng trait. Since this helps with documentation and discoverability.
  • For functions that can reasonably be used in chains like get_stuff().do_random_op(&mut rng).do_other_stuff(), I've also added functions on SliceRandom and IteratorRandom which forward to the implementation on the Rng trait.

This seems like a reasonable trade-off to me. But happy to discuss more.

Also, this patch relies on that issue #476 will be fixed. Otherwise using floating point weights will on very rare occasions panic.

@sicking
Copy link
Contributor Author

sicking commented Jun 3, 2018

Looks like wasm builds don't like using core::panic::catch_unwind. Opinions on if I should disable the test that uses that in wasm builds? And if so, how? Or create 11 separate #[should_panic] tests?

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there is quite a lot of overlap with #483 in fact. I see you decided to keep all the methods on Rng. I think my version is cleaner, but I don't know if it comes at a cost to usability.

I haven't looked into the "choose weighted" stuff much recently; I think @pitdicker did more. I wonder, is it ever worth iterating over weights as in your API instead of simply collecting the weights into a vector (possibly cumulatively)?

src/lib.rs Outdated
/// This function iterates over `weights` twice. Once to get the total
/// weight, and once while choosing the random value. If you know the total
/// weight, or plan to call this function multiple times, you should
/// consider using [`choose_weighted_with_total`] instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc is copied from other function?

@sicking
Copy link
Contributor Author

sicking commented Jun 3, 2018

The main reason that I had in mind for using an Iterator, rather than a Vec, for the weights is that it doesn't put any constraints on the storage format. For example:

let my_items = vec![("foo", 10), ("bar", 3), ("baz", 1)];
let weights = my_items.iter().map(|x| x.1);
println!("{:?}", rng.choose_weighted(my_items.iter(), weights));

Or

let my_items = get_vec_of_structs();
let weights = my_items.iter().map(|x| x.get_weight());
println!("{:?}", rng.choose_weighted(my_items.iter(), weights));

An alternative way to accomplish this goal would be to do something like:

    fn choose_weighted<Iter, F, Weight>(&mut self,
                                        items: Iter,
                                        f: F) -> Option<Iter::Item>
        where Iter: Iterator + Clone,
              F: FnMut(&Iter::Item) -> Weight,
              Weight: SampleUniform +
                      Default +
                      core::ops::Add<IterWeights::Item, Output=IterWeights::Item> +
                      core::cmp::PartialOrd<IterWeights::Item> +
                      Clone

This would work as well, but means that if the weights and items are stored in different vectors, then the caller will have to .zip() the lists together and will get back a tuple containing the chosen item, rather than the item itself.

@dhardy dhardy added the V-0.6 label Jun 7, 2018
src/seq.rs Outdated
}
}

/// XXX Any way to avoid '+ Sized' here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not in an API that consumes self (i.e. fn choose(self, ...)). It's possible to use where Self: Sized on each method instead, but in practice I don't think that gains anything (unless we want to add methods taking &self, e.g. like size_hint does).

@dhardy dhardy added F-new-int Functionality: new, within Rand T-sequences Topic: sequences and removed V-0.6 labels Jun 8, 2018
@sicking
Copy link
Contributor Author

sicking commented Jun 12, 2018

I've rebased this on top of #483 and moved the choose_weighted methods off of Rng and instead live as functions inside the seq module.

This leaves it with a signature like this:

pub fn choose_weighted<R, IterItems, IterWeights>(rng: &mut R,
                                                  items: IterItems,
                                                  weights: IterWeights) -> Option<IterItems::Item>
    where R: Rng + ?Sized,
          IterItems: Iterator,
          IterWeights: Iterator + Clone,
          IterWeights::Item: SampleUniform + ...

Same thing for choose_weighted_with_total, just with an extra total_weight argument.

One alternative approach would be only let this function deal with the weights iterator. And instead of returning an IterItems::Item, return the index of the selected weight. Then it's up to the caller of using that index to grab an item from a slice (vec[index]) or from an iterator (iter.nth(index)).

This would leave us with a singature like:

pub trait IteratorRandom: Iterator + Sized {
    ...

    fn choose_index_weighted<R>(mut self,
                                rng: &mut R) -> Option<usize>
        where R: Rng + ?Sized,
              Self: Clone,
              Self::Item: SampleUniform + ...
    {
        ...

There's a few of upsides to this.

  • We don't have to have a "global" function in the seq module, but can instead stick this function together with the other choose, choose_multiple etc.
  • The signature is obviously simpler.
  • It's a lower-level primitive which can be useful if the caller want to remember the index for longer than they can hold on to a returned reference.
  • Once we implement a replacement for WeightedChoice, i.e. a Distribution which can return multiple items from a weighted distribution, the Distribution object will work well borrow checker. It'll own all the calculated cumulative weights and return usizes. It will not need to hold references to external items.

But also a few downsides:

  • This doesn't provide as much functionality. I.e. we have SliceRandom::choose even though we have Rng::gen_range exists. Though if really needed we could easily add a function with the old signature which wraps the new one. That could easily be done later.
  • The returned type is different from other choose methods since it returns an index rather than an item.
  • The caller has to potentially deal with unwrapping Options twice. Once from this function and once from Iterator::nth.

Ultimately I'm leaning towards this new way being a better solution though.

Some code differences:
Old:

// tuples or structs
let items = [('a', 2), ('b', 1), ('c', 1)];
let weights = items.iter().map(|&(_, w)| w);
let item = choose_weighted(&mut rng, items.iter(), weights).unwrap();

// separate arrays
let items = ['a', 'b', 'c'];
let weights = [2,   1,   1];
let item = choose_weighted(&mut rng, items.iter(), weights.iter()).unwrap();

// iterators
let items = get_items();
let weights = get_item_weights();
let item = choose_weighted(&mut rng, items, weights).unwrap();

New:

// tuples or structs
let items = [('a', 2), ('b', 1), ('c', 1)];
let item = items.iter().map(|&(_, w)| w)
    .choose_index_weighted(&mut rng)
    .map(|pos| items[pos]).unwrap();

// separate arrays
let items = ['a', 'b', 'c'];
let weights = [2,   1,   1];
let item = weights.iter()
    .choose_index_weighted(&mut rng)
    .map(|pos| items[pos]).unwrap();

// iterators
let items = get_items();
let weights = get_item_weights();
let item = weights
    .choose_index_weighted(&mut rng)
    .and_then(|pos| items.nth(pos)).unwrap();

@dhardy
Copy link
Member

dhardy commented Jun 13, 2018

A choose_index_weighted function seems like a good idea. How about also having a function like this?

fn choose_weighted<W, WF>(&self, weights: WF) -> T
  where WF: Fn(&T, usize) -> W,
  W: /* weight type requirements */

This may be usable like so:

let items = [('a', 2), ('b', 1), ('c', 1)];
let item = items.choose_weighted(|x, _| x.1).unwrap().0;

// separate arrays
let items = ['a', 'b', 'c'];
let weights = [2,   1,   1];
let item = items.choose_weighted(|_, i| weights[i]).unwrap();

I'm not sure if providing both an index and reference to the item in this function is overkill? I'm also not sure how well it optimises.

@sicking
Copy link
Contributor Author

sicking commented Jun 15, 2018

I'll create separate PRs for the tests and for choose_weighted. The rest of these changes are covered by #483

@sicking sicking closed this Jun 15, 2018
@sicking sicking deleted the pick_and_weighted branch June 21, 2018 08:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
F-new-int Functionality: new, within Rand T-sequences Topic: sequences
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants