Implement various choose() methods #490

sicking · 2018-06-03T03:14:01Z

On top of the existing Rng.choose() and Rng.choose_mut(), this implements Rng.choose_from_iterator(), Rng.choose_weighted(), Rng.choose_weighted_with_total(), SliceRandom.choose(), SliceRandom.choose_mut(), and IteratorRandom.choose().

Regarding the vec.shuffle(rng) vs. rng.shuffle(vec) issue, my strategy for where to stick functions was basically:

Stick all functions on the Rng trait. Since this helps with documentation and discoverability.
For functions that can reasonably be used in chains like get_stuff().do_random_op(&mut rng).do_other_stuff(), I've also added functions on SliceRandom and IteratorRandom which forward to the implementation on the Rng trait.

This seems like a reasonable trade-off to me. But happy to discuss more.

Also, this patch relies on that issue #476 will be fixed. Otherwise using floating point weights will on very rare occasions panic.

sicking · 2018-06-03T08:50:02Z

Looks like wasm builds don't like using core::panic::catch_unwind. Opinions on if I should disable the test that uses that in wasm builds? And if so, how? Or create 11 separate #[should_panic] tests?

dhardy

Looks like there is quite a lot of overlap with #483 in fact. I see you decided to keep all the methods on Rng. I think my version is cleaner, but I don't know if it comes at a cost to usability.

I haven't looked into the "choose weighted" stuff much recently; I think @pitdicker did more. I wonder, is it ever worth iterating over weights as in your API instead of simply collecting the weights into a vector (possibly cumulatively)?

dhardy · 2018-06-03T08:52:28Z

src/lib.rs

+    /// This function iterates over `weights` twice. Once to get the total
+    /// weight, and once while choosing the random value. If you know the total
+    /// weight, or plan to call this function multiple times, you should
+    /// consider using [`choose_weighted_with_total`] instead.


Doc is copied from other function?

sicking · 2018-06-03T10:18:23Z

The main reason that I had in mind for using an Iterator, rather than a Vec, for the weights is that it doesn't put any constraints on the storage format. For example:

let my_items = vec![("foo", 10), ("bar", 3), ("baz", 1)];
let weights = my_items.iter().map(|x| x.1);
println!("{:?}", rng.choose_weighted(my_items.iter(), weights));

Or

let my_items = get_vec_of_structs();
let weights = my_items.iter().map(|x| x.get_weight());
println!("{:?}", rng.choose_weighted(my_items.iter(), weights));

An alternative way to accomplish this goal would be to do something like:

    fn choose_weighted<Iter, F, Weight>(&mut self,
                                        items: Iter,
                                        f: F) -> Option<Iter::Item>
        where Iter: Iterator + Clone,
              F: FnMut(&Iter::Item) -> Weight,
              Weight: SampleUniform +
                      Default +
                      core::ops::Add<IterWeights::Item, Output=IterWeights::Item> +
                      core::cmp::PartialOrd<IterWeights::Item> +
                      Clone

This would work as well, but means that if the weights and items are stored in different vectors, then the caller will have to .zip() the lists together and will get back a tuple containing the chosen item, rather than the item itself.

dhardy · 2018-06-08T10:13:16Z

src/seq.rs

+    }
+}
+
+/// XXX Any way to avoid '+ Sized' here?


Not in an API that consumes self (i.e. fn choose(self, ...)). It's possible to use where Self: Sized on each method instead, but in practice I don't think that gains anything (unless we want to add methods taking &self, e.g. like size_hint does).

SliceExt → SliceRandom, IteratorExt → IteratorRandom

sicking · 2018-06-12T22:16:05Z

I've rebased this on top of #483 and moved the choose_weighted methods off of Rng and instead live as functions inside the seq module.

This leaves it with a signature like this:

pub fn choose_weighted<R, IterItems, IterWeights>(rng: &mut R,
                                                  items: IterItems,
                                                  weights: IterWeights) -> Option<IterItems::Item>
    where R: Rng + ?Sized,
          IterItems: Iterator,
          IterWeights: Iterator + Clone,
          IterWeights::Item: SampleUniform + ...

Same thing for choose_weighted_with_total, just with an extra total_weight argument.

One alternative approach would be only let this function deal with the weights iterator. And instead of returning an IterItems::Item, return the index of the selected weight. Then it's up to the caller of using that index to grab an item from a slice (vec[index]) or from an iterator (iter.nth(index)).

This would leave us with a singature like:

pub trait IteratorRandom: Iterator + Sized {
    ...

    fn choose_index_weighted<R>(mut self,
                                rng: &mut R) -> Option<usize>
        where R: Rng + ?Sized,
              Self: Clone,
              Self::Item: SampleUniform + ...
    {
        ...

There's a few of upsides to this.

We don't have to have a "global" function in the seq module, but can instead stick this function together with the other choose, choose_multiple etc.
The signature is obviously simpler.
It's a lower-level primitive which can be useful if the caller want to remember the index for longer than they can hold on to a returned reference.
Once we implement a replacement for WeightedChoice, i.e. a Distribution which can return multiple items from a weighted distribution, the Distribution object will work well borrow checker. It'll own all the calculated cumulative weights and return usizes. It will not need to hold references to external items.

But also a few downsides:

This doesn't provide as much functionality. I.e. we have SliceRandom::choose even though we have Rng::gen_range exists. Though if really needed we could easily add a function with the old signature which wraps the new one. That could easily be done later.
The returned type is different from other choose methods since it returns an index rather than an item.
The caller has to potentially deal with unwrapping Options twice. Once from this function and once from Iterator::nth.

Ultimately I'm leaning towards this new way being a better solution though.

Some code differences:
Old:

// tuples or structs
let items = [('a', 2), ('b', 1), ('c', 1)];
let weights = items.iter().map(|&(_, w)| w);
let item = choose_weighted(&mut rng, items.iter(), weights).unwrap();

// separate arrays
let items = ['a', 'b', 'c'];
let weights = [2,   1,   1];
let item = choose_weighted(&mut rng, items.iter(), weights.iter()).unwrap();

// iterators
let items = get_items();
let weights = get_item_weights();
let item = choose_weighted(&mut rng, items, weights).unwrap();

New:

// tuples or structs
let items = [('a', 2), ('b', 1), ('c', 1)];
let item = items.iter().map(|&(_, w)| w)
    .choose_index_weighted(&mut rng)
    .map(|pos| items[pos]).unwrap();

// separate arrays
let items = ['a', 'b', 'c'];
let weights = [2,   1,   1];
let item = weights.iter()
    .choose_index_weighted(&mut rng)
    .map(|pos| items[pos]).unwrap();

// iterators
let items = get_items();
let weights = get_item_weights();
let item = weights
    .choose_index_weighted(&mut rng)
    .and_then(|pos| items.nth(pos)).unwrap();

dhardy · 2018-06-13T06:51:07Z

A choose_index_weighted function seems like a good idea. How about also having a function like this?

fn choose_weighted<W, WF>(&self, weights: WF) -> T
  where WF: Fn(&T, usize) -> W,
  W: /* weight type requirements */

This may be usable like so:

let items = [('a', 2), ('b', 1), ('c', 1)];
let item = items.choose_weighted(|x, _| x.1).unwrap().0;

// separate arrays
let items = ['a', 'b', 'c'];
let weights = [2,   1,   1];
let item = items.choose_weighted(|_, i| weights[i]).unwrap();

I'm not sure if providing both an index and reference to the item in this function is overkill? I'm also not sure how well it optimises.

sicking · 2018-06-15T09:38:59Z

I'll create separate PRs for the tests and for choose_weighted. The rest of these changes are covered by #483

sicking mentioned this pull request Jun 3, 2018

Sequence functionality: API revisions #483

Merged

dhardy reviewed Jun 3, 2018

View reviewed changes

dhardy added the V-0.6 label Jun 7, 2018

dhardy reviewed Jun 8, 2018

View reviewed changes

dhardy added F-new-int Functionality: new, within Rand T-sequences Topic: sequences and removed V-0.6 labels Jun 8, 2018

dhardy added 14 commits June 12, 2018 10:44

Sequence functionality: API revisions

1c9c65a

seq: enable module without std

e88075e

Move choose, choose_mut and shuffle from Rng to SliceExt

8a07bb3

Impl new choose_multiple functions

3f5d562

Implement IteratorExt::choose

fe03aed

Implement SliceExt::partial_shuffle

27436e9

Move seq benches to new file

68d902e

Revise seq benches

1616f24

Deprecate seq::sample_slice (and _ref)

d8e9de1

Address review comments and doc example for choose_multiple

737df75

Optimise IteratorExt::choose

263d97c

Rename seq traits and export from prelude

ee52a64

SliceExt → SliceRandom, IteratorExt → IteratorRandom

Fix build for Rustc 1.22

83130e8

Rename choose_multiple → sample

27e323a

sicking force-pushed the pick_and_weighted branch from f93e5f9 to 2a786a8 Compare June 12, 2018 20:35

Implement choose_weighted and choose_weighted_with_total

15f91f5

sicking force-pushed the pick_and_weighted branch from 2a786a8 to 27e323a Compare June 12, 2018 22:30

sicking closed this Jun 15, 2018

sicking deleted the pick_and_weighted branch June 21, 2018 08:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement various choose() methods #490

Implement various choose() methods #490

sicking commented Jun 3, 2018

sicking commented Jun 3, 2018

dhardy left a comment

dhardy Jun 3, 2018

sicking commented Jun 3, 2018

dhardy Jun 8, 2018

sicking commented Jun 12, 2018 •

edited

dhardy commented Jun 13, 2018

sicking commented Jun 15, 2018

Implement various choose() methods #490

Implement various choose() methods #490

Conversation

sicking commented Jun 3, 2018

sicking commented Jun 3, 2018

dhardy left a comment

Choose a reason for hiding this comment

dhardy Jun 3, 2018

Choose a reason for hiding this comment

sicking commented Jun 3, 2018

dhardy Jun 8, 2018

Choose a reason for hiding this comment

sicking commented Jun 12, 2018 • edited

dhardy commented Jun 13, 2018

sicking commented Jun 15, 2018

sicking commented Jun 12, 2018 •

edited