Merging Rust Random Choice #110

Closed
StefanoD opened this issue Jul 20, 2016 · 8 comments
Labels
F-new-int Functionality: new, within Rand

StefanoD commented Jul 20, 2016

Hi,
I want to ask if there is interest in merging my small lib into this crate.
As far as I know, the rand crate doesn't support randomly choosing samples by their weights/probabilities.

I still have to refactor the API and write more tests, but the implementation of the algorithm is finished. I still have to implement the special case where I get a Vec of Box, though.

https://github.com/StefanoD/Rust_Random_Choice

nagisa commented Jul 20, 2016

Looks like a good crate!

I want to ask if there is interest in merging my small lib into this crate.

I doubt there’s much drive to merge anything into rand currently. We are in the process of fully redesigning the crate, and the general direction is more modularisation – most notably, there’s some intent to move distributions to a separate crate. I suspect something like your crate would be better off as its own crate too.

alexcrichton added the F-new-int (Functionality: new, within Rand) label on Jun 14, 2017

dhardy commented Jun 20, 2018

Well, we are finally looking at weighted sampling algorithms.

@StefanoD this Stochastic Universal Sampling algorithm has some slightly surprising limitations. Given spoke_gap = sum_of_weights / n, each element i has weight ∈ [k_i * spoke_gap, (k_i + 1) * spoke_gap) for some integer k_i. The element can then only be sampled either k_i or k_i + 1 times.

Put simply: elements with weight less than spoke_gap cannot be sampled more than once, and elements with greater weight are guaranteed to be sampled at least once.

I guess for some applications this is okay, but it's not the same as taking n independent weighted samples. I'm therefore not sure it belongs in Rand itself.

The main attraction of this algorithm seems to be speed, specifically the optimisation of using only a single random value. Modern PRNGs can be quite fast, so this may not be a good general-purpose choice, although it may still be the fastest option.

(The "correct" sampling approach would be to place n spokes randomly on the wheel instead of n evenly spaced spokes. While generating n random positions is not hard, they would then either need sorting or each would need to be looked up independently in the cumulative weights, i.e. O(n log(n)) or O(n log(N)) where N is the number of elements.)
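
To make the limitation concrete, here is a minimal sketch of stochastic universal sampling (a hypothetical helper, not the Rust_Random_Choice implementation): n evenly spaced spokes are laid over the cumulative weights with a single random offset, so an element covering k full spoke gaps can only be hit k or k + 1 times.

    use rand::Rng;

    // Sketch of SUS: one random offset, then n evenly spaced pointers
    // walked along the cumulative weights.
    fn sus_indices<R: Rng>(weights: &[f64], n: usize, rng: &mut R) -> Vec<usize> {
        let total: f64 = weights.iter().sum();
        let spoke_gap = total / n as f64;
        let start = rng.gen::<f64>() * spoke_gap; // the only random draw
        let mut chosen = Vec::with_capacity(n);
        let mut cumulative = weights[0];
        let mut i = 0;
        for spoke in 0..n {
            let pointer = start + spoke as f64 * spoke_gap;
            while pointer >= cumulative && i + 1 < weights.len() {
                i += 1;
                cumulative += weights[i];
            }
            chosen.push(i);
        }
        chosen
    }

With weights [1.0, 1.0, 10.0, 20.0] and n = 100, for example, each unit-weight element is always chosen 3 or 4 times and the weight-20 element 62 or 63 times, regardless of the RNG state.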

@pitdicker

I wrote a little about it at the bottom of this comment: dhardy#82 (comment). There are fields where it is useful, but it is definitely not one of the two 'standard' weighted choice algorithms.

StefanoD commented Jun 20, 2018

@dhardy From wiki:

SUS is a development of fitness proportionate selection (FPS) which exhibits no bias and minimal spread. Where FPS chooses several solutions from the population by repeated random sampling, SUS uses a single random value to sample all of the solutions by choosing them at evenly spaced intervals. This gives weaker members of the population (according to their fitness) a chance to be chosen and thus reduces the unfair nature of fitness-proportional selection methods.

But yes, this approach is more about weights and is only an approximation of probabilities, though that might be good enough for most use cases and interesting for embedded devices or real-time applications because of its performance. Actually, I think many approaches are also only approximations of probabilities, but with worse time complexity.

dhardy commented Jun 20, 2018

@StefanoD I'm just pointing out that the limits of the approximation are not clear. The quoted text points out problems with other approaches more than it discusses this algorithm's own limitations.

StefanoD commented Jun 20, 2018

@dhardy

each element i has weight ∈ [k_i * spoke_gap, (k_i + 1) * spoke_gap) for some integer k_i. The element can then only be sampled either k_i or k_i + 1 times.

If I understand you correctly, you are saying that an element can be sampled at most two times. But this is not correct.
The weight is not an element of [k_i * spoke_gap, (k_i + 1) * spoke_gap); the weight is whatever the user stated as a parameter!

If there are 4 samples, one element with a weight of 100 (given as a parameter!) and the other three with only 0.1 each, and the user wants 100 samples, then spoke_gap = 100.3 / 100 = 1.003. That means the sample with a weight of 100 will be sampled at least 97 times, if I calculated correctly.

If you are interested in a corresponding unit test, let me know.

EDIT: After rereading your post, I think I misunderstood you, but I'm still not sure your claim is correct.

sicking commented Jul 25, 2018

EDIT: After rereading your post, I think I misunderstood you, but I'm still not sure your claim is correct.

It's easy enough to test. Simply run

    // Assumes the random_choice crate's top-level constructor from its README.
    use random_choice::random_choice;

    let samples = vec![0, 1, 2, 3];
    let a = 10.0f64;
    let b = 20.0f64;
    let weights: Vec<f64> = vec![1.0, 1.0, a, b];
    let choices = random_choice().random_choice_f64(&samples, &weights, 1000);

You'll see that the difference between the number of 0s and the number of 1s returned will never be greater than 1, no matter what values you use for a and b.

And if you change weights to vec![1.0, 2.0, a, b], the number of 1s will always be 2 * count(0s) ± 1.
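
A hedged sketch of tallying the output for the first case above (weights [1.0, 1.0, a, b]), assuming random_choice_f64 returns a Vec of references to the chosen samples as the crate's README suggests:

    // Count how often the two unit-weight samples (0 and 1) were chosen;
    // the exact dereferencing depends on the crate's return type.
    let count0 = choices.iter().filter(|c| **c == 0).count();
    let count1 = choices.iter().filter(|c| **c == 1).count();
    assert!((count0 as i64 - count1 as i64).abs() <= 1);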

dhardy commented May 14, 2019

There does not appear to be much more interest in this, and I'd prefer not to add this type of approximation to Rand. We also now support O(1) weighted sampling via the alias method (#692).
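
For reference, a minimal sketch of the alias-method API mentioned above (assuming rand 0.8's WeightedAliasIndex, which sits behind the optional `alias` feature flag): each draw is an independent weighted sample of an index, in O(1) after O(n) setup.

    use rand::distributions::{Distribution, WeightedAliasIndex};

    fn main() {
        let weights = vec![1.0f64, 1.0, 10.0, 20.0];
        let dist = WeightedAliasIndex::new(weights).unwrap(); // O(n) table build
        let mut rng = rand::thread_rng();
        // 100 independent weighted draws of an index, each in O(1).
        let indices: Vec<usize> = (0..100).map(|_| dist.sample(&mut rng)).collect();
        println!("{:?}", indices);
    }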

dhardy closed this as completed on May 14, 2019