Merging Rust Random Choice #110

Closed
StefanoD opened this issue Jul 20, 2016 · 8 comments
Labels
F-new-int Functionality: new, within Rand

StefanoD commented Jul 20, 2016

Hi,
I want to ask if there is interest in merging my small lib into this crate.
As far as I know, the rand crate doesn't support randomly choosing samples by their weights/probabilities.

I still have to refactor the API and write more tests, but the implementation of the algorithm is finished. I still have to implement the special case where I get a Vec of Box, though.

https://github.com/StefanoD/Rust_Random_Choice

nagisa commented Jul 20, 2016

Looks like a good crate!

I want to ask if there is interest in merging my small lib into this crate.

I doubt there’s much drive to merge anything into rand currently. We are in the process of fully redesigning the crate, and the general direction is more modularisation – most notably, there’s some intent to move distributions to a separate crate. I suspect something like your crate would be better off as its own crate too.

alexcrichton added the F-new-int (Functionality: new, within Rand) label on Jun 14, 2017

dhardy commented Jun 20, 2018

Well, we are finally looking at weighted sampling algorithms.

@StefanoD this Stochastic Universal Sampling algorithm has some slightly surprising limitations. Given spoke_gap = sum_of_weights / n, each element i has weight ∈ [k_i * spoke_gap, (k_i + 1) * spoke_gap) for some integer k_i. The element can then only be sampled either k_i or k_i + 1 times.

Put simply: elements with weight less than spoke_gap cannot be sampled more than once, and elements with greater weight are guaranteed to be sampled at least once.

I guess for some applications this is okay, but it's not the same as taking n independent weighted samples. I'm therefore not sure it belongs in Rand itself.

The main attraction of this algorithm seems to be speed, specifically the optimisation of using only a single random value. Modern PRNGs can be quite fast, so this may not be a good general-purpose choice, although it may still be the fastest option.

(The "correct" sampling approach would be to place n spokes randomly on the wheel instead of n evenly spaced spokes. While generating n random positions is not hard, they would then either need sorting or each would need to be looked up independently in the cumulative weights, i.e. O(n log(n)) or O(n log(N)) where N is the number of elements.)
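
To make the limitation concrete, here is a minimal sketch of stochastic universal sampling (a hypothetical helper, not the Rust_Random_Choice implementation): n evenly spaced spokes are laid over the cumulative weights with a single random offset, so an element covering k full spoke gaps can only be hit k or k + 1 times.

    use rand::Rng;

    // Sketch of SUS: one random offset, then n evenly spaced pointers
    // walked along the cumulative weights.
    fn sus_indices<R: Rng>(weights: &[f64], n: usize, rng: &mut R) -> Vec<usize> {
        let total: f64 = weights.iter().sum();
        let spoke_gap = total / n as f64;
        let start = rng.gen::<f64>() * spoke_gap; // the only random draw
        let mut chosen = Vec::with_capacity(n);
        let mut cumulative = weights[0];
        let mut i = 0;
        for spoke in 0..n {
            let pointer = start + spoke as f64 * spoke_gap;
            while pointer >= cumulative && i + 1 < weights.len() {
                i += 1;
                cumulative += weights[i];
            }
            chosen.push(i);
        }
        chosen
    }

With weights [1.0, 1.0, 10.0, 20.0] and n = 100, for example, each unit-weight element is always chosen 3 or 4 times and the weight-20 element 62 or 63 times, regardless of the RNG state.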

@pitdicker

I wrote a little about it at the bottom of this comment: dhardy#82 (comment). There are fields where it is useful, but it is definitely not one of the two 'standard' weighted choice algorithms.

StefanoD commented Jun 20, 2018

@dhardy From wiki:

SUS is a development of fitness proportionate selection (FPS) which exhibits no bias and minimal spread. Where FPS chooses several solutions from the population by repeated random sampling, SUS uses a single random value to sample all of the solutions by choosing them at evenly spaced intervals. This gives weaker members of the population (according to their fitness) a chance to be chosen and thus reduces the unfair nature of fitness-proportional selection methods.

But yes, this approach is more about weights and is only an approximation of probabilities, though that might be good enough for most use cases and interesting for embedded devices or real-time applications because of its performance. Actually, I think many approaches are also only approximations of probabilities, but with worse time complexity.

dhardy commented Jun 20, 2018

@StefanoD I'm just pointing out that the limits of the approximation are not clear. The quoted text points out problems with other approaches more than it discusses this algorithm's own limitations.

StefanoD commented Jun 20, 2018

@dhardy

each element i has weight ∈ [k_i * spoke_gap, (k_i + 1) * spoke_gap) for some integer k_i. The element can then only be sampled either k_i or k_i + 1 times.

If I understand you correctly, you are saying that an element can be sampled at most two times. But this is not correct.
The weight is not an element of [k_i * spoke_gap, (k_i + 1) * spoke_gap); the weight is whatever the user stated as a parameter!

If there are 4 samples, one element with a weight of 100 (given as a parameter!) and the other three with only 0.1 each, and the user wants 100 samples, then spoke_gap = 100.3 / 100 = 1.003. That means the sample with a weight of 100 will be sampled at least 97 times, if I calculated correctly.

If you are interested in a corresponding unit test, let me know.

EDIT: After rereading your post, I think I misunderstood you, but I'm still not sure your claim is correct.

sicking commented Jul 25, 2018

EDIT: After rereading your post, I think I misunderstood you, but I'm still not sure your claim is correct.

It's easy enough to test. Simply run

    // Assumes the random_choice crate's top-level constructor from its README.
    use random_choice::random_choice;

    let samples = vec![0, 1, 2, 3];
    let a = 10.0f64;
    let b = 20.0f64;
    let weights: Vec<f64> = vec![1.0, 1.0, a, b];
    let choices = random_choice().random_choice_f64(&samples, &weights, 1000);

You'll see that the difference between the number of 0s and the number of 1s returned will never be greater than 1, no matter what values you use for a and b.

And if you change weights to vec![1.0, 2.0, a, b], the number of 1s will always be 2 * count(0s) ± 1.
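
A hedged sketch of tallying the output for the first case above (weights [1.0, 1.0, a, b]), assuming random_choice_f64 returns a Vec of references to the chosen samples as the crate's README suggests:

    // Count how often the two unit-weight samples (0 and 1) were chosen;
    // the exact dereferencing depends on the crate's return type.
    let count0 = choices.iter().filter(|c| **c == 0).count();
    let count1 = choices.iter().filter(|c| **c == 1).count();
    assert!((count0 as i64 - count1 as i64).abs() <= 1);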

dhardy commented May 14, 2019

There does not appear to be much more interest in this, and I'd prefer not to add this type of approximation to Rand. We also now support O(1) weighted sampling via the alias method (#692).
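
For reference, a minimal sketch of the alias-method API mentioned above (assuming rand 0.8's WeightedAliasIndex, which sits behind the optional `alias` feature flag): each draw is an independent weighted sample of an index, in O(1) after O(n) setup.

    use rand::distributions::{Distribution, WeightedAliasIndex};

    fn main() {
        let weights = vec![1.0f64, 1.0, 10.0, 20.0];
        let dist = WeightedAliasIndex::new(weights).unwrap(); // O(n) table build
        let mut rng = rand::thread_rng();
        // 100 independent weighted draws of an index, each in O(1).
        let indices: Vec<usize> = (0..100).map(|_| dist.sample(&mut rng)).collect();
        println!("{:?}", indices);
    }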

dhardy closed this as completed on May 14, 2019