Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve documentation for sampling without replacement #1210

Closed
OliverEvans96 opened this issue Jan 7, 2022 · 6 comments · Fixed by rust-random/book#48
Closed

Improve documentation for sampling without replacement #1210

OliverEvans96 opened this issue Jan 7, 2022 · 6 comments · Fixed by rust-random/book#48

Comments

@OliverEvans96
Copy link

OliverEvans96 commented Jan 7, 2022

Hello,

I see that sampling without replacement was requested in #596 and implemented in #1013, however the documentation is not very helpful to someone trying to figure out how to use this feature.

Currently the Rand lib only implements sampling with replacement, i.e. repeated sampling assumes the same distribution (that any sampled marble has been replaced). An alternative distribution implementing sampling without replacement has been requested.

  • The Sequences section in the Updating to 0.8 chapter of the rand book only says:

Weighted sampling without replacement is now supported, see rand::seq::index::sample_weighted and SliceRandom::choose_multiple_weighted.

Randomly sample exactly amount distinct indices from 0..length, and return them in an arbitrary order (there is no guarantee of shuffling or ordering). The weights are to be provided by the input function weights, which will be called once for each index.

So it took me 6 attempts to find an explanation, which is still only sampling indices, not sampling from a distribution.

It would be really awesome if a search for "replacement" in both the docs and the book returned a clear explanation with a usage example.

Thanks!
Oliver

@dhardy
Copy link
Member

dhardy commented Jan 10, 2022

Hi there, and thanks for pointing out the poor state of the docs.

First, the docs at rust-random.github.io are either outdated or broken. I guess we should fix them. On docs.rs: rand::seq::SliceRandom::choose_multiple_weighted has documentation. (See below.)

So if I understand correctly this issue is about:

  • fix documentation deployment to rust-random.github.io
  • ensure "replacement" is mentioned in the docs (though should you not search for "weighted" instead)?
  • expand the book to cover weighted sampling without replacement (we already have Weighted sequences)
  • fix SliceRandom::choose_multiple_weighted link in updating guide

@dhardy
Copy link
Member

dhardy commented Jan 10, 2022

How did you even find SliceRandom::choose_multiple_weighted? Because navigating the docs normally or even using the search I get to SliceRandom::choose_multiple_weighted (only difference: #method vs #tymethod) which includes the documentation.

@OliverEvans96
Copy link
Author

The #method variant is linked from the Sequences section in Updating to 0.8.

@OliverEvans96
Copy link
Author

I hope you'll excuse my ignorance, but, whether a set of samples are drawn with constant weights and whether they are drawn with replacement seem orthogonal.

I'm very curious to hear why I'm wrong. But I certainly would not have searched for "weighted" when trying to perform a uniform sample without replacement.

Or is the idea that I should basically implement choose_multiple_without_replacement myself by calling choose_weighted in a loop, setting the weight of any element that has previously been chosen to 0?

@dhardy
Copy link
Member

dhardy commented Jan 12, 2022

whether a set of samples are drawn with constant weights and whether they are drawn with replacement seem orthogonal.

If drawing without replacement, then the probability of each value being sampled is affected by each value drawn. There are two ways of simulating this: (1) maintaining a list of "taken" values and using rejection sampling (repeating the process if the sample is already "taken"), or (2) maintaining a representation of the probability of each remaining value being sampled. Method (2) is essentially weighted (or biased) sampling.

Thus, in abstract, the concepts might be considered orthogonal, but when simulating such sampling, the concepts are related.

Or is the idea that I should basically implement choose_multiple_without_replacement myself by calling choose_weighted in a loop, setting the weight of any element that has previously been chosen to 0?

This is method (2), and works fine where the pool of available values is small, but not so well if e.g. there are a million possible values and you only wish to sample 1000 of them.


But let me ask: is there some particular thing you're still struggling to understand how to implement, or are you merely suggesting parts of the documentation that might be improved? Because I have a suspicion that the documentation on choose_multiple and choose_multiple_weighted does not meet your requirements?

If you wish to modify the distribution you are sampling from with each sample taken, that is not supported by the Distribution trait (sample takes &self), so, yes, you'd have to implement this yourself using one of the above methods (you might wish to refer to seq::index code).

@vks
Copy link
Collaborator

vks commented Jan 13, 2022

@OliverEvans96 rand::seq::index::sample probably does what you want, unless you want to get the samples one-by-one. At the moment, we don't offer such an interface, because it is incompatible with our Distribution trait, which only supports independent samples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants