Sequence functionality: API revisions #483

dhardy · 2018-05-26T17:44:08Z

Lets start trying to implement dhardy#82 !

The API is not fixed but has been discussed in the above thread. Comments welcome (but please read the above first).

@pitdicker this is an upstream branch; do what you like with it!

dhardy · 2018-05-27T13:26:48Z

I should note that this is not intended for merging within the 0.5 series, so will probably not be merged soon. At some point we should create a 0.5 branch and update master for 0.6, but not yet.

dhardy · 2018-05-27T14:30:15Z

That completes the basic implementations. Still to do:

API critique / bikeshed — may be worth opening a new issue to discuss this
revise sample_indices somehow, probably (could be separate PR)
Is returning an iterator for SliceExt::choose_multiple appropriate? Is slice.choose_multiple(rng, m).collect() as fast as if we just output a vector?
any other algorithm adjustments?
module documentation
clean up / rebase

vks · 2018-05-28T13:20:30Z

Update minimum Rustc version to 1.26

Is this planned for Rand 0.6?

Instead of having choose_multiple*, couldn't we just provide an iterator that can be collected? (Would be nicer in cases where the size of the iterator is known.)

dhardy · 2018-05-28T14:22:59Z

Updating the minimum Rustc version was to allow use of impl Trait here, but I ended up not using that anyway. No, not planned and it's probably a bit early yet. [Edit: commit removed. Saved in my 'misc' branch.]

The main motivation for choose_multiple_fill is no_std support. SliceExt::choose_multiple uses sample_indices which requires an allocator. I think any form of choose_multiple requires either O(m) storage and iteration over all n elements, or > O(m) storage (potentially O(n) as in sample_indices_inplace or O(m) plus dynamic storage for other mutated elements as in sample_indices_cache).

We could implement SliceExt::choose_multiple_fill with no_std support using an O(n) backing array, though it's probably more practical to use IteratorExt::choose_multiple_fill in this case.

I did try combining the IteratorExt::choose_multiple* functions since they use the same algorithm, but I don't think there's any safe way to make choose_multiple use the no_std-compatible choose_multiple_fill implementation. An iterator-based API still needs O(m) memory which must be passed in as a slice if there is no allocator.

pitdicker · 2018-05-28T18:25:02Z

Lets start trying to implement dhardy#82 !

🎉 I'll have a good look tomorrow.

Nice to see all the deprecated functionality go, and it is less than I expected.

dhardy · 2018-05-28T18:51:49Z

Yes, the "remove deprecations" thing will probably become it's own PR — but not before we have a 0.5 branch, and I'm not sure we should do that yet.

pitdicker · 2018-05-28T18:53:10Z

We should probably be brave and have a couple of point releases first.

dhardy · 2018-06-02T14:37:36Z

TODO: benchmark whether partial_shuffle is actually an improvement over choose_multiple (which does in fact yield shuffled results).

sicking · 2018-06-03T07:53:23Z

I didn't see this PR existed before adding #490. There's some overlap (mainly in where the implementation of the choose methods should live), but so far they mainly look independent.

dhardy · 2018-06-08T10:19:25Z

@sicking would you please review this PR in more detail since it is close to your own #490? The big differences I see:

Weighted choices: I haven't done that yet; hopefully it can be added into this API somehow.
Methods on the Rng trait: IMO these methods should never be more than a wrapper around something else; I know @vks thinks most shouldn't even be there and I believe @pitdicker views these mostly as convenience methods. I guess we could keep the existing methods (choose, choose_mut and shuffle) as wrappers instead of deprecating (as in this PR), but I'm not keen on adding several new things there as your PR does.
Trait names: SliceRandom vs SliceExt, IteratorRandom vs IteratorExt: I don't have much opinion on this, though use rand::seq::SliceRandom; seems redundant. I don't know if SliceExt would be confusing or conflict for users?
This PR has a few extras methods on the traits: shuffle, choose_multiple, partial_shuffle

If some parts of this/your PR are contentious but we can find an agreeable sub-set that would also help make progress.

sicking

Regarding the trait names. I whatever name that we choose, I think we should re-export the trait directly from lib.rs. So that people can do use rand::SliceX. No need to expose them to the internal module I think.

I personally like SliceRandom over SliceExt.

I guess even rand::SliceRandom is a bit redundant. But Ext feels like a pretty unclear abbreviation to me. And arguably also redundant for an extension trait.

Maybe SliceUtils?

Ultimately I don't care that much.

sicking · 2018-06-08T17:02:04Z

src/seq.rs

+            // Continue until the iterator is exhausted
+            for (i, elem) in self.enumerate() {
+                let k = rng.gen_range(0, i + 2);
+                if k == 0 {


Is there a reason for the temporary k? Seems like rng.gen_range(...) == 0 works as well.

Also, this is a perfect place to use gen_ratio.

gen_ratio gets a big speed-up:

# current test seq_iter_choose_from_100 ... bench: 1,053 ns/iter (+/- 55) # using (i + 2) as u32 test seq_iter_choose_from_100 ... bench: 491 ns/iter (+/- 16) # using TryFrom test seq_iter_choose_from_100 ... bench: 526 ns/iter (+/- 14)

Unfortunately TryFrom is still not stable, and also doesn't support working with usize directly (rust-lang/rust#49415). I'm not certain we should use as here and because this is an iterator without known exact size we have no choice but to check each time for correctness.

Yeah. Let's stick with what you have. This might be an argument for making gen_ratio generic over type though. But lets worry about that separately.

sicking · 2018-06-08T17:09:06Z

src/seq.rs

+        let mut len = 0;
+        while len < amount {
+            if let Some(elem) = self.next() {
+                buf[len] = elem;


This results in a boundary check on each assignment. You could invert the loop and do something like:

for (i, slot) in buf.iter_mut() { if let Some(elem) = self.next() { *slot = elem; } else { return i; } }

I tried this, and benchmarks show no difference. I would guess that the check can get eliminated:

for (i, slot) in buf.iter_mut().enumerate() { if let Some(elem) = self.next() { *slot = elem; } else { // Iterator exhausted; stop early return i; // length } }

I personally prefer the approach this approach as it doesn't rely on the optimizer removing the check. But I'm old and crotchety :). Either way is totally fine with me.

Same performance, so I'd rather go with what looks nicer 😄

sicking · 2018-06-08T17:16:39Z

src/seq.rs

+        for (i, elem) in self.enumerate() {
+            let k = rng.gen_range(0, i + 1 + amount);
+            if k < amount {
+                buf[k] = elem;


Similar bounds check here. The existing code does

if Some(slot) = buf.get_mut(amount) { *slot = elem; }

which I thought was neat.

But maybe in both these instances the optimizer is able to get rid of the bounds check? If that's the case I think what you have here is more readable.

I have no idea. How does one check that (other than by benchmark)? I tried looking at the generated code for a small example, but even the MIR is 1200 lines, and likely before optimisations. Assembly is 5800.

Since in this case there is a test k < amount directly before the buf[k] = ... (and amount = buf.len()), I would guess that the compiler can eliminate the redundant check.

I tend to play it safe and do the thing that doesn't rely on the optimizer, as long as it doesn't make the code meaningfully more ugly. Especially when in doubt. Not a big deal either way.

sicking · 2018-06-08T17:18:19Z

src/seq.rs

+        where R: Rng + ?Sized
+    {
+        let mut reservoir = Vec::with_capacity(amount);
+         reservoir.extend(self.by_ref().take(amount));


There's an extra indentation here.

sicking · 2018-06-08T17:41:16Z

src/seq.rs

+    /// 
+    /// Complexity is `O(n)` where `n` is the length of the iterator.
+    #[cfg(feature = "alloc")]
+    fn choose_multiple<R>(mut self, rng: &mut R, amount: usize) -> Vec<Self::Item>


It's a bummer that we can't implement this by using choose_multiple_fill. Even using unsafe code I wasn't able to make something working that didn't result in more code than the current approach.

Separately, I kind'a liked the Result<Vec, Vec> approach. It definitely looked weird, but it seems likely that there's a bug if someone is trying to collect more items than exists, and returning a Result seems more likely to catch that bug. But I don't have strong feelings about it.

I tried combining the implementations too. Problem is we want allocated-on-first-use storage which Vec gives us for free but isn't compatible with buffers; some kind of trait might be able to abstract over this, but as you said it just results in more code in total.

About the return type, I'm not sure; both have their merits. However what about consistency? I think it would be possible to make SliceExt::choose_multiple return a Result too, even using iterators as it does now.

Good point about consistency. Would make sense to return a Result<Iterator, Iterator> for SliceExt::choose_multiple.

You could even argue that we should be consistent with choose and return an Result<T, Something>, but that's more dubious.

I don't have strong feelings about this. Maybe sticking with what we have now makes for the cleanest API.

The std library already uses Option in cases where None is kind-of an error. I don't think it makes sense to use Result<T, ()> (or NoInput instead of ()).

Yeah, I was thinking that too. I think just returning a Vec/Iterator might be the cleanest solution.

sicking · 2018-06-08T18:28:36Z

src/seq.rs

+            i -= 1;
+            // lock element i in place.
+            self.swap(i, rng.gen_range(0, i + 1));
+        }


Same change as for shuffle.

But more importantly, it's a little surprising to me to shuffle the end of the slice, rather than the beginning of it. Is there a reason for that?

I tried reversing the order of one of these shuffles and found the result slower. I copied @burdges implementation.

Is the perf difference meaningful? I feel like it's quite surprising to operate on a slice from end, but I guess it could be ok as long as we are very clearly documenting it.

sicking · 2018-06-08T19:15:05Z

src/seq.rs

+            None
+        } else {
+            let len = self.len();
+            Some(&mut self[rng.gen_range(0, len)])


Put self.len() inline like in the previous function?

Causes a failure with rustc 1.22

sicking · 2018-06-08T19:20:04Z

src/seq.rs

+/// trait.SliceExt.html#method.choose_multiple).
+#[cfg(feature = "alloc")]
+#[derive(Debug)]
+pub struct SliceChooseIter<'a, S: ?Sized + 'a, T: 'a> {


You should also implement ExactSizeIterator for this struct.

Done (comment not auto-hidden)

sicking · 2018-06-08T19:25:03Z

src/seq.rs

+    type Item = &'a T;
+
+    fn next(&mut self) -> Option<Self::Item> {
+        self.indices.next().map(|i| &(*self.slice)[i])


You could call get_unchecked here to save the bounds-check.

The required trait is nightly-only for now: https://doc.rust-lang.org/std/slice/trait.SliceIndex.html#tymethod.get_unchecked

sicking · 2018-06-08T19:34:05Z

src/seq.rs

+        assert_eq!(v.iter().choose(&mut r), Some(&3));
+
+        let v: &[isize] = &[];
+        assert_eq!(v.choose(&mut r), None);


These tests aren't very comprehensive. But I'm happy to add the tests I have in #490 over there.

Sure, I'll look forward to it.

sicking · 2018-06-08T20:30:48Z

If we really want to take advantage of namespaces and avoid redundancy, maybe we should be so bold as simply calling the traits rand::Slice and rand::Iterator. But as someone that has lived in non-namespaced languages most of my life, those names definitely make me a little uncomfortable. I don't know if there would be any practical problems with it though?

dhardy · 2018-06-09T09:59:54Z

src/seq.rs

+    #[cfg(feature = "alloc")]
+    fn choose_multiple<R>(&self, rng: &mut R, amount: usize) -> SliceChooseIter<Self, Self::Item>
+        where R: Rng + ?Sized
+    {


This one could also be made to return a Result, but it's a bit ugly. Thoughts? Doesn't seem to have much overhead, other than the ugly code.

fn choose_multiple<R>(&self, rng: &mut R, amount: usize) -> Result<SliceChooseIter<Self, Self::Item>, SliceChooseIter<Self, Self::Item>> where R: Rng + ?Sized { if amount <= self.len() { Ok(SliceChooseIter { slice: self, _phantom: Default::default(), indices: sample_indices(rng, self.len(), amount).into_iter(), }) } else { let mut indices: Vec<usize> = (0..self.len()).collect(); indices.shuffle(rng); Err(SliceChooseIter { slice: self, _phantom: Default::default(), indices: indices.into_iter(), }) } }

The neat-freak in me thinks that maybe we should just go with the old approach.

"old" meaning without the Result? @pitdicker what do you think (also about using Result to indicate not enough elements on the other choose_multiple above)?

"old" meaning as you have it now, not a Result. Same with the other choose_multiple.

I want to be clear that I really don't have a strong opinion either way. Other than that I think it makes sense to be consistent.

I played with the same problem in dhardy#82 (comment).

I don't think returning a result type for choose_multiple(n), so an iterator can return an error if it has less than n elements, is worth it. Checking the length of the Vec is just as good an indication there are less elements. And is returning less if there is simply nothing more available not just how APIs that work with iterators operate right now? For example .take(5).collect() does not guarantee you end up with 5 elements.

What should partial_shuffle(amount) do if amount is greater then the number of elements in the slice? I suppose it should simply shuffle the entire slice, not return some option or error.

dhardy · 2018-06-09T10:16:14Z

@sicking I addressed some of your comments, though not the trait names (TBD).

dhardy · 2018-06-13T08:33:52Z

Looking at languages / libraries I only found the following (most languages don't seem to have this functionality within a standard library):

Python supports random.choice(seq), random.choices(population, weights=.., cum_weights=.., k) (normal or cumulative weights can be used), random.sample(population, k)
GSL has gsl_ran_choose(rng, dest, k, src, n, size) (sampling without replacement) and gsl_ran_sample(rng, dest, k, src, n, size) (sampling with replacement)

In both cases, k is number of samples to take. GSL uses n for the number of elements in the source array and size for the size of elements.

We normally use verbs for functions (choose) rather than nouns (choice). Perhaps we should follow GSL's convention, at least in using the word choose for sampling-without-replacement (also used in "n choose k"; see Binomial coefficient), but we (probably?) still want both single and multiple versions. So:

choose and choose_multiple
choose and choose_k
choose_single and choose
choose_1 and choose
choose (multiple variant only)

Thoughts? I prefer 1, 2 and 5 over the other options.

vks · 2018-06-13T09:11:23Z

What is the motivation for the variant that chooses one element? I don't see how choose_1(&mut rng) is preferable over choose(&mut rng, 1), so I would prefer option 5 over the others.

dhardy · 2018-06-13T09:43:18Z

There is some motivation for the single-element version: (a) simpler, (b) Option<T> is nicer to use than [T] of length 0 or 1 and (c) doesn't require an allocator.

Edit: sorry, return type is Iterator<Item=T>, so motivation (b) is negligible.

I suppose we could add SliceRandom::choose_multiple_fill using only Floyd's algorithm for sampling; i.e. it wouldn't need an allocator but would be slow for large numbers of elements (more than 400-500 samples; also not optimal when the sample space is small relative to the sample size). Then motivation (c) is negated.

Edit: no we can't easily; we need a buffer for the index sampling; using the supplied element buffer would require transmuting and that the element size is at least large enough to represent the largest index and is very broken for types implementing Drop. However, IteratorRandom::choose_multiple_fill can still be used without an allocator; it's just less efficient. Alternatively we could expose a sample_indices_fill function for use without allocators.

Edit: also, sample_indices uses some heuristics to decide which algorithm to use and uses an allocator for the result; if you know that only 1 element is needed then this overhead can be avoided. Currently the heuristics cannot be calculated at compile time if amount is known but length is not; we should probably add a special case here for very small amounts.

sicking · 2018-06-13T18:37:01Z

One thing to keep in mind is that @pitdicker has a different implementation of choose_multiple over in https://github.com/pitdicker/rand/blob/slice_random/src/seq.rs, which looks like it doesn't require allocation.

I don't know if there's a meaningful performance difference between its choose_multiple(1) and our current choose though.

I too have a preference for 1, 2, and 5 (though don't have an opinion about if we use "choose", "pick" or "sample" as base-word).

One important aspect is that it might not be worth having APIs for both "choose one" and "choose multiple" if neither is a common operation. Searching github I see 7K hits for "gen_range", almost all of which are calls to rand. Searching for "choose" gives 5K hits, of which below 1/10 are calls to rand. So gen_range is an order of magnitude more used than choose.

I think I like the option to just add a choose_k method for now. That keeps the name clear but still not too annoying even if you just want one item (whereas choose_multiple(1) arguably is a little awkward). And if people complain, or we see a lot more usage, it'd be a non-breaking change to add choose later.

dhardy · 2018-06-14T07:55:54Z

A quick benchmark shows @pitdicker's pick_multiple has its merits:

test seq_shuffle_100                     ... FAILED
test seq_slice_choose_multiple_10_of_100 ... bench:         561 ns/iter (+/- 21)
test seq_slice_choose_multiple_1k_of_1M  ... bench:      50,009 ns/iter (+/- 16,255)

vs what's in this branch:

test seq_shuffle_100                     ... bench:       1,028 ns/iter (+/- 23)
test seq_slice_choose_multiple_10_of_100 ... bench:         147 ns/iter (+/- 2)
test seq_slice_choose_multiple_1k_of_1M  ... bench:      67,401 ns/iter (+/- 4,221)

(Both exclude my sample-indices optimisations in #479, though for these bounds it shouldn't matter much.)

dhardy · 2018-06-14T08:03:31Z

Re-run, with two different ways of choosing 1 of 1000:

# pitdicker's branch
test seq_slice_choose_1_of_1000          ... bench:           5 ns/iter (+/- 0)
test seq_slice_choose_multiple_10_of_100 ... bench:         562 ns/iter (+/- 14)
test seq_slice_choose_multiple_1_of_1000 ... bench:          56 ns/iter (+/- 3)
test seq_slice_choose_multiple_1k_of_1M  ... bench:      48,229 ns/iter (+/- 5,231)

# this branch
test seq_slice_choose_1_of_1000              ... bench:           4 ns/iter (+/- 0)
test seq_slice_choose_multiple_10_of_100     ... bench:         146 ns/iter (+/- 1)
test seq_slice_choose_multiple_1_of_1000     ... bench:          78 ns/iter (+/- 5)

So I think there's good reason to have a single-sample choose method (#479 may help but definitely not this much).

Edit: I merged #479 into this branch and got:

test seq_iter_choose_from_100                ... bench:         550 ns/iter (+/- 2)
test seq_iter_choose_multiple_10_of_100      ... bench:       1,129 ns/iter (+/- 10)
test seq_iter_choose_multiple_fill_10_of_100 ... bench:       1,099 ns/iter (+/- 57)
test seq_shuffle_100                         ... bench:       1,027 ns/iter (+/- 20)
test seq_slice_choose_1_of_1000              ... bench:           4 ns/iter (+/- 0)
test seq_slice_choose_multiple_10_of_100     ... bench:         112 ns/iter (+/- 1)
test seq_slice_choose_multiple_1_of_1000     ... bench:          28 ns/iter (+/- 2)
test seq_slice_choose_multiple_1k_of_1M      ... bench:      60,218 ns/iter (+/- 1,671)

The seq_slice_choose_multiple_1_of_1000 benchmark went from 78 → 28 ns, but still much greater than 4 ns!

dhardy · 2018-06-14T10:08:52Z

I cleaned up this branch, squashing most of the fixes.

IteratorRandom::choose now uses gen_bool instead of gen_ratio because casting usize → u32 → f64 doesn't make much sense. I also removed the checks, because if you're iterating over 2^53 elements then accuracy isn't going to be your biggest problem!

I kept the method names choose and choose_multiple for now. The brevity of choose_k is nice, but I don't know if it would be obvious enough why we use k.

It would be nice to get this merged now, even if some details might still change. Any more comments before doing so? I'm happy to wait a day or two for reviews.

My observations:

seq: use Floyd's combination algorithm to sample indices #479 has almost no overlap with these changes but does improve performance
Implement various choose() methods #490 has some overlap and some additional features: better tests and weighted sampling
pitdicker/slice_random has an alternative approach to choose_multiple which is sometimes faster, sometimes slower (see above); we may be able to build some optimisations from this and/or can try building an allocation-free alternative (although probably we shouldn't worry much about allocation)

sicking

Looks great to me. Just some minor comments.

sicking · 2018-06-14T19:19:23Z

benches/seq.rs

+    let mut buf = [0; 1];
+    b.iter(|| {
+        for (v, slot) in x.choose_multiple(&mut rng, buf.len()).zip(buf.iter_mut()) {
+            *slot = *v;


I think to make this more comparable to the choose_1_of_1000 test, you'll want to avoid the write into the buffer here. Something like the following feels like the most fair comparison:

b.iter(|| { x.choose_multiple(&mut rng, 1).next() })

Just calling next() doesn't do anything but return an iterator shim designed to be optimised into something else. Maybe .cloned().next()...

Doesn't calling next() return an Option<&usize>?

.cloned() should make it return a copy.

Anyway, this change appears to change performance from 82ns to 79ns (higher baseline because I didn't merge other branch here).

sicking · 2018-06-14T19:43:18Z

benches/seq.rs

+    let mut buf = [0; 10];
+    b.iter(|| {
+        for (v, slot) in x.choose_multiple(&mut rng, buf.len()).zip(buf.iter_mut()) {
+            *slot = *v;


Technically this does measure both the time to call choose_multiple and the time to write into buf, whereas we only really care about the former.

You could do something like x.choose_multiple(&mut rng, 10).sum::<usize>().

But I kind'a worry that the compiler would be smart enough to realize that since all values in x are 1, that it might optimize to simply returning 10. That could be fixed by simply changing one of the values in x though.

But since the writes are consecutive, they should be fast and maybe we shouldn't worry about this? I don't have strong feelings either way.

As a complete aside, it'd be really great if there was a version of Bencher::iter which you could return an Iterator to, and it would consume the iterator. But I'm not sure if there's any appetite for changing that API since apparently the idea is to deprecate it.

I don't think this is important. (I have tested that writing to a buffer is faster than collecting into a vector.)

sicking · 2018-06-14T19:46:34Z

benches/seq.rs

+}
+
+#[bench]
+fn seq_iter_choose_multiple_10_of_100(b: &mut Bencher) {


Does this function need to be flagged with #[cfg(feature = "alloc")]? Or does #[bench] only work in std builds anyway?

I cfg'd the whole module and tested (cargo +nightly test --benches --no-default-features [--features=alloc]). The bit I don't get is why there are no error messages with alloc without std, but there aren't.

The cfg thing made the tests work by effectively removing all the benchmarks — took me a while to work out why no benchmarks were getting run. But we don't need to benchmark without std anyway.

sicking · 2018-06-14T19:58:03Z

src/seq.rs

+/// 
+/// An implementation is provided for slices. This may also be implementable for
+/// other types.
+pub trait SliceRandom {


Should this not be pub since we're exporting it through prelude instead? Or are we fine with committing to exposing this trait as seq::SliceRandom as part of the public API. I don't know what is common practice in Rust. This is also a very minor point which we can worry about polishing later.

The prelude is not supposed to be the primary place to expose any functionality. I don't really see a reason not to expose this name here.

Makes sense to not just rely on prelude. I guess I was partially thinking of if we should expose it as rand::seq::SliceRandom or as rand::SliceRandom (by re-exposing it in lib.rs). Not something I have strong opinions on.

I've been wondering if we should move the Distribution trait to rand::Distribution... but for now lets leave it like this.

sicking · 2018-06-14T19:59:59Z

src/seq.rs

+    ///
+    /// Depending on the implementation, complexity is expected to be `O(m)`,
+    /// where `m = amount`.
+    fn partial_shuffle<R>(&mut self, rng: &mut R, amount: usize)


The docs should mention that the shuffled items are at the end of the passed in array. I.e. that this function shuffles from the end.

Also, why the "Depending on the implementation"? Seems like we can promise that complexity is O(m)?

If we mention this here it becomes part of the function specification, i.e. the implementation can't change it in case it breaks users' assumptions.

Okay for complexity comment.

Given that we're mutating a slice that the caller has access to, I think changing that would be a breaking change either way.

Breaking in what way? Breaking changes usually mean changing the API or violating prior specification, but if there's no specification on this then there's nothing to break.

Don't forget that the remainder (first part of the slice) does get modified too (elements swapped from the last part), so there's not much the user could depend on.

sicking · 2018-06-14T20:01:44Z

src/seq.rs

+    /// This is an efficient method to select `amount` elements at random from
+    /// the slice, provided the slice may be mutated.
+    ///
+    /// If you only need to chose elements randomly and `amount > self.len()/2`


"chose" -> "choose"

sicking · 2018-06-14T20:17:55Z

src/seq.rs

+        for (i, elem) in self.enumerate() {
+            let k = rng.gen_range(0, i + 1 + amount);
+            if k < amount {
+                buf[k] = elem;


We should be consistent about how we do this check. Right now choose_multiple_fill does if k < amount whereas choose_multiple does if let Some(spot) = reservoir.get_mut(k) {. I'd tend to using the latter but either is fine as long as we're consistent.

sicking · 2018-06-14T20:25:16Z

Yeah, I think we should merge this. I'll close #490 since I think the useful parts that's left there are better done as separate PRs for tests and for weighted sampling. The latter definitely needs more discussion about API.

And I think it's more important that we get the API nailed than the implementation. The latter is easy to change in future PRs, but the API is blocking other work.

sicking · 2018-06-15T09:21:24Z

benches/seq.rs

+    let mut buf = [0; 1];
+    b.iter(|| {
+        for (v, slot) in x.choose_multiple(&mut rng, buf.len()).zip(buf.iter_mut()) {
+            *slot = *v;


Doesn't calling next() return an Option<&usize>?

sicking · 2018-06-15T09:21:43Z

benches/seq.rs

+}
+
+#[bench]
+fn seq_iter_choose_multiple_10_of_100(b: &mut Bencher) {


sicking · 2018-06-15T09:24:04Z

src/seq.rs

+/// 
+/// An implementation is provided for slices. This may also be implementable for
+/// other types.
+pub trait SliceRandom {


Makes sense to not just rely on prelude. I guess I was partially thinking of if we should expose it as rand::seq::SliceRandom or as rand::SliceRandom (by re-exposing it in lib.rs). Not something I have strong opinions on.

sicking · 2018-06-15T09:25:22Z

src/seq.rs

+    ///
+    /// Depending on the implementation, complexity is expected to be `O(m)`,
+    /// where `m = amount`.
+    fn partial_shuffle<R>(&mut self, rng: &mut R, amount: usize)


Given that we're mutating a slice that the caller has access to, I think changing that would be a breaking change either way.

sicking · 2018-06-15T09:36:36Z

src/seq.rs

+
+    fn next(&mut self) -> Option<Self::Item> {
+        // TODO: investigate using SliceIndex::get_unchecked when stable
+        self.indices.next().map(|i| &(*self.slice)[i])


slice::get_unchecked looks like it has been marked stable since 1.0?
https://doc.rust-lang.org/src/alloc/slice.rs.html#419-423

Yes, but all we know about self.slice is the type bounds: 'a, S: Index<usize, Output = T> + ?Sized + 'a, T: 'a. We need to use the SliceIndex trait, which apparently is going to be marked stable soon, but with most of the methods still unstable.

dhardy · 2018-06-15T12:33:36Z

Updated (rebase, but only minor tweaks)

dhardy force-pushed the seq branch 2 times, most recently from d697fd0 to 8f17ca6 Compare May 27, 2018 14:25

This comment has been minimized.

Sign in to view

dhardy force-pushed the seq branch from 8f17ca6 to d756e0b Compare May 28, 2018 14:28

dhardy mentioned this pull request May 28, 2018

Single item version of sample_iter ? #432

Closed

dhardy added P-postpone Waiting on something else B-API Breakage: API labels Jun 1, 2018

dhardy mentioned this pull request Jun 7, 2018

Sequence sampling: seq, WeightedChoice dhardy/rand#82

Closed

5 tasks

dhardy added V-0.6 and removed P-postpone Waiting on something else labels Jun 7, 2018

dhardy force-pushed the seq branch from d756e0b to d23735b Compare June 8, 2018 10:26

dhardy added F-new-int Functionality: new, within Rand T-sequences Topic: sequences E-question Participation: opinions wanted X-discussion Type: discussion and removed V-0.6 labels Jun 8, 2018

sicking approved these changes Jun 8, 2018

View reviewed changes

dhardy commented Jun 9, 2018

View reviewed changes

sicking mentioned this pull request Jun 12, 2018

Implement various choose() methods #490

Closed

dhardy force-pushed the seq branch from 27e323a to 8b14ab1 Compare June 14, 2018 09:55

sicking approved these changes Jun 14, 2018

View reviewed changes

sicking approved these changes Jun 15, 2018

View reviewed changes

sicking reviewed Jun 15, 2018

View reviewed changes

dhardy added 7 commits June 15, 2018 13:32

Sequence functionality: API revisions

81b7f41

Move choose, choose_mut and shuffle from Rng to SliceExt

f64ce3a

Implement new choose_multiple functions

545895b

Implement IteratorRandom::choose

0de02f1

Implement SliceRandom::partial_shuffle

b20944d

Move seq benches to new file

c083d58

Revise seq benches

1b118c2

dhardy force-pushed the seq branch from 8b14ab1 to 1b118c2 Compare June 15, 2018 12:33

dhardy merged commit 8f2290f into master Jun 15, 2018

dhardy deleted the seq branch June 15, 2018 14:15

dhardy mentioned this pull request Jun 20, 2018

Tracker: 0.6 release #520

Closed

28 tasks

sicking mentioned this pull request Jul 13, 2018

Deprecate Rng::sample and Rng::sample_iter #552

Closed

dhardy modified the milestone: 0.6 release Aug 23, 2018

sync-by-unito bot mentioned this pull request Dec 15, 2020

Bump rand_core from 0.5.1 to 0.6.0 AleoHQ/leo#503

Merged

Sequence functionality: API revisions #483

Sequence functionality: API revisions #483

Conversation

dhardy commented May 26, 2018 • edited

dhardy commented May 27, 2018 • edited

dhardy commented May 27, 2018 • edited

This comment has been minimized.

vks commented May 28, 2018

dhardy commented May 28, 2018 • edited

pitdicker commented May 28, 2018

dhardy commented May 28, 2018

pitdicker commented May 28, 2018

dhardy commented Jun 2, 2018

sicking commented Jun 3, 2018

dhardy commented Jun 8, 2018

sicking left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhardy Jun 9, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sicking commented Jun 8, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhardy commented Jun 9, 2018

dhardy commented Jun 13, 2018

vks commented Jun 13, 2018

dhardy commented Jun 13, 2018 • edited

sicking commented Jun 13, 2018

dhardy commented Jun 14, 2018

dhardy commented Jun 14, 2018 • edited

dhardy commented Jun 14, 2018

sicking left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhardy Jun 15, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhardy commented May 26, 2018 •

edited

dhardy commented May 27, 2018 •

edited

dhardy commented May 27, 2018 •

edited

dhardy commented May 28, 2018 •

edited

dhardy Jun 9, 2018 •

edited

dhardy commented Jun 13, 2018 •

edited

dhardy commented Jun 14, 2018 •

edited

dhardy Jun 15, 2018 •

edited