New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Iterator::size_hint() to speed up IteratorRandom::choose #593
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -188,20 +188,57 @@ pub trait IteratorRandom: Iterator + Sized { | |
fn choose<R>(mut self, rng: &mut R) -> Option<Self::Item> | ||
where R: Rng + ?Sized | ||
{ | ||
if let Some(elem) = self.next() { | ||
let mut result = elem; | ||
|
||
// Continue until the iterator is exhausted | ||
for (i, elem) in self.enumerate() { | ||
let denom = (i + 2) as f64; // accurate to 2^53 elements | ||
let (mut lower, mut upper) = self.size_hint(); | ||
let mut consumed = 0; | ||
let mut result = None; | ||
|
||
if upper == Some(lower) { | ||
// Remove this once we can specialize on ExactSizeIterator | ||
return if lower == 0 { None } else { self.nth(rng.gen_range(0, lower)) }; | ||
} else if lower <= 1 { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's this path for? Seems to me that it's only purpose is to eliminate a possible There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My goal was to ensure that the new code should reduce down to the old code once the compiler does constant folding of an "unhinted" iterator. And similarly that doing constant folding of a perfectly hinted iterator would reduce to SliceRandom::choose. I.e. the first two categories discussed in the initial comment should reduce to optimal code once constant folding is done. That's why both of these paths are there. Both could be removed without loosing any correctness, and in both cases we just end up adding a few branches to the runtime code. (The difference for "perfectly hinted" is slightly bigger for iterators of size exactly 1, but that's a rare use case anyway). We definitely could remove these paths. I don't have a strong opinion either way. The performance difference will be small and will mainly depend on how slow other operations are, i.e. how many items the "unhinted" iterator has, and how fast RngCore implementation is used. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Aha, makes some sense, though I still don't really understand the reason to use My gut feeling is that the cases worth bothering with are (a) perfectly hinted iterators, (b) iterators where some lower bound is known, and (c) iterators with no hinting. So you could potentially insert a Are chunked iterators common? It's not a pattern I remember seeing with iterators, but I guess it may be relevant to buffered input. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When
No idea. As mentioned in other comments, the idea is that for all other cases, the code will reduce to the optimal approach. So the cost to support chunked iterators is only one of source complexity, neither one of runtime cost, nor of binary size. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removing this path improves benchmarks slightly. The only functional difference from the second path within the loop is that the The Simpler code and faster benchmarks implies we're probably better off without this code? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't have a strong preference either way. Generally it's a good idea to be careful about optimizing too much based on just microbenchmarks, since that often doesn't reflect real-world performance. In theory this code should help "unhinted" iterators, though as you point out by very little since it mainly saves a few branches. In neither case will we call into the RNG to generate numbers. So yeah, probably worth removing since the win seems quite small. I can see two reasons why removing this code would speed up anything:
Which benchmarks specifically got faster with this removed? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The unhinted one:
|
||
result = self.next(); | ||
if result.is_none() { | ||
return result; | ||
} | ||
consumed = 1; | ||
let hint = self.size_hint(); | ||
lower = hint.0; | ||
upper = hint.1; | ||
} | ||
|
||
// Continue until the iterator is exhausted | ||
loop { | ||
if lower > 1 { | ||
let ix = rng.gen_range(0, lower + consumed); | ||
let skip; | ||
if ix < lower { | ||
result = self.nth(ix); | ||
skip = lower - (ix + 1); | ||
} else { | ||
skip = lower; | ||
} | ||
if upper == Some(lower) { | ||
return result; | ||
} | ||
consumed += lower; | ||
if skip > 0 { | ||
self.nth(skip - 1); | ||
} | ||
} else { | ||
let elem = self.next(); | ||
if elem.is_none() { | ||
return result; | ||
} | ||
consumed += 1; | ||
let denom = consumed as f64; // accurate to 2^53 elements | ||
if rng.gen_bool(1.0 / denom) { | ||
result = elem; | ||
} | ||
} | ||
Some(result) | ||
} else { | ||
None | ||
|
||
let hint = self.size_hint(); | ||
lower = hint.0; | ||
upper = hint.1; | ||
} | ||
} | ||
|
||
|
@@ -519,20 +556,85 @@ mod test { | |
assert_eq!(v.choose_mut(&mut r), None); | ||
} | ||
|
||
#[derive(Clone)] | ||
struct UnhintedIterator<I: Iterator + Clone> { | ||
iter: I, | ||
} | ||
impl<I: Iterator + Clone> Iterator for UnhintedIterator<I> { | ||
type Item = I::Item; | ||
fn next(&mut self) -> Option<Self::Item> { | ||
self.iter.next() | ||
} | ||
} | ||
|
||
#[derive(Clone)] | ||
struct ChunkHintedIterator<I: ExactSizeIterator + Iterator + Clone> { | ||
iter: I, | ||
chunk_remaining: usize, | ||
chunk_size: usize, | ||
hint_total_size: bool, | ||
} | ||
impl<I: ExactSizeIterator + Iterator + Clone> Iterator for ChunkHintedIterator<I> { | ||
type Item = I::Item; | ||
fn next(&mut self) -> Option<Self::Item> { | ||
if self.chunk_remaining == 0 { | ||
self.chunk_remaining = ::core::cmp::min(self.chunk_size, | ||
self.iter.len()); | ||
} | ||
self.chunk_remaining = self.chunk_remaining.saturating_sub(1); | ||
|
||
self.iter.next() | ||
} | ||
fn size_hint(&self) -> (usize, Option<usize>) { | ||
(self.chunk_remaining, | ||
if self.hint_total_size { Some(self.iter.len()) } else { None }) | ||
} | ||
} | ||
|
||
#[derive(Clone)] | ||
struct WindowHintedIterator<I: ExactSizeIterator + Iterator + Clone> { | ||
iter: I, | ||
window_size: usize, | ||
hint_total_size: bool, | ||
} | ||
impl<I: ExactSizeIterator + Iterator + Clone> Iterator for WindowHintedIterator<I> { | ||
type Item = I::Item; | ||
fn next(&mut self) -> Option<Self::Item> { | ||
self.iter.next() | ||
} | ||
fn size_hint(&self) -> (usize, Option<usize>) { | ||
(::core::cmp::min(self.iter.len(), self.window_size), | ||
if self.hint_total_size { Some(self.iter.len()) } else { None }) | ||
} | ||
} | ||
|
||
#[test] | ||
fn test_iterator_choose() { | ||
let mut r = ::test::rng(109); | ||
let mut chosen = [0i32; 9]; | ||
for _ in 0..1000 { | ||
let picked = (0..9).choose(&mut r).unwrap(); | ||
chosen[picked] += 1; | ||
} | ||
for count in chosen.iter() { | ||
let err = *count - 1000 / 9; | ||
assert!(-25 <= err && err <= 25); | ||
let r = &mut ::test::rng(109); | ||
fn test_iter<R: Rng + ?Sized, Iter: Iterator<Item=usize> + Clone>(r: &mut R, iter: Iter) { | ||
let mut chosen = [0i32; 9]; | ||
for _ in 0..1000 { | ||
let picked = iter.clone().choose(r).unwrap(); | ||
chosen[picked] += 1; | ||
} | ||
for count in chosen.iter() { | ||
let err = *count - 1000 / 9; | ||
assert!(-25 <= err && err <= 25); | ||
} | ||
} | ||
|
||
assert_eq!((0..0).choose(&mut r), None); | ||
test_iter(r, 0..9); | ||
test_iter(r, [0, 1, 2, 3, 4, 5, 6, 7, 8].iter().cloned()); | ||
#[cfg(feature = "alloc")] | ||
test_iter(r, (0..9).collect::<Vec<_>>().into_iter()); | ||
test_iter(r, UnhintedIterator { iter: 0..9 }); | ||
test_iter(r, ChunkHintedIterator { iter: 0..9, chunk_size: 4, chunk_remaining: 4, hint_total_size: false }); | ||
test_iter(r, ChunkHintedIterator { iter: 0..9, chunk_size: 4, chunk_remaining: 4, hint_total_size: true }); | ||
test_iter(r, WindowHintedIterator { iter: 0..9, window_size: 2, hint_total_size: false }); | ||
test_iter(r, WindowHintedIterator { iter: 0..9, window_size: 2, hint_total_size: true }); | ||
|
||
assert_eq!((0..0).choose(r), None); | ||
assert_eq!(UnhintedIterator{ iter: 0..0 }.choose(r), None); | ||
} | ||
|
||
#[test] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we should remove later — it's valid to give an exact size hint without implementing the latter trait (e.g. because the size bounds are not always known).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can certainly go either way. The code should still compile to something quite similar since the
upper == Some(lower)
check further down is still there. But we do a few branches which the compiler likely won't be able to optimize away before getting there.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's seems there's a big difference when this is removed — around 6ms vs 3.7ms on the fully hinted
seq_iter_choose_from_1000
test — so worth keeping I think.