Use Iterator::size_hint() to speed up IteratorRandom::choose #593

sicking · 2018-08-21T09:42:13Z

This fixes #511.

I've approached hinting as three different categories:

"unhinted" iterators which always return (0, None). I.e. ones that use the default implementation of size_hint.
"perfectly" hinted iterators where size_hint always returns (N, Some(N))
Anything else.

I suspect the first two categories are most common, and in theory the compiler should be able to optimize these two categories to an optimal implementation with this code.

Unfortunately I'm seeing about a 10% slowdown compared to the existing code for iterators that fall into the first category. There's no inherent reason for this slowdown, but just appears to be the optimizer not doing as good of a job as it's doing on the current code.

For the first category I'm getting optimal performance. I.e. vec.iter().choose(&mut r) is as performant as vec.choose(&mut r).

An alternative would be to not worry about the third category and just do:

    fn choose<R>(mut self, rng: &mut R) -> Option<Self::Item>
        where R: Rng + ?Sized
    {
        let (lower, upper) = self.size_hint();
        if upper == Some(lower) {
            return if lower == 0 { None } else { self.nth(rng.gen_range(0, lower)) };
        }

        ... old code ...

With this I'm just seeing a 5% slowdown for unhinted iterators. Again, there's no reason the optimizer shouldn't be able to optimize away this slowdown, but it's not doing so for me.

Note that both the 10% and the 5% is within the cargo bench-reported noise.

AppVeyorBot · 2018-08-21T09:47:25Z

❌ Build rand 1.0.69 failed (commit 9ee9124fba by @sicking)

AppVeyorBot · 2018-08-21T09:53:06Z

❌ Build rand 1.0.70 failed (commit 8dd0f8477c by @sicking)

AppVeyorBot · 2018-08-21T10:14:25Z

✅ Build rand 1.0.71 completed (commit 337045d12c by @sicking)

AppVeyorBot · 2018-08-21T10:26:01Z

✅ Build rand 1.0.72 completed (commit 0e742eb193 by @sicking)

dhardy

Well @sicking, this did make the logic a bit more complex than I'd expected. I think the logic is correct.

5-10% performance decrease over general iterators for a big increase in cases with known size sounds like a win for me, even if the overhead is a bit higher than expected.

You could try instead only calling size_hint once or twice (i.e. removing the skip logic from the loop); does this improve un-skippable iterators? In my experience most iterators either have known length or completely unknown, or in a few cases may have known minimum. It seems unlikely that size_hint will be useful later — although it's possible this could be useful somewhere I guess.

dhardy · 2018-08-22T09:50:31Z

src/seq/mod.rs

+
+        if upper == Some(lower) {
+            // Remove this once we can specialize on ExactSizeIterator
+            return if lower == 0 { None } else { self.nth(rng.gen_range(0, lower)) };


I don't know if we should remove later — it's valid to give an exact size hint without implementing the latter trait (e.g. because the size bounds are not always known).

Yeah, we can certainly go either way. The code should still compile to something quite similar since the upper == Some(lower) check further down is still there. But we do a few branches which the compiler likely won't be able to optimize away before getting there.

It's seems there's a big difference when this is removed — around 6ms vs 3.7ms on the fully hinted seq_iter_choose_from_1000 test — so worth keeping I think.

dhardy · 2018-08-22T10:19:03Z

src/seq/mod.rs

+        if upper == Some(lower) {
+            // Remove this once we can specialize on ExactSizeIterator
+            return if lower == 0 { None } else { self.nth(rng.gen_range(0, lower)) };
+        } else if lower <= 1 {


What's this path for? Seems to me that it's only purpose is to eliminate a possible None result early, yet it doesn't always do that. Does it improve performance?

My goal was to ensure that the new code should reduce down to the old code once the compiler does constant folding of an "unhinted" iterator. And similarly that doing constant folding of a perfectly hinted iterator would reduce to SliceRandom::choose. I.e. the first two categories discussed in the initial comment should reduce to optimal code once constant folding is done.

That's why both of these paths are there. Both could be removed without loosing any correctness, and in both cases we just end up adding a few branches to the runtime code.

(The difference for "perfectly hinted" is slightly bigger for iterators of size exactly 1, but that's a rare use case anyway).

We definitely could remove these paths. I don't have a strong opinion either way. The performance difference will be small and will mainly depend on how slow other operations are, i.e. how many items the "unhinted" iterator has, and how fast RngCore implementation is used.

Aha, makes some sense, though I still don't really understand the reason to use lower <= 1 here. You could simply drop the condition (equivalent of true) and would get the same behaviour on unhinted iterators.

My gut feeling is that the cases worth bothering with are (a) perfectly hinted iterators, (b) iterators where some lower bound is known, and (c) iterators with no hinting. So you could potentially insert a lower > 0 case (instead of the loop case) and use this code for the lower == 0 case — but I think only benchmarks can tell which one is best.

Are chunked iterators common? It's not a pattern I remember seeing with iterators, but I guess it may be relevant to buffered input.

Aha, makes some sense, though I still don't really understand the reason to use lower <= 1 here

When lower > 1 it is better to use gen_range to pick one of the first lower items, than to unconditionally grab the first item and then iterate.

Are chunked iterators common?

No idea.

As mentioned in other comments, the idea is that for all other cases, the code will reduce to the optimal approach. So the cost to support chunked iterators is only one of source complexity, neither one of runtime cost, nor of binary size.

Removing this path improves benchmarks slightly. The only functional difference from the second path within the loop is that the gen_bool bit is skipped.

The Bernoulli code actually has special handling for the p==1.0 case anyway and doesn't touch the RNG.

Simpler code and faster benchmarks implies we're probably better off without this code?

I don't have a strong preference either way. Generally it's a good idea to be careful about optimizing too much based on just microbenchmarks, since that often doesn't reflect real-world performance.

In theory this code should help "unhinted" iterators, though as you point out by very little since it mainly saves a few branches. In neither case will we call into the RNG to generate numbers. So yeah, probably worth removing since the win seems quite small.

I can see two reasons why removing this code would speed up anything:

We might end up inlining things better since the function is smaller.

The "window hinted" test might get faster since we do one branch less. However this benchmark seems less important to me than the "unhinted" and the fully hinted ones.

Which benchmarks specifically got faster with this removed?

The unhinted one:

test seq_iter_choose_from_1000 ... bench: 3,756 ns/iter (+/- 108) = 2129 MB/s test seq_iter_unhinted_choose_from_1000 ... bench: 5,311 ns/iter (+/- 98) test seq_iter_window_hinted_choose_from_1000 ... bench: 1,789 ns/iter (+/- 44) # after removing this branch: test seq_iter_choose_from_1000 ... bench: 3,746 ns/iter (+/- 60) = 2135 MB/s test seq_iter_unhinted_choose_from_1000 ... bench: 5,007 ns/iter (+/- 106) test seq_iter_window_hinted_choose_from_1000 ... bench: 1,769 ns/iter (+/- 42)

sicking · 2018-08-24T07:54:16Z

You could try instead only calling size_hint once or twice (i.e. removing the skip logic from the loop); does this improve un-skippable iterators? In my experience most iterators either have known length or completely unknown, or in a few cases may have known minimum. It seems unlikely that size_hint will be useful later — although it's possible this could be useful somewhere I guess.

I'm not sure I understand what is being asked here. I agree that most iterators will either have a known length, or a completely unknown one. That is the first two categories in the initial comment above.

For iterators with a known length, size_hint will only be called once. (This is true both if we keep the two code paths at the top or if we remove them).

For iterators with an unknown length, size_hint will be called once for each element. However for such iterators the implementation is most likely simply to return (0, None), which will get inlined as a constant. The code was carefully written such that if the compiler does proper constant folding and inlining (which is generally the case for Rust), the code will reduce to the old code.

The only two cases where we'll have multiple meaningful calls to a size_hint function is:

When using an Iterator through dynamic dispatch. I.e.

fn myfunc(i: &mut dyn Iterator<Item = usize>) -> usize {
    i.choose(&mut thread_rng()).unwrap()
}

Performance in this case is going to be worse than current code. But I hope this is a very rare pattern, especially given how easy Rust makes it to avoid dynamic dispatch.

When an iterator doesn't fall into either the "perfectly hinted" or the "unhinted" categories. I.e. where there is some size information, but not perfect one. Which it sounds like we both consider a rare case. For this category the patch assumes that if someone implemented size_hint to do something different than the default implementation, that it'll sometimes be able to return a lower bound larger than 1. Performance here depends on how often that is, as well as on the performance of the size_hint function and the RngCore use. We're trading more calls to size_hint for fewer calls to RngCore.

We could go the alternative route mentioned in the initial comment. That'll ensure that we never call size_hint more than once. It would make the dynamic dispatch case faster, but the third category of iterators likely slower. It will not affect "perfectly hinted" or "unhinted" iterators.

dhardy · 2018-08-24T08:13:02Z

Ah, that's the design rationale I was missing 👍

sicking · 2018-08-24T09:09:32Z

This was likely my last actual code contribution to rand for the foreseeable future. Both because I have less time these days, but also because I have accomplished as much of my goals as I think I can.

So especially appreciate the quick review and merge on this one!

I also added #594, but that's just documentation changes so not counting that one 😄

It's been great to get to contribute to rand! Both as a learning experience, and hopefully as something which will improve people's experience using rand and rust.

So wanted to give a thanks to @dhardy and @pitdicker for your time in discussions and doing reviews.

Best of luck getting to 1.0. Looking forward to seeing what it'll look like.

dhardy · 2018-08-24T10:17:14Z

Thank you so much for the contributions too. There have been a few differences of opinion but also a lot of high quality code! Without volunteers like yourself this project would probably still be sitting at 0.3 with only minor patches landing 😄

sicking force-pushed the iterchoose branch from 6aee0f2 to 9ff000a Compare August 21, 2018 09:44

sicking force-pushed the iterchoose branch from 9ff000a to e94bfb1 Compare August 21, 2018 10:01

Use Iterator::size_hint() to speed up IteratorRandom::choose

be1158d

sicking force-pushed the iterchoose branch from e94bfb1 to be1158d Compare August 21, 2018 10:12

dhardy reviewed Aug 23, 2018

View reviewed changes

dhardy added the C-optimisation label Aug 23, 2018

dhardy merged commit 82fa573 into rust-random:master Aug 24, 2018

sicking deleted the iterchoose branch August 24, 2018 08:16

sicking restored the iterchoose branch August 24, 2018 08:16

sicking deleted the iterchoose branch August 24, 2018 09:00

dhardy mentioned this pull request Sep 20, 2018

Merge 0.5 #599

Merged

sync-by-unito bot mentioned this pull request Dec 15, 2020

Bump rand_core from 0.5.1 to 0.6.0 AleoHQ/leo#503

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use Iterator::size_hint() to speed up IteratorRandom::choose #593

Use Iterator::size_hint() to speed up IteratorRandom::choose #593

sicking commented Aug 21, 2018

AppVeyorBot commented Aug 21, 2018

AppVeyorBot commented Aug 21, 2018

AppVeyorBot commented Aug 21, 2018

AppVeyorBot commented Aug 21, 2018

dhardy left a comment

dhardy Aug 22, 2018

sicking Aug 23, 2018

dhardy Sep 20, 2018

dhardy Aug 22, 2018

sicking Aug 24, 2018

dhardy Aug 24, 2018

sicking Aug 24, 2018

dhardy Sep 20, 2018

sicking Sep 23, 2018

dhardy Sep 23, 2018

sicking commented Aug 24, 2018

dhardy commented Aug 24, 2018

sicking commented Aug 24, 2018 •

edited

dhardy commented Aug 24, 2018

Use Iterator::size_hint() to speed up IteratorRandom::choose #593

Use Iterator::size_hint() to speed up IteratorRandom::choose #593

Conversation

sicking commented Aug 21, 2018

AppVeyorBot commented Aug 21, 2018

AppVeyorBot commented Aug 21, 2018

AppVeyorBot commented Aug 21, 2018

AppVeyorBot commented Aug 21, 2018

dhardy left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sicking commented Aug 24, 2018

dhardy commented Aug 24, 2018

sicking commented Aug 24, 2018 • edited

dhardy commented Aug 24, 2018

sicking commented Aug 24, 2018 •

edited