Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Iterator::size_hint() to speed up IteratorRandom::choose #593

Merged
merged 1 commit into from Aug 24, 2018

Conversation

sicking
Copy link
Contributor

@sicking sicking commented Aug 21, 2018

This fixes #511.

I've approached hinting as three different categories:

  • "unhinted" iterators which always return (0, None). I.e. ones that use the default implementation of size_hint.
  • "perfectly" hinted iterators where size_hint always returns (N, Some(N))
  • Anything else.

I suspect the first two categories are most common, and in theory the compiler should be able to optimize these two categories to an optimal implementation with this code.

Unfortunately I'm seeing about a 10% slowdown compared to the existing code for iterators that fall into the first category. There's no inherent reason for this slowdown, but just appears to be the optimizer not doing as good of a job as it's doing on the current code.

For the first category I'm getting optimal performance. I.e. vec.iter().choose(&mut r) is as performant as vec.choose(&mut r).

An alternative would be to not worry about the third category and just do:

    fn choose<R>(mut self, rng: &mut R) -> Option<Self::Item>
        where R: Rng + ?Sized
    {
        let (lower, upper) = self.size_hint();
        if upper == Some(lower) {
            return if lower == 0 { None } else { self.nth(rng.gen_range(0, lower)) };
        }

        ... old code ...

With this I'm just seeing a 5% slowdown for unhinted iterators. Again, there's no reason the optimizer shouldn't be able to optimize away this slowdown, but it's not doing so for me.

Note that both the 10% and the 5% is within the cargo bench-reported noise.

@AppVeyorBot
Copy link

Build rand 1.0.69 failed (commit 9ee9124fba by @sicking)

@AppVeyorBot
Copy link

Build rand 1.0.70 failed (commit 8dd0f8477c by @sicking)

@AppVeyorBot
Copy link

Build rand 1.0.71 completed (commit 337045d12c by @sicking)

@AppVeyorBot
Copy link

Build rand 1.0.72 completed (commit 0e742eb193 by @sicking)

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well @sicking, this did make the logic a bit more complex than I'd expected. I think the logic is correct.

5-10% performance decrease over general iterators for a big increase in cases with known size sounds like a win for me, even if the overhead is a bit higher than expected.

You could try instead only calling size_hint once or twice (i.e. removing the skip logic from the loop); does this improve un-skippable iterators? In my experience most iterators either have known length or completely unknown, or in a few cases may have known minimum. It seems unlikely that size_hint will be useful later — although it's possible this could be useful somewhere I guess.


if upper == Some(lower) {
// Remove this once we can specialize on ExactSizeIterator
return if lower == 0 { None } else { self.nth(rng.gen_range(0, lower)) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we should remove later — it's valid to give an exact size hint without implementing the latter trait (e.g. because the size bounds are not always known).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can certainly go either way. The code should still compile to something quite similar since the upper == Some(lower) check further down is still there. But we do a few branches which the compiler likely won't be able to optimize away before getting there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's seems there's a big difference when this is removed — around 6ms vs 3.7ms on the fully hinted seq_iter_choose_from_1000 test — so worth keeping I think.

if upper == Some(lower) {
// Remove this once we can specialize on ExactSizeIterator
return if lower == 0 { None } else { self.nth(rng.gen_range(0, lower)) };
} else if lower <= 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this path for? Seems to me that it's only purpose is to eliminate a possible None result early, yet it doesn't always do that. Does it improve performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal was to ensure that the new code should reduce down to the old code once the compiler does constant folding of an "unhinted" iterator. And similarly that doing constant folding of a perfectly hinted iterator would reduce to SliceRandom::choose. I.e. the first two categories discussed in the initial comment should reduce to optimal code once constant folding is done.

That's why both of these paths are there. Both could be removed without loosing any correctness, and in both cases we just end up adding a few branches to the runtime code.

(The difference for "perfectly hinted" is slightly bigger for iterators of size exactly 1, but that's a rare use case anyway).

We definitely could remove these paths. I don't have a strong opinion either way. The performance difference will be small and will mainly depend on how slow other operations are, i.e. how many items the "unhinted" iterator has, and how fast RngCore implementation is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, makes some sense, though I still don't really understand the reason to use lower <= 1 here. You could simply drop the condition (equivalent of true) and would get the same behaviour on unhinted iterators.

My gut feeling is that the cases worth bothering with are (a) perfectly hinted iterators, (b) iterators where some lower bound is known, and (c) iterators with no hinting. So you could potentially insert a lower > 0 case (instead of the loop case) and use this code for the lower == 0 case — but I think only benchmarks can tell which one is best.

Are chunked iterators common? It's not a pattern I remember seeing with iterators, but I guess it may be relevant to buffered input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, makes some sense, though I still don't really understand the reason to use lower <= 1 here

When lower > 1 it is better to use gen_range to pick one of the first lower items, than to unconditionally grab the first item and then iterate.

Are chunked iterators common?

No idea.

As mentioned in other comments, the idea is that for all other cases, the code will reduce to the optimal approach. So the cost to support chunked iterators is only one of source complexity, neither one of runtime cost, nor of binary size.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this path improves benchmarks slightly. The only functional difference from the second path within the loop is that the gen_bool bit is skipped.

The Bernoulli code actually has special handling for the p==1.0 case anyway and doesn't touch the RNG.

Simpler code and faster benchmarks implies we're probably better off without this code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference either way. Generally it's a good idea to be careful about optimizing too much based on just microbenchmarks, since that often doesn't reflect real-world performance.

In theory this code should help "unhinted" iterators, though as you point out by very little since it mainly saves a few branches. In neither case will we call into the RNG to generate numbers. So yeah, probably worth removing since the win seems quite small.

I can see two reasons why removing this code would speed up anything:

  • We might end up inlining things better since the function is smaller.
  • The "window hinted" test might get faster since we do one branch less. However this benchmark seems less important to me than the "unhinted" and the fully hinted ones.

Which benchmarks specifically got faster with this removed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unhinted one:

test seq_iter_choose_from_1000               ... bench:       3,756 ns/iter (+/- 108) = 2129 MB/s
test seq_iter_unhinted_choose_from_1000      ... bench:       5,311 ns/iter (+/- 98)
test seq_iter_window_hinted_choose_from_1000 ... bench:       1,789 ns/iter (+/- 44)
# after removing this branch:
test seq_iter_choose_from_1000               ... bench:       3,746 ns/iter (+/- 60) = 2135 MB/s
test seq_iter_unhinted_choose_from_1000      ... bench:       5,007 ns/iter (+/- 106)
test seq_iter_window_hinted_choose_from_1000 ... bench:       1,769 ns/iter (+/- 42)

@sicking
Copy link
Contributor Author

sicking commented Aug 24, 2018

You could try instead only calling size_hint once or twice (i.e. removing the skip logic from the loop); does this improve un-skippable iterators? In my experience most iterators either have known length or completely unknown, or in a few cases may have known minimum. It seems unlikely that size_hint will be useful later — although it's possible this could be useful somewhere I guess.

I'm not sure I understand what is being asked here. I agree that most iterators will either have a known length, or a completely unknown one. That is the first two categories in the initial comment above.

For iterators with a known length, size_hint will only be called once. (This is true both if we keep the two code paths at the top or if we remove them).

For iterators with an unknown length, size_hint will be called once for each element. However for such iterators the implementation is most likely simply to return (0, None), which will get inlined as a constant. The code was carefully written such that if the compiler does proper constant folding and inlining (which is generally the case for Rust), the code will reduce to the old code.

The only two cases where we'll have multiple meaningful calls to a size_hint function is:

  • When using an Iterator through dynamic dispatch. I.e.
fn myfunc(i: &mut dyn Iterator<Item = usize>) -> usize {
    i.choose(&mut thread_rng()).unwrap()
}

Performance in this case is going to be worse than current code. But I hope this is a very rare pattern, especially given how easy Rust makes it to avoid dynamic dispatch.

  • When an iterator doesn't fall into either the "perfectly hinted" or the "unhinted" categories. I.e. where there is some size information, but not perfect one. Which it sounds like we both consider a rare case. For this category the patch assumes that if someone implemented size_hint to do something different than the default implementation, that it'll sometimes be able to return a lower bound larger than 1. Performance here depends on how often that is, as well as on the performance of the size_hint function and the RngCore use. We're trading more calls to size_hint for fewer calls to RngCore.

We could go the alternative route mentioned in the initial comment. That'll ensure that we never call size_hint more than once. It would make the dynamic dispatch case faster, but the third category of iterators likely slower. It will not affect "perfectly hinted" or "unhinted" iterators.

@dhardy
Copy link
Member

dhardy commented Aug 24, 2018

Ah, that's the design rationale I was missing 👍

@dhardy dhardy merged commit 82fa573 into rust-random:master Aug 24, 2018
@sicking sicking deleted the iterchoose branch August 24, 2018 08:16
@sicking sicking restored the iterchoose branch August 24, 2018 08:16
@sicking sicking deleted the iterchoose branch August 24, 2018 09:00
@sicking
Copy link
Contributor Author

sicking commented Aug 24, 2018

This was likely my last actual code contribution to rand for the foreseeable future. Both because I have less time these days, but also because I have accomplished as much of my goals as I think I can.

So especially appreciate the quick review and merge on this one!

I also added #594, but that's just documentation changes so not counting that one 😄

It's been great to get to contribute to rand! Both as a learning experience, and hopefully as something which will improve people's experience using rand and rust.

So wanted to give a thanks to @dhardy and @pitdicker for your time in discussions and doing reviews.

Best of luck getting to 1.0. Looking forward to seeing what it'll look like.

@dhardy
Copy link
Member

dhardy commented Aug 24, 2018

Thank you so much for the contributions too. There have been a few differences of opinion but also a lot of high quality code! Without volunteers like yourself this project would probably still be sitting at 0.3 with only minor patches landing 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimise IteratorRandom::choose for size_hint or ExactSizeIterator
3 participants