
CHANGE: Performance Optimizations for IteratorRandom::choose() and related methods #1266

Open
wainwrightmark opened this issue Nov 16, 2022 · 7 comments

Comments

@wainwrightmark
Contributor

Summary

IteratorRandom::choose() and related methods call gen_range() much more frequently than strictly necessary when choosing from iterators without size hints.
By reducing the number of these calls, the relevant benchmarks can be sped up by about 1.5-2x.

Specifically:

# Using Pcg32
test seq_iter_unhinted_choose_from_1000      ... bench:       4,049 ns/iter (+/- 377) #old
test seq_iter_unhinted_choose_from_1000      ... bench:       2,730 ns/iter (+/- 335) #new

# Using ChaChaRng
test seq_iter_unhinted_choose_from_1000      ... bench:       6,831 ns/iter (+/- 317) #old
test seq_iter_unhinted_choose_from_1000      ... bench:       3,447 ns/iter (+/- 627) #new

This would not require a breaking change to the API or violate any explicit promises. However, if this change is made, choose() will return different items for a given seed and the Rng will be left in a different state afterwards, which may affect some users.

Details

The IteratorRandom::choose() and related methods would have to change.
The way they work now (for iterators without size hints) is: for every item, generate a random number in the range 0..n, where n is the number of items seen so far. If the generated number is zero (a 1/n chance), the current result is replaced with the new item.
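For reference, that per-item scheme is essentially reservoir sampling with a sample size of one. A minimal self-contained sketch, using a toy LCG in place of a real rand `Rng` (all names here are illustrative, not the actual rand internals):

```rust
// Toy LCG standing in for a real RNG; `gen_range` here is a plain
// modulo reduction (biased, but adequate for a sketch).
struct Lcg(u64);
impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
    fn gen_range(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

// Current scheme: one RNG call per item. Item n replaces the held
// result with probability 1/n, so every item is equally likely overall.
fn choose_unhinted<I: Iterator>(rng: &mut Lcg, iter: I) -> Option<I::Item> {
    let mut result = None;
    let mut n = 0u64;
    for item in iter {
        n += 1;
        if rng.gen_range(n) == 0 {
            result = Some(item);
        }
    }
    result
}

fn main() {
    let mut rng = Lcg(42);
    let chosen = choose_unhinted(&mut rng, 0..1000u64);
    assert!(matches!(chosen, Some(x) if x < 1000));
    assert!(choose_unhinted(&mut rng, 0..0u64).is_none());
}
```

The cost is one `gen_range` call per item, which is exactly what the proposal below reduces.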

This means that for the first twenty items, twenty random numbers are generated. My suggestion is to instead generate a single number in the range 0..20! (20 factorial being the largest factorial less than u64::MAX) and use it in place of those first twenty random numbers.

A number in that range has a 1 in 2 chance of being 0 mod 2; after division by 2, the quotient has a 1 in 3 chance of being 0 mod 3, and so on up to 20.
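That peel-off step can be checked exhaustively on a small product range. For a value uniform over 0..6 (= 2 × 3), the mod-2 test and the post-division mod-3 test are each exact, and the six values cover all six (remainder, quotient) combinations, so the two tests are independent:

```rust
// Exhaustively verify the peel-off claim on the range 0..6 = 2 * 3:
// v % 2 is an exact 1-in-2 test, and after dividing v by 2 the
// quotient is uniform over 0..3, giving an exact 1-in-3 test.
fn main() {
    let mut zeros_mod2 = 0;
    let mut zeros_mod3 = 0;
    for v in 0..6u64 {
        if v % 2 == 0 {
            zeros_mod2 += 1;
        }
        if (v / 2) % 3 == 0 {
            zeros_mod3 += 1;
        }
    }
    assert_eq!(zeros_mod2, 3); // exactly half of the 6 values
    assert_eq!(zeros_mod3, 2); // exactly a third of the 6 values
}
```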

After the first twenty items, the next batch has to use 33!/20! (the product 21 × 22 × ⋯ × 33, the largest such product below u64::MAX), which only covers thirteen items, and the efficiency gradually decreases with larger n.
This sounds like a lot of work, but it is still much cheaper than generating fresh random numbers, especially with cryptographically secure RNGs.
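Putting those pieces together, here is a hedged, self-contained sketch of the batched approach (again with a toy LCG in place of a real rand `Rng`; all names are illustrative, not the actual PR code):

```rust
// Toy LCG standing in for a real RNG (modulo reduction is biased,
// but adequate for a sketch).
struct Lcg(u64);
impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
    fn gen_range(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

// Batched variant: one draw covers as many consecutive 1-in-n tests
// as fit in a u64. For items 2..=20 a single draw spans 0..20!, the
// next refill spans 21 * 22 * ... * 33 (= 33!/20!), and so on.
fn choose_batched<I: Iterator>(rng: &mut Lcg, iter: I) -> Option<I::Item> {
    let mut result = None;
    let mut n: u64 = 0;     // items seen so far
    let mut v: u64 = 0;     // leftover randomness from the current batch
    let mut cover: u64 = 1; // v is uniform over 0..cover
    for item in iter {
        n += 1;
        if cover < n {
            // Refill: the largest product n * (n+1) * ... that fits in u64.
            cover = 1;
            let mut m = n;
            while let Some(next) = cover.checked_mul(m) {
                cover = next;
                m += 1;
            }
            v = rng.gen_range(cover);
        }
        // 1-in-n test; the division keeps v uniform over 0..cover/n.
        if v % n == 0 {
            result = Some(item);
        }
        v /= n;
        cover /= n;
    }
    result
}

fn main() {
    let mut rng = Lcg(7);
    let chosen = choose_batched(&mut rng, 0..1000u64);
    assert!(matches!(chosen, Some(x) if x < 1000));
    assert!(choose_batched(&mut rng, 0..0u64).is_none());
}
```

Since `cover` is always a product of the consecutive moduli still to be consumed, each division is exact, and each item is still kept with probability exactly 1/n under a uniform draw.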

There is no measurable performance change when using choose() with a fixed slice iterator (the code for that part is unchanged).
When using an iterator with a size hint but not a fixed size, performance seems to have degraded very slightly. It would be possible to apply this optimization to that case as well though.

I believe the results of the functions would be consistent between 32 and 64 bit architectures (I am generating u64, not usize). I don't have 32 bit hardware to test on so I can't comment on the performance differences.

The following methods would also probably benefit from similar optimizations:

IteratorRandom::choose_stable()
IteratorRandom::choose_multiple_fill()
IteratorRandom::choose_multiple()
SliceRandom::shuffle()
SliceRandom::partial_shuffle()

Motivation

Performance.

Alternatives

The code could be left as it is, and performance-hungry users could use my crate, which has a fast version of the choose function (I made the crate just to have a random_max function but got carried away optimizing it).

If this is considered worth doing, I'm happy to submit a PR. I have written most of the code already for my own crate.

Have a lovely day, Mark

@dhardy
Member

dhardy commented Nov 17, 2022

I like the idea, Mark. This would be a value-breaking change (acceptable at this time).

Could you run a couple more benchmarks, including for very short iterators and multiple RNGs? If they look reasonable (not too much penalty with very short iterators) then I'd like to see a PR.

@wainwrightmark
Contributor Author

I made a PR. I actually found an even faster way of doing it, that uses the minimum possible number of gens (average of 2 per iterator item). That method won't work for shuffle() and partial_shuffle() though and I'm unsure which is better for choose_multiple() so I might make a separate PR for those.

@TheIronBorn
Collaborator

choose_max from your crate sounds useful as well. I've written my own simple versions multiple times, but it'd be nice to have a robust one.

@wainwrightmark
Contributor Author

> choose_max from your crate sounds useful as well. I've written my own simple versions multiple times but it'd be nice to have a robust one

Good idea. Will submit a PR at some point.

@dhardy
Member

dhardy commented Jan 5, 2023

I was going to close this now that #1268 is merged, but we should answer this first:

> choose_max from your crate sounds useful as well. I've written my own simple versions multiple times but it'd be nice to have a robust one

But is this useful enough to include in rand when it already exists in the kindness crate? Rand is already quite big. On the other hand, it's not much new code. Arguments for inclusion, please; otherwise it's a no.

@TheIronBorn
Collaborator

It's quite common in machine learning and mathematical optimization, at least. Before kindness I was writing my own, but it wasn't as robust.

@wainwrightmark
Contributor Author

Probably not a huge deal but the coin_flipper code which would be used by choose_max and friends is private (as it should be) so it has to be duplicated in the kindness and rand crates. All the methods are #[inline] though so this probably won't have binary size implications.
