Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output of gen_range is platform dependent for usize #1399

Open
ttencate opened this issue Feb 20, 2024 · 5 comments
Open

Output of gen_range is platform dependent for usize #1399

ttencate opened this issue Feb 20, 2024 · 5 comments

Comments

@ttencate
Copy link

The output of Rng::gen_range is different depending on whether usize is 32 bits or 64 bits on the current platform, even if the requested range is small (0..12 in my case). For example:

    use rand::Rng;
    use rand_core::SeedableRng;

    let mut rng = rand_pcg::Pcg32::seed_from_u64(0);
    for _ in 0..16 {
        let x: usize = rng.gen_range(0..12);
        print!("{} ", x);
    }
    println!();

On a 64-bits system, the output is: 4 8 6 0 5 1 5 8 9 10 11 7 6 11 1 8
On a 32-bits system, the output is: 4 1 8 0 6 1 0 8 5 11 1 9 5 5 8 6

This does not affect SliceRandom::choose, thanks to some special handling in gen_index:

rand/src/seq/mod.rs

Lines 699 to 710 in 7ff0fc9

// Sample a number uniformly between 0 and `ubound`. Uses 32-bit sampling where
// possible, primarily in order to produce the same output on 32-bit and 64-bit
// platforms.
#[inline]
fn gen_index<R: Rng + ?Sized>(rng: &mut R, ubound: usize) -> usize {
if ubound <= (core::u32::MAX as usize) {
rng.gen_range(0..ubound as u32) as usize
} else {
rng.gen_range(0..ubound)
}
}

This means that, contrary to my expectations, my_slice.choose(&mut rng) and my_slice[rng.gen_range(0..my_slice.len())] are not equivalent (even ignoring the case of an empty slice).

I'm not sure if this qualifies as a bug, but it definitely surprised me, and took me some time to debug. I'm also not sure if it can be fixed, and if fixing it would constitute a breaking change.

In any case it could be documented, here in the book and here on Rng::gen_range.

@dhardy
Copy link
Member

dhardy commented Feb 20, 2024

We could add gen_index to the Rng trait and improve docs (by mentioning in gen_range).

Should we, in this case, make Rng::gen_index accept a range like Rng::gen_range?

We could go further and remove support for usize from Standard and gen_range, requiring usage of a special method like gen_index or casting... this would be quite disruptive.

@vks
Copy link
Collaborator

vks commented Mar 21, 2024

For this use case, we could also change gen_range to generate a u32 the range is small enough (for any integer type). This would make generating usize portable without adding more API surface.

@newpavlov
Copy link
Member

What about 16 bit targets? Would they generate u32 as well?

@vks
Copy link
Collaborator

vks commented Mar 22, 2024

Yes, we only have RngCore::next_u64 and RngCore::next_u32. The portability issue arises, because 64-bit platforms use next_u64 to generate usize, while 32-bit platforms use next_u32. If both used next_u32, the problem would be fixed. (For large ranges we would still have to use next_u64 on 64-bit platforms, but they would not fit into a 32-bit usize anyway.)

@dhardy
Copy link
Member

dhardy commented Mar 22, 2024

Similar, recent issues: #1373 #1415. Also, discussions by rand developers: comment in #1196, #809, #805. We should consider this again.

To be precise, there are two issues:

  1. In some cases, especially rng.gen_range(0..my_slice.len()), there is a hidden portability hazard.
  2. As mentioned above, my_slice.choose(&mut rng) and my_slice[rng.gen_range(0..my_slice.len())] are not equivalent.

Solutions

Document the portability hazard

In the book: Portability of usize

We could document this hazard in more places, especially on Rng::gen_range.

Test the range

We use this internal method for choose, added in #809:

fn gen_index<R: Rng + ?Sized>(rng: &mut R, ubound: usize) -> usize {
    if ubound <= (core::u32::MAX as usize) {
        rng.gen_range(0..ubound as u32) as usize
    } else {
        rng.gen_range(0..ubound)
    }
}

This is an extra conditional-jump on 64-bit platforms, but one which almost always resolves to the first branch in practice, and thus likely has a little cost in most cases.

We could use this approach also for the impl of Uniform (gen_range) for usize and isize.

Above + remove non-portable Standard impls

The leaves non-portable behaviour, which could be resolved by additionally removing usize and isize impls from Standard.

Note: sample_single_inclusive(0, usize::MAX, rng) should probably optimise down suitably as an alternative. This is still not great for anyone wanting a generic solution, but it's really hard to imagine why anyone would actually want that given that usize is used for sizes which are never uniformly distributed over all usize values (this even applies to offsets, unless your memory size is exactly usize::MAX + 1 bytes).

Remove all non-portable impls from Standard and Uniform

We implement equivalent code to the above, but not under the UniformSampler trait. Instead of Rng::gen_range, people must use Rng::gen_index.

This is more disruptive and less generic friendly than the above, for no obvious benefit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants