Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SampleUniform for char #1068

Merged
merged 6 commits into from Nov 27, 2020
Merged

Implement SampleUniform for char #1068

merged 6 commits into from Nov 27, 2020

Conversation

dhardy
Copy link
Member

@dhardy dhardy commented Nov 19, 2020

This enables rng.gen_range('A'..='Z').

I don't recall why we didn't do this before but I don't see any real issues. Thoughts @vks / @newpavlov?

@vks
Copy link
Collaborator

vks commented Nov 19, 2020

It would be nice to document the behavior that skips some chars.

fn new_(mut low: u32, mut high: u32) -> Self {
if low >= 0xD800 {
low -= 0xE000 - 0xD800;
}
Copy link
Member

@newpavlov newpavlov Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth to add the following helper functions:

/// UTF-16 surrogate range start
const RANGE_START: u32 = 0xD800
/// UTF-16 surrogate range size
const RANGE_SIZE: u32 = 0xE000 - RANGE_START;

/// Convert `char` to compressed `u32`
fn to_comp_u32(c: char) -> u32 {
    match c as u32 {
        c if c >= RANGE_START => c - RANGE_SIZE,
        c => c,
    }
}

/// Convert compressed `u32` to `char`
fn from_comp_u32(mut c: u32) -> u32 {
    if c >= RANGE_START {
        c += RANGE_SIZE;
    }
    // SAFETY: checked that `c` is a legal unicode value, since `c` is initially
    // constructed from `char` it can not be larger than 0xFFFE, i.e. safety of
    // this function relies on the constructor methods
    unsafe { core::char::from_u32_unchecked(c) }
}

In addition to make code easier to understand it would allow to remove the new_ method.

IIRC it's allowed to have private functions safety of which relies on code in the current module (e.g. it applies to the Vec code).

@newpavlov
Copy link
Member

I wonder if in future it will be possible to define a sampler which would accept several ranges:

fn new<const N: usize>(ranges: [RangeInclusive<char>; N]) -> Self { .. }

It would allow us to write new(['0'..='9', 'a'..='z', 'A'..='Z']). Unfortunately it will not be possible to mix Range and RangeInclusive, but I guess you rarely need non-inclusive char ranges.

@dhardy
Copy link
Member Author

dhardy commented Nov 20, 2020

Updated. I prefer to keep new_ over @newpavlov's suggestion since it is the common constructor (although probably it will always be inlined anyway).

src/distributions/uniform.rs Outdated Show resolved Hide resolved
@newpavlov
Copy link
Member

I prefer to keep new_

The idea was to reduce number of if branches and the constant uses. The initialization methods can look like this:

fn new<B1, B2>(low_b: B1, high_b: B2) -> Self
where
    B1: SampleBorrow<Self::X> + Sized,
    B2: SampleBorrow<Self::X> + Sized,
{
    let low = to_comp_u32(*low_b.borrow());
    let high = to_comp_u32(*high_b.borrow());
    let sampler = UniformInt::<u32>::new(low, high);
    Self { sampler }
}

fn new_inclusive<B1, B2>(low_b: B1, high_b: B2) -> Self
where
    B1: SampleBorrow<Self::X> + Sized,
    B2: SampleBorrow<Self::X> + Sized,
{
    let low = to_comp_u32(*low_b.borrow());
    let high = to_comp_u32(*high_b.borrow());
    let sampler = UniformInt::<u32>::new_inclusive(low, high);
    Self { sampler }
}

fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Self::X {
    from_comp_u32(self.sampler.sample(rng))
}

I think it will be a bit easier to understand. Note that we do not need asserts here. Plus it will be possible to explain in detail our handling of the char gap in the helper function comments.

@dhardy
Copy link
Member Author

dhardy commented Nov 21, 2020

The idea was to reduce number of if branches

It's the same number of branches (other than you removing some asserts — ones that should be there for good error reporting and should allow the ones within UniformInt to be optimised out).

But okay, it does look a little neater.

@dhardy
Copy link
Member Author

dhardy commented Nov 23, 2020

I implemented your suggestion @newpavlov. The error messages would be identical (other than code location), so they are mostly redundant.

@dhardy
Copy link
Member Author

dhardy commented Nov 23, 2020

It would allow us to write new(['0'..='9', 'a'..='z', 'A'..='Z'])

This would require a different implementation. It could be generalised but then the distribution object would require allocation (unless we used something like tinyvec to avoid this in common cases).

@dhardy
Copy link
Member Author

dhardy commented Nov 25, 2020

Updated with a couple of fixes. @vks @newpavlov are you happy with this now?

@vks
Copy link
Collaborator

vks commented Nov 25, 2020

Looks good to me!

@dhardy
Copy link
Member Author

dhardy commented Nov 26, 2020

Apparently this is a problem now:

Found invalid urls in rand_distr/struct.Uniform.html:

	Fragment #method.gen_range at /home/travis/build/rust-random/rand/target/doc/rand/rng/trait.Rng.html does not exist!

Found invalid urls in rand_distr/struct.Standard.html:

	Fragment #method.fill at /home/travis/build/rust-random/rand/target/doc/rand/rng/trait.Rng.html does not exist!

Found invalid urls in rand_distr/uniform/struct.Uniform.html:

	Fragment #method.gen_range at /home/travis/build/rust-random/rand/target/doc/rand/rng/trait.Rng.html does not exist!

Found invalid urls in rand_distr/uniform/index.html:

	Fragment #method.gen_range at /home/travis/build/rust-random/rand/target/doc/rand/rng/trait.Rng.html does not exist!

The Uniform type is declared in rand::distributions and imported into rand_distr; in the former location the link Rng::gen_range works, while in the latter rand::Rng::gen_range works... unfortunately neither works from the other place. Relative URL links won't work either.

I guess the solution is to import Rng in rand_distr/lib.rs.

@dhardy
Copy link
Member Author

dhardy commented Nov 26, 2020

Looks good to me!

BTW once CI passes I still need a "green review" from one of you to be able to merge this.

@vks
Copy link
Collaborator

vks commented Nov 26, 2020

BTW once CI passes I still need a "green review" from one of you to be able to merge this.

Sorry, I struggled a bit how to do that with the GitHub app...

@dhardy
Copy link
Member Author

dhardy commented Nov 26, 2020

The same error is still reported... ideas? I can't actually reproduce the error message locally, I just noticed the same symptoms.

@vks
Copy link
Collaborator

vks commented Nov 27, 2020

I also don't get the warning, but the link is not working anyway. IIRC, we always had that problem with the reexported distributions?

@dhardy
Copy link
Member Author

dhardy commented Nov 27, 2020

Maybe (doesn't look like a new problem), but I thought CI used to pass. Whatever.

@dhardy dhardy merged commit eb02f0e into rust-random:master Nov 27, 2020
@vks
Copy link
Collaborator

vks commented Nov 27, 2020

So it seems an invalid link is generated: https://rust-random.github.io/rand/rand/rng/trait.Rng.html#method.gen_range

Note that the rng module is private, so the correct link should be the reexported Rng: https://rust-random.github.io/rand/rand/trait.Rng.html#method.gen_range

Maybe a rustdoc bug?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants