Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Alphanumeric::gen_string #1128

Closed
vks opened this issue May 22, 2021 · 4 comments · Fixed by #1133
Closed

Implement Alphanumeric::gen_string #1128

vks opened this issue May 22, 2021 · 4 comments · Fixed by #1133

Comments

@vks
Copy link
Collaborator

vks commented May 22, 2021

Background

Some people wonder how to use Alphanumeric to generate a string (see #935). While we give an example in the documentation, it might be more obvious to provide a method for this, especially because the efficient implementation requires unsafe code.

Applications are the generation of unique IDs and nonces for APIs such as OAuth.

Feature request

Provide a method of Alphanumeric for generating strings.

Open questions:

  • Should we provide gen_string supporting only CryptoRngs and gen_string_insecure supporting any RNG?
  • Should we
    1. always generate a new string (fn gen_string(&self, &mut Rng, usize) -> String),
    2. modify an existing slice (fn gen_string(&self, &mut Rng, &mut str)) or
    3. push to an existing string (fn gen_string(&self, &mut Rng, &mut String, usize))?

i. is probably most efficient, because we can call String::from_utf8_unchecked, but it cannot reuse an existing buffer.
ii. is more flexible in that regard, but it requires an initialized buffer.
iii. is the most flexible, but offers little over converting to char and collecting.

@dhardy
Copy link
Member

dhardy commented May 22, 2021

Yes, it seems desirable to do something to solve this.

Perhaps we should put gen_string on the Rng trait instead? That way it's just:

impl Rng {
    fn gen_string(&mut self, len: usize) -> String;
}

Options (ii) and (iii) might be more flexible, but all of these are easy to implement and from what I understand the issue here is just to make generating a String easy. @abonander ? (Potentially we can support multiple variants, but I suggest not without reason.)

@abonander
Copy link

abonander commented May 22, 2021

Option iii can still be more performant than .take(N).map(char::from).collect() as it can use String::as_mut_vec() internally, at least for Alphanumeric.

The problem with Rng::gen_string() is it's not clear from the method what's actually going to be in the string. Alphanumeric seems like the sane default but Standard<char> also exists and some users might expect it to be based on that instead.

At the very least, maybe gen_alphanum_string() or something; if we want to maximize convenience, maybe just make it a free function like random() that uses thread_rng() internally? (This would actually satisfy my needs perfectly.)

pub fn random_alphanumeric_string(len: usize) -> String { ... }

In addition, for a more flexible API, I'm thinking:

pub trait Distribution {
    // ...

    pub fn gen_string(&self, rng: impl Rng, string: &mut String, len: usize) where Self: CharDistribution + Sized;
}

// new marker trait makes this a backwards-compatible addition as well?
pub trait CharDistribution {}

impl CharDistribution for Alphanumeric {}
impl CharDistribution for Uniform<char> {}
impl CharDistribution for Standard<char> {}

impl Distribution<u8> for Alphanumeric {
        pub fn gen_string(&self, rng: impl Rng, string: &mut String, len: usize) where Self: CharDistribution + Sized {
            // SAFETY: `Alphanumeric` is guaranteed to sample valid single-byte UTF-8
            unsafe { string.as_mut_vec().extend(self.sample_iter(rng).take(len)) }
        }
}

impl Distribution<char> for Standard<char> {
        pub fn gen_string(&self, rng: impl Rng, string: &mut String, len: usize) where Self: CharDistribution + Sized {
            // guarantee at most one allocation by reserving for the absolute worst-case
            string.reserve(len * 4);
            string.extend(self.sample_iter(rng).take(len));
            string.shrink_to_fit();
        }
}

impl Distribution<char> for Uniform<char> {
        pub fn gen_string(&self, rng: impl Rng, string: &mut String, len: usize) where Self: CharDistribution + Sized {
            // same as `Standard<char>`
            // although this could instead reserve based on the `char::utf8_len()` of the upper bound
        }
}

Alternatively instead of a new marker trait this could be a provided method for where Self: Distribution<char> (if that's allowed, I think that's a cyclic reference) with a default implementation that's identical to that of Standard<char> and then overridden by Alphanumeric and/or Uniform<char>.

In this case I don't think there's a need for a separate gen_string_insecure() as someone reaching this far into the library likely has alreadyu made an educated decision about which RNG they're using, and the convenient free function just does the "right" thing by default by using thread_rng().

@dhardy
Copy link
Member

dhardy commented May 24, 2021

Option iii can still be more performant than .take(N).map(char::from).collect() as it can use String::as_mut_vec() internally

Is this important? The advantage over gen_string is only when the buffer from an existing String can be re-used, I think, but do you often want to do that or would you normally just generate one String? I'd prefer the simpler API if there's not a reason for this. Optimisations where they do significant good, otherwise end-users can still roll their own impl when they need the best optimisation.

So the question is: how many people will use this, and which API will most of those want?

BTW if we support (iii) it should probably be called gen_to_string.

At the very least, maybe gen_alphanum_string() or something

Good point. So it's either that or Alphanumeric::gen_string(&mut rng).

In addition, for a more flexible API, I'm thinking:

Sorry, I don't like that. Why are we adding a new trait, and if so why modify Distribution?

This is a significant expansion of scope with unclear demand, plus it loses some of the original rationale: you don't need unsafe for the most efficient implementation.

(Also I don't get why you use &mut String input, then allocate, then reallocate (shrink). The only point of passing in &mut String I can see is to avoid allocations in frequent re-usage.)

@abonander
Copy link

(Also I don't get why you use &mut String input, then allocate, then reallocate (shrink). The only point of passing in &mut String I can see is to avoid allocations in frequent re-usage.)

The .shrink_to_fit() is whatever, the main idea was to do .reserve(len * 4) to allocate at most once. It was only an elevator pitch anyway.

If the idea is to keep to a limited scope then Alphanumeric::gen_string(impl Rng, usize) is sufficient for my purposes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants