Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A from_str(_radix) analogue #64

Open
rpjohnst opened this issue Jun 21, 2020 · 9 comments
Open

A from_str(_radix) analogue #64

rpjohnst opened this issue Jun 21, 2020 · 9 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@rpjohnst
Copy link

Working with mostly-UTF-8 byte strings, the two biggest things my current project is missing are substring matching (which this crate provides) and integer parsing (which std provides only for str).

This is easily worked around by calling str::from_utf8 first, because if the input is not valid UTF-8 then it would never make it through a from_str/parse or from_str_radix anyway.

But it would be convenient to have these functions in bstr as well, to avoid the overhead from that redundant from_utf8 call.

@BurntSushi
Copy link
Owner

Yeah, I do kind of think this sort of thing is in scope. I've certainly wanted functions like this in the past as well.

Adding these sorts of routines for integers seems pretty reasonable, but this does open the door for wanting analogous parsing routines for floating point. That is definitely not a road I want to go down. It looks like the lexical project is already meeting this need anyway: https://docs.rs/lexical-core/0.7.4/lexical_core/

So I'd probably want to draw a line in the sand here: integer parsing APIs are okay, but not floats.

@BurntSushi BurntSushi added enhancement New feature or request good first issue Good for newcomers labels Jul 5, 2022
@BurntSushi
Copy link
Owner

This feels like a good "first issue" to me. In particular, routines could be added that are implemented by first converting the input to a &str and then reusing std's integer parsing. This isn't optimal, but we can always optimize it later.

For anyone who wants to work on this, it would probably be good to start by posting an API proposal in this issue. The API is probably the hardest part of this issue to be honest. For example, do we want a new sub-module with a bunch of concrete functions for each integer? Do we want to define a trait that we implement for all of the integer APIs? I kind of lean towards the latter.

alexjago added a commit to alexjago/bstr that referenced this issue Jul 19, 2022
@alexjago
Copy link

alexjago commented Jul 19, 2022

API Proposal: impl FromBStrRadix for {integer}

Rationale

Currently, we call e.g. i32::from_str_radix(src, radix).

By analogy, it seems correct to call i32::from_bstr_radix(src, radix).

Defining a new trait with that function and then implementing it for the integers makes that work (at the small cost of importing the trait).

It seems reasonable to name the trait after the function.

Implementation

I'm not quite ready to make a PR but I do have a fork up:
alexjago@3487786

What's in it: a new file src/from_bstr_radix.rs with the trait, impl and tests; plus two lines of changes to src/lib.rs to use the module and export the trait.

There are a couple of slight differences to the stdlib function.

  • Obviously, it takes src: &BStr rather than src: &str. Question: should this be an AsRef<[u8]> instead?
  • radix is the output type rather than always u32 (easily changeable if desired).
  • The error type is slightly different. The stdlib returns Err(ParseIntError {kind: IntErrorKind}) but since kind is private and there's no constructor, I just return Err(IntErrorKind::{variant}) directly. IMHO this is actually more useful.

edit to add: well, IntErrorKind would be more useful if it didn't require 1.55. Without it, might as well just wrap the std integer parsing.

@BurntSushi
Copy link
Owner

@alexjago Thanks! Nice work. Some responses:

  • It should definitely take AsRef<[u8]>, not &BStr. &[u8] is what all of the APIs in bstr are defined on.
  • I'm not sure why radix would be tied to the output type? u32 feels right to me here.
  • The error type should definitely not be just a bare IntErrorKind. We'll want to define our own ParseIntError. We can expose the kind through a method, just like std does.

The "bstr" in the FromBStrRadix name bugs me a bit, particularly since we're not taking a BStr but a &[u8]. Probably a better name is FromBytesRadix::from_bytes_radix. I don't love it, but I like it better than FromBStrRadix.

Feel free to submit a PR and we can work through the actual code itself.

@BurntSushi
Copy link
Owner

Requiring Rust 1.55 is fine. I'm going to bump the MSRV of bstr up higher. See the 1.0 issue.

@alexjago
Copy link

There's another API question which occurred to me around panicking and allocating.

The stdlib function (and therefore this one, for now) has an assert on the radix value... which includes the erroneous value in the error message.

That's an allocation, right? I don't think any other part of the functionality really needs to allocate.

So it's probably fixable with some clever #[cfg] work (non-alloc could just... not have a message).

But it also might be an option to redesign the API so that out-of-range radices are a recoverable error, presumably with an additional variant of IntErrorKind. Obviously if we do that then we can't just re-export the stdlib enum.

@BurntSushi
Copy link
Owner

You're saying an assert is an alloc? No, it isn't.

Let's stick with how std does things I think wherever we can. The radix is almost always going to be a static input determined by the programmer, so it seems sensible to panic for invalid radix values.

@alexjago
Copy link

Good to know that asserts don't alloc. (I got this misunderstanding because the format! macro returns String, but I suppose the underlying machinery just writes to some buffer...)

And yes, I agree that sticking to how std does things makes life more straightforward for everyone.

@BurntSushi
Copy link
Owner

Well, the assert might alloc when the assert fails. Sure. But that's okay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants