Adding a Base64 type #42

Open
Korvox opened this issue Aug 23, 2017 · 8 comments

Comments

@Korvox

Korvox commented Aug 23, 2017

Rust is at its best when you are using strong typing to indicate meaning. While in essence a base64 encoded string is always a [u8], functions that want a base64 encoded string would often rather be more specific than just taking String, &str, [u8], Vec<u8>, etc. Hence a Base64<C> type representing an encoded string. It could have convenience methods for converting into and from the various string representations and collections, and would let library authors be explicit about wanting base64.

It also seems to me to be much more newbie friendly to show them an API like:

fn send(data: Base64<UrlSafe>)...

Instead of just asking for a string and saying it should be base64:

// Please give me a base64 encoded url-safe string!
fn send(data: String)...

There is certainly an argument that publicly facing APIs shouldn't ask for data in base64 and should do that conversion internally. But even internal to consumer libraries, having concrete types is still very useful for readability and maintainability: you encode something somewhere, and then have to keep annotating your uses of it in other functions and structs as being base64.

This is similar to how Url and Uri types work in other crates. They internally use Strings or Vecs but expose a type to avoid ambiguity when handling it, and then have a range of convenience Into and From conversions to coerce them into the standard types.

The <C> generic is so that you can require the Default or UrlSafe encoding scheme. Otherwise you have to manually inspect any input string you cannot perfectly trust to check for illegal characters. Another option would be to have Base64 be an enum of Default and UrlSafe, but that becomes a runtime check, with the cumbersome need to match on the variant.
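
A minimal sketch of what such a type could look like, assuming zero-sized marker types for the alphabet. The names `Alphabet`, `Standard`, `UrlSafe`, `encode`, and `as_str` are hypothetical and are not part of the base64 crate; the encoder is hand-rolled only to keep the sketch self-contained, and it always pads.

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait: each alphabet is a zero-sized type.
pub trait Alphabet {
    const CHARS: &'static [u8; 64];
}

pub struct Standard;
pub struct UrlSafe;

impl Alphabet for Standard {
    const CHARS: &'static [u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
}

impl Alphabet for UrlSafe {
    const CHARS: &'static [u8; 64] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
}

/// Encoded text tagged at compile time with the alphabet that produced it.
pub struct Base64<C: Alphabet> {
    text: String,
    _alphabet: PhantomData<C>,
}

impl<C: Alphabet> Base64<C> {
    /// Encodes raw bytes with '=' padding; this cannot fail.
    pub fn encode(bytes: &[u8]) -> Self {
        let mut text = String::with_capacity((bytes.len() + 2) / 3 * 4);
        for chunk in bytes.chunks(3) {
            // Pack up to three bytes into the low 24 bits of a u32.
            let b1 = *chunk.get(1).unwrap_or(&0);
            let b2 = *chunk.get(2).unwrap_or(&0);
            let n = (chunk[0] as u32) << 16 | (b1 as u32) << 8 | b2 as u32;
            // A chunk of k bytes yields k + 1 data characters, then padding.
            for i in 0..4 {
                if i <= chunk.len() {
                    let idx = (n >> (18 - 6 * i)) & 0x3f;
                    text.push(C::CHARS[idx as usize] as char);
                } else {
                    text.push('=');
                }
            }
        }
        Base64 { text, _alphabet: PhantomData }
    }

    pub fn as_str(&self) -> &str {
        &self.text
    }
}

/// The signature alone documents which alphabet is required.
pub fn send(data: &Base64<UrlSafe>) -> String {
    format!("payload={}", data.as_str())
}
```

With this shape, passing a `Base64<Standard>` to `send` is a compile error rather than a runtime check, which is the trade-off against the enum approach.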

A related issue is #17: an actual Base64 type would be the obvious place to implement the specialized display.

I wouldn't mind forking and trying to draft out a Base64 type if you are at all interested!

@marshallpierce
Owner

It's an interesting idea, though I think I'd have to see some examples of it in use to get a better idea of how the API should be structured. (For that matter, I'm not sure it even needs to be part of this crate.)

If you had the opportunity to create a Base64<T> from some bytes and hand it off to something else, wouldn't you be better off refactoring that destination to accept a &[u8] rather than a Base64<T>? In other words, I suspect the common case would be to refactor send() to just take a &[u8] and let it encode in the correct fashion (though wanting to pass in a cached copy of commonly used base64 would be a use case for a Base64-as-a-type solution). I think the opportunities to have a type-system-guaranteed handoff from something that a base64 encoding routine could emit to something that cares about its encoded form (presumably because it wants to decode it) may be few and far between. But, I could well be missing something...

Also, it's not just as simple as Default and UrlSafe: there's also padding and wrapping.

@Korvox
Author

Korvox commented Aug 23, 2017

> Also, it's not just as simple as Default and UrlSafe: there's also padding and wrapping.

You can infer from the Base64 you get whether it's padded or wrapped: it's apparent in the structure. You cannot tell, however, whether it was meant to be UrlSafe or not if the string never uses the Default or UrlSafe substitution characters. That being said, it is all fairly subjective whether that is even valuable knowledge! It would be really strange to put the owner of a Base64 object in a situation where appending two Base64s raises no conflict even though they used two different encoding schemes (albeit, in the same vein, appending Base64s where one is padded / wrapped and one is not would also require handling).

> you be better off refactoring that destination to accept a &[u8]

That is the whole argument. Right now it is perfectly normal when using base64 to pass around the encoded data as any of the various forms of u8 containers. But sending around any of those containers is information lossy because the data structure doesn't enforce its interpretation as base64.

> For that matter, I'm not sure it even needs to be part of this crate.

Absolutely agreed. I think this weekend I'll try impl'ing this as a separate crate. If it becomes a popular UX pattern it could be merged any time after that.

@marshallpierce
Owner

One small point -- you can't necessarily infer padding, because you might only ever see input that has length % 4 == 0, and you can't infer wrapping, because you might only ever see input shorter than whatever wrapping length you cared about.
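
To make the ambiguity concrete, here is a rough sketch of what can and cannot be read off a single encoded string. `AlphabetGuess`, `guess_alphabet`, and `padding_is_observable` are hypothetical helpers written for this illustration, not functions from any crate.

```rust
/// What the characters alone reveal about the alphabet.
#[derive(Debug, PartialEq)]
pub enum AlphabetGuess {
    Standard,  // saw '+' or '/'
    UrlSafe,   // saw '-' or '_'
    Ambiguous, // saw neither, so both alphabets fit
    Mixed,     // saw characters from both alphabets
}

pub fn guess_alphabet(encoded: &str) -> AlphabetGuess {
    let std_chars = encoded.contains(|c: char| c == '+' || c == '/');
    let url_chars = encoded.contains(|c: char| c == '-' || c == '_');
    match (std_chars, url_chars) {
        (true, false) => AlphabetGuess::Standard,
        (false, true) => AlphabetGuess::UrlSafe,
        (false, false) => AlphabetGuess::Ambiguous,
        (true, true) => AlphabetGuess::Mixed,
    }
}

/// Returns true only when this one input reveals the padding policy:
/// a trailing '=' means padded, and a length that is not a multiple
/// of 4 means unpadded. Any other input fits both policies.
pub fn padding_is_observable(encoded: &str) -> bool {
    encoded.ends_with('=') || encoded.len() % 4 != 0
}
```

An input such as `dGVz` (the encoding of `tes`) is ambiguous on both counts: it uses no substitution characters, and its length is already a multiple of 4.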

> Right now it is perfectly normal when using base64 to pass around the encoded data as any of the various forms of u8 containers. But sending around any of those containers is information lossy because the data structure doesn't enforce its interpretation as base64.

I was referring to passing around the data that you would get if you decoded the base64, not the ascii bytes that could be interpreted as base64.

Anyway, I'm curious to see what you come up with.

@Korvox
Author

Korvox commented Sep 3, 2017

Almost forgot to do this! Just threw together an example over here. [1]

You are right, in that there is no really intuitive way to persist the config data in the struct without allocations (configs would have to be zero-sized PhantomData), and you need it around all the way for decoding purposes and for validation.

That being said, I think I discovered why having a Base64 type is really useful. It makes the whole interaction error-free. You can only construct a Base64 with bytes, the encoding cannot fail, and thus any Base64 that exists must be encoded and thus decoding cannot fail. That is where strong typing is really useful, and eliminates the entire error class surrounding trying to decode unencoded data.
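
The guarantee described above can be sketched without any crate, assuming the standard alphabet with padding. `Base64`, `encode`, and `decode` here are hypothetical names, not the API of the base64 crate or of the linked example. The inner field is private, so `encode` is the only constructor and `decode` needs no `Result`.

```rust
const CHARS: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Private field: outside this module a value can only come from `encode`.
pub struct Base64(String);

impl Base64 {
    /// Encoding arbitrary bytes cannot fail.
    pub fn encode(bytes: &[u8]) -> Self {
        let mut text = String::with_capacity((bytes.len() + 2) / 3 * 4);
        for chunk in bytes.chunks(3) {
            let b1 = *chunk.get(1).unwrap_or(&0);
            let b2 = *chunk.get(2).unwrap_or(&0);
            let n = (chunk[0] as u32) << 16 | (b1 as u32) << 8 | b2 as u32;
            // A chunk of k bytes yields k + 1 data characters, then padding.
            for i in 0..4 {
                if i <= chunk.len() {
                    text.push(CHARS[((n >> (18 - 6 * i)) & 0x3f) as usize] as char);
                } else {
                    text.push('=');
                }
            }
        }
        Base64(text)
    }

    /// No `Result`: every value was produced by `encode`, so the text is
    /// always a whole number of valid four-character groups.
    pub fn decode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.0.len() / 4 * 3);
        for quad in self.0.as_bytes().chunks(4) {
            let mut n = 0u32;
            let mut data_chars = 0;
            for &c in quad {
                n <<= 6;
                if c != b'=' {
                    // Invariant upheld by `encode`: c is always in CHARS.
                    // A linear search keeps the sketch short; it is not fast.
                    n |= CHARS.iter().position(|&x| x == c).unwrap() as u32;
                    data_chars += 1;
                }
            }
            // k + 1 data characters carry k bytes; skip the unused top byte.
            let bytes = n.to_be_bytes();
            out.extend_from_slice(&bytes[1..data_chars]);
        }
        out
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

The `unwrap` inside `decode` is the kind of hack that the type's invariant is meant to justify: it can only panic if a `Base64` was built by some path other than `encode`.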

The only piece that is missing is deserialization for this type that validates it's actually base64 when being passed generic data. I'll try to do that soon™. That would be where you want the conditionality. For libraries in Rust, passing around base64 as a concrete type with decode guarantees is a real ergonomic win in my book.
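
One possible shape for that validating entry point is a `FromStr` impl, sketched here with the standard library only. `Base64` and `ParseError` are hypothetical names, the standard padded alphabet is assumed, and the check is deliberately shallow: it does not reject non-canonical trailing bits, and it is not wired into serde.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq)]
pub struct Base64(String);

#[derive(Debug, PartialEq)]
pub enum ParseError {
    BadLength,
    BadCharacter(char),
    BadPadding,
}

impl FromStr for Base64 {
    type Err = ParseError;

    /// The only fallible constructor: untrusted text is checked once here,
    /// so later consumers can rely on the type.
    fn from_str(s: &str) -> Result<Self, ParseError> {
        // Padded base64 always comes in groups of four characters.
        if s.len() % 4 != 0 {
            return Err(ParseError::BadLength);
        }
        // At most two '=' characters, and only at the very end.
        let body = s.trim_end_matches('=');
        if s.len() - body.len() > 2 {
            return Err(ParseError::BadPadding);
        }
        for c in body.chars() {
            if c == '=' {
                return Err(ParseError::BadPadding);
            }
            if !(c.is_ascii_alphanumeric() || c == '+' || c == '/') {
                return Err(ParseError::BadCharacter(c));
            }
        }
        Ok(Base64(s.to_owned()))
    }
}
```

A url-safe string such as `-_8=` is rejected with `BadCharacter('-')`, which is where an alphabet type parameter would make the check conditional.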

[1] There are some hacks here around the public base64 api, like unwrapping the decode.

@Korvox changed the title from "Adding a Base64<C> type" to "Adding a Base64 type" on Sep 3, 2017

@shaleh

shaleh commented Dec 24, 2017

As a Haskell hacker too, I agree. Strong types communicate intent as well as contracts.

@AlexanderThaller

AlexanderThaller commented Apr 9, 2018

This would also be nice when using serde. Something like this comes to mind:

[
  {
    "CreateIndex": 100,
    "ModifyIndex": 200,
    "LockIndex": 200,
    "Key": "zip",
    "Flags": 0,
    "Value": "dGVzdA==",
    "Session": "adf4238a-882b-9ddc-4a9d-5b6758e4159e"
  }
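]

use serde::{Deserialize, Serialize};

// `Base64` is the proposed type, assumed here to implement Serialize and Deserialize.
// `rename_all` maps the snake_case fields to the PascalCase JSON keys above.
#[derive(Serialize, Deserialize)]
#[serde(rename_all = "PascalCase")]
struct RootInterface {
  create_index: i64,
  modify_index: i64,
  lock_index: i64,
  key: String,
  flags: i64,
  value: Base64,
  session: String,
}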

@marshallpierce
Owner

Interesting. I do like the idea in general of having less magic in a custom serializer and more explicit DTOs. Perhaps there's room for that in the base64-serde crate.

@ggriffiniii
Contributor

I realize this is an ancient issue, but I thought I would mention that I implemented an alternative base64 crate that may be a slightly better fit for a strongly typed concept like this. It's called radix64, and the big difference that may help here is that each configuration is a distinct type: Std, StdNoPad, UrlSafe, UrlSafeNoPad, Crypt. Having the type of encoding as part of the Rust type seems like it would benefit this concept, so I just thought I would mention it.
