Extant base64-encoded data is accepted; can it be rejected? #181
Comments
Trailing bits are a different thing -- see section 3.5. The last chunk of base64 symbols may have unused bits depending on input size, and that config setting controls whether or not having those bits be non-zero is reported as an error or not. (Other buggy impls may set those bits.) Padding has no utility. It should never have been part of the spec, IMO, but either way it is not needed to decode. The spec leaves open room for implementations to not pad, which is widely done in practice, so this crate doesn't check for |
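To make the trailing-bits point concrete, here is a minimal std-only sketch. The helper names `sextet` and `trailing_bits_are_zero` are hypothetical and are not part of this crate's API; the sketch assumes the standard alphabet and only inspects the final symbol rather than validating the whole input. A final chunk of 2 symbols carries 12 bits but encodes 1 byte (4 unused bits), and a final chunk of 3 symbols carries 18 bits but encodes 2 bytes (2 unused bits).

```rust
/// Map a standard-alphabet base64 symbol to its 6-bit value.
/// (Hypothetical helper for illustration only.)
fn sextet(c: u8) -> Option<u8> {
    match c {
        b'A'..=b'Z' => Some(c - b'A'),
        b'a'..=b'z' => Some(c - b'a' + 26),
        b'0'..=b'9' => Some(c - b'0' + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Returns Some(true) if the unused low bits of the last symbol are all zero,
/// Some(false) if any of them is set, and None if the input is empty, ends in
/// a non-alphabet symbol, or has an impossible symbol count.
fn trailing_bits_are_zero(input: &str) -> Option<bool> {
    // Padding carries no data, so ignore it here.
    let symbols = input.trim_end_matches('=').as_bytes();
    let last = sextet(*symbols.last()?)?;
    let unused_mask: u8 = match symbols.len() % 4 {
        0 => 0b00_0000, // 24 bits -> 3 bytes, nothing left over
        2 => 0b00_1111, // 12 bits -> 1 byte, 4 bits left over
        3 => 0b00_0011, // 18 bits -> 2 bytes, 2 bits left over
        _ => return None, // a single symbol cannot encode a whole byte
    };
    Some(last & unused_mask == 0)
}
```

With this sketch, `YmxvYg==` (the canonical encoding of `blob`) has zeroed trailing bits, while `YmxvYh==` decodes to the same bytes in a lenient decoder but has a trailing bit set.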
The part of the spec you link to is for
Do you have data to back up this claim? |
My use case is in trying to decide whether a specification for a service interface modelling language should reject unpadded base64-encoded data. The spec is implemented in several programming languages, and Rust seems to be alone in its handling here, so it's a source of possible behavior deviations among the different language implementations. |
I acknowledge that padding is generally useless nowadays in most use cases, but my interpretation of the spec is that by default, unpadded base64-encoded data should be rejected unless you have special knowledge of the protocol/transport. I think this should translate, in the context of a generic base64 library like this one, into rejecting things like That interpretation would be a breaking change for this crate. What do you think of including an option in |
Here is some data to back up this claim. This is how some major programming languages' standard libraries (or canonical/foundational base64 library implementations) handle unpadded base64-encoded data like
Python
Link to playground: https://replit.com/@dazedviper/Base64-Python#main.py
JavaScript
This is in the Firefox browser:
This is in the Chromium browser:
Node's
Haskell's
|
OK, I'm convinced it's worth detecting for you weirdos who wish base64 was always canonical. :) As it happens, #182 just popped up today, which would be addressed by same thing. And no, I don't have data re: absence of padding in practice -- it's just something I've noticed in my travels because, as you can imagine, I have a professional interest in base64. ;) |
Oh wow, I didn't know about that paper nor had I considered using this behavior for attacks. I am now entirely convinced that rejecting unpadded encoded data should be the default behavior. What are the odds that we both report this within days... 🤯 |
See if #198 addresses your use case. |
Released in 0.20.0. |
YmxvYg= is technically invalid base64 according to the spec (its Unicode code point length is not divisible by 4, it's missing a padding character), but this crate is able to decode it without issues; I'm guessing because it can unambiguously decode it (decode_allow_trailing_bits has no effect):
Yields:
Can I configure the crate so that it fails at decoding inputs like this one?
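For illustration, the check being asked for can be sketched as a std-only pre-filter run before decoding. The function name `has_canonical_padding` is hypothetical and is not this crate's API. It only looks at length and `=` placement (length divisible by 4, at most two `=`, all at the end) and does not validate the alphabet or the trailing bits.

```rust
/// Returns true only if `input` has a length that is a multiple of 4 and
/// uses '=' solely as zero, one, or two characters at the very end.
/// (Hypothetical helper; checks padding shape only, not the alphabet.)
fn has_canonical_padding(input: &str) -> bool {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return false;
    }
    // Count the run of '=' at the end of the input.
    let pad = bytes.iter().rev().take_while(|&&b| b == b'=').count();
    // At most two padding characters, and no '=' anywhere before them.
    pad <= 2 && !bytes[..bytes.len() - pad].contains(&b'=')
}
```

Under this sketch, `YmxvYg==` passes, while `YmxvYg=` (7 characters) and `YmxvYg` (6 characters) are both rejected before any decoding is attempted.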