Base64 Malleability protection - support for canonical decoder #182

Closed
kchalkias opened this issue Apr 13, 2022 · 6 comments

kchalkias commented Apr 13, 2022

Although the Base64 RFC allows non-canonical implementations, there are many applications where canonicity is of high importance. The current implementation does not provide a canonical_decode mode, and the recent AsiaCCS 2022 paper https://eprint.iacr.org/2022/361 (from Mysten Labs, Facebook research and GMU) compares this base64 crate with the base64ct one.
A number of real-world attacks have already been identified. I happen to lead this research, so I am quoting some of the attacks we actually performed live on large-scale systems.

TL;DR: almost every base64 library behaves differently in its default decoding functionality. I organized related hackathons, participated in bug bounties, and ran surveys on what the average developer would pick to encode bytes (or files) to a String. Unfortunately, the majority of engineers assume Base64 is canonical by design, and that the same input bytes can only result in one unique base64 output. WRONG: encoders can be manipulated and decoders won't catch it!

This attack can be exploited in large-scale systems today; we have informed a few companies whose products assumed base64 uniqueness:

  • in one of them we could buy sports tickets for free (we informed the admin of a popular event ticket website about this exploit)
  • in another we could bypass rate limiting and perform DoS by submitting huge base64 strings
  • in others we broke canonical JSON guarantees, even when the potential victim used a correctly implemented canonical JSON parser
  • in many, we showed that base64 primary keys in databases do NOT provide logical uniqueness if the base64 String is received directly from the client. We managed to add the same userID multiple times to a database table.
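
To make the uniqueness problem in the last bullet concrete, here is a minimal Rust sketch that uses only the standard library. `lenient_decode` is a hypothetical decoder written for illustration, not the `base64` crate's code: it ignores `=` and silently drops leftover bits, which is the kind of lenient behavior described above. The helper names `sextet`, `distinct_by_string`, and `distinct_by_bytes` are also assumptions.

```rust
use std::collections::HashSet;

/// Map one character of the standard Base64 alphabet to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Hypothetical lenient decoder (illustration only): padding is ignored
/// and any leftover bits at the end are silently discarded.
fn lenient_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently in `acc`
    for &c in input.as_bytes() {
        if c == b'=' {
            continue; // lenient: never checks where or how much padding appears
        }
        acc = (acc << 6) | sextet(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out) // lenient: non-zero leftover bits in `acc` are dropped
}

/// Count "unique" IDs when the raw client-supplied string is the key.
fn distinct_by_string(ids: &[&str]) -> usize {
    ids.iter().collect::<HashSet<_>>().len()
}

/// Count unique IDs when the decoded bytes are the key.
fn distinct_by_bytes(ids: &[&str]) -> usize {
    ids.iter()
        .filter_map(|s| lenient_decode(s))
        .collect::<HashSet<_>>()
        .len()
}
```

With the five strings `aGk=`, `aGk`, `aGl=`, `aGm=`, and `aGn=`, every call to `lenient_decode` yields the bytes of `hi`, so `distinct_by_string` reports 5 while `distinct_by_bytes` reports 1. A table keyed on the raw string would therefore accept the same logical ID five times.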
@marshallpierce
Owner

Thanks for the report. It looks like optionally detecting invalid padding is what's needed to allow canonical decoding, if I'm reading this correctly.

@kchalkias
Author

kchalkias commented Apr 13, 2022

> Thanks for the report. It looks like optionally detecting invalid padding is what's needed to allow canonical decoding, if I'm reading this correctly.

Yes, exactly: padding rules should be applied consistently, and the decoder should always return an error if the input is not canonical. Any given byte sequence should decode successfully from exactly one base64 representation (its canonical form). Feel free to use the test vectors from the paper as well; we'll soon add more official ones.
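
As a rough illustration of what "return an error if it's not canonical" could mean for the standard alphabet with mandatory padding, here is a minimal Rust sketch that uses only the standard library. It is not the crate's implementation; `canonical_decode`, `DecodeError`, and `sextet` are hypothetical names chosen for this example. The sketch rejects missing or misplaced padding and any non-zero trailing bits.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum DecodeError {
    InvalidLength,       // length is not a multiple of 4 (e.g. padding was omitted)
    InvalidSymbol,       // character outside the standard alphabet
    InvalidPadding,      // '=' in the wrong place, or more than two of them
    NonZeroTrailingBits, // unused bits of the last symbol are not zero
}

/// Map one character of the standard Base64 alphabet to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode only the single canonical padded encoding of a byte string.
fn canonical_decode(input: &str) -> Result<Vec<u8>, DecodeError> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return Err(DecodeError::InvalidLength);
    }
    // Padding may only appear as one or two '=' at the very end.
    let pad = bytes.iter().rev().take_while(|&&c| c == b'=').count();
    if pad > 2 {
        return Err(DecodeError::InvalidPadding);
    }
    let data = &bytes[..bytes.len() - pad];

    let mut out = Vec::with_capacity(data.len() * 3 / 4);
    let mut acc: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently in `acc`
    for &c in data {
        if c == b'=' {
            return Err(DecodeError::InvalidPadding); // '=' before the end
        }
        acc = (acc << 6) | sextet(c).ok_or(DecodeError::InvalidSymbol)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    // Whatever is left in `acc` are the unused bits of the final symbol.
    if acc != 0 {
        return Err(DecodeError::NonZeroTrailingBits);
    }
    Ok(out)
}
```

Under these rules `aGk=` decodes to the bytes of `hi`, while `aGl=` fails with `NonZeroTrailingBits` and the unpadded `aGk` fails with `InvalidLength`, so exactly one string is accepted for each byte sequence.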

@str4d

str4d commented Sep 10, 2022

Another place that a canonical decoder mode would be useful is implementing the strict parsing of RFC 7468 PEM-style text encodings.

I believed that I was correctly implementing the above with this crate, because base64::decode_config_slice takes a config argument and I was passing in STANDARD (which is documented as "Standard character set with padding") rather than STANDARD_NO_PAD. However, base64::decode_config_slice does not document that it ignores the padding setting of the config, and I missed the note in the crate root documentation that "No padding at all is valid".

@marshallpierce
Owner

See what you think of #198. Does it fully address your concerns?

@marshallpierce
Owner

Released in 0.20.0. @kchalkias let me know if this implementation doesn't fully address the issue.

@kchalkias
Author

Thanks @marshallpierce, I will test the behavior. Please feel free to also add a test using the test vectors provided in Table 2 of the paper: https://eprint.iacr.org/2022/361.pdf (I will open an issue just for tracking).
