marshallpierce · marshallpierce · Dec 10, 2022 · Jul 20, 2022 · Aug 26, 2022 · Nov 26, 2022
diff --git a/README.md b/README.md
@@ -8,7 +8,11 @@ Made with CLion. Thanks to JetBrains for supporting open source!
 
 It's base64. What more could anyone want?
 
-This library's goals are to be *correct* and *fast*. It's thoroughly tested and widely used. It exposes functionality at multiple levels of abstraction so you can choose the level of convenience vs performance that you want, e.g. `decode_engine_slice` decodes into an existing `&mut [u8]` and is pretty fast (2.6GiB/s for a 3 KiB input), whereas `decode_engine` allocates a new `Vec<u8>` and returns it, which might be more convenient in some cases, but is slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
+This library's goals are to be *correct* and *fast*. It's thoroughly tested and widely used. It exposes functionality at
+multiple levels of abstraction so you can choose the level of convenience vs performance that you want,
+e.g. `decode_engine_slice` decodes into an existing `&mut [u8]` and is pretty fast (2.6 GiB/s for a 3 KiB input),
+whereas `decode_engine` allocates a new `Vec<u8>` and returns it, which might be more convenient in some cases, but is
+slower (although still fast enough for almost any purpose) at 2.1 GiB/s.
 
 ## Example
 
@@ -32,7 +36,8 @@ See the [docs](https://docs.rs/base64) for all the details.
 
 Remove non-base64 characters from your input before decoding.
 
-If you have a `Vec` of base64, [retain](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain) can be used to strip out whatever you need removed.
+If you have a `Vec` of base64, [retain](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.retain) can be used to
+strip out whatever you need removed.
 
 If you have a `Read` (e.g. reading a file or network socket), there are various approaches.
 
@@ -43,15 +48,45 @@ If you have a `Read` (e.g. reading a file or network socket), there are various
 
 [line-wrap](https://crates.io/crates/line-wrap) does just that.
 
+### I want canonical base64 encoding/decoding.
+
+First, don't do this. You should no more expect Base64 to be canonical than you should expect compression algorithms to
+produce canonical output across all usage in the wild (hint: they don't).
+However, [people are drawn to their own destruction like moths to a flame](https://eprint.iacr.org/2022/361), so here we
+are.
+
+There are two opportunities for non-canonical encoding (and thus, detection of the same during decoding): the final bits
+of the last encoded token in two or three token suffixes, and the `=` token used to inflate the suffix to a full four
+tokens.
+
+The trailing bits issue is unavoidable: with 6 bits available in each encoded token, 1 input byte takes 2 tokens,
+with the second one having some bits unused. Same for two input bytes: 16 bits, but 3 tokens have 18 bits. Unless we
+decide to stop shipping whole bytes around, we're stuck with those extra bits that a sneaky or buggy encoder might set
+to 1 instead of 0.
+
+The `=` pad bytes, on the other hand, are entirely a self-own by the Base64 standard. They do not affect decoding other
+than to provide an opportunity to say "that padding is incorrect". Exabytes of storage and transfer have no doubt been
+wasted on pointless `=` bytes. Somehow we all seem to be quite comfortable with, say, hex-encoded data just stopping
+when it's done rather than requiring a confirmation that the author of the encoder could count to four. Anyway, there
+are two ways to make pad bytes predictable: require canonical padding to the next multiple of four bytes as per the RFC,
+or, if you control all producers and consumers, save a few bytes by requiring no padding (especially applicable to the
+url-safe alphabet).
+
+All `Engine` implementations must at a minimum support treating non-canonical padding of both types as an error, and
+optionally may allow other behaviors.
+
 ## Rust version compatibility
 
-The minimum required Rust version is 1.57.0.
+The minimum supported Rust version is 1.57.0.
 
 # Contributing
 
-Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all PRs will be carefully scrutinized. Beyond that, this sort of low level library simply needs to be 100% correct. Nobody wants to chase bugs in encoding of any sort.
+Contributions are very welcome. However, because this library is used widely, and in security-sensitive contexts, all
+PRs will be carefully scrutinized. Beyond that, this sort of low level library simply needs to be 100% correct. Nobody
+wants to chase bugs in encoding of any sort.
 
-All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the free time to give each PR the attention it deserves. I will get to everyone eventually!
+All this means that it takes me a fair amount of time to review each PR, so it might take quite a while to carve out the
+free time to give each PR the attention it deserves. I will get to everyone eventually!
 
 ## Developing
 
@@ -63,13 +98,22 @@ rustup run nightly cargo bench
 
 ## no_std
 
-This crate supports no_std. By default the crate targets std via the `std` feature. You can deactivate the `default-features` to target core instead. In that case you lose out on all the functionality revolving around `std::io`, `std::error::Error` and heap allocations. There is an additional `alloc` feature that you can activate to bring back the support for heap allocations.
+This crate supports no_std. By default the crate targets std via the `std` feature. You can deactivate
+the `default-features` to target `core` instead. In that case you lose out on all the functionality revolving
+around `std::io`, `std::error::Error`, and heap allocations. There is an additional `alloc` feature that you can activate
+to bring back the support for heap allocations.
 
 ## Profiling
 
-On Linux, you can use [perf](https://perf.wiki.kernel.org/index.php/Main_Page) for profiling. Then compile the benchmarks with `rustup nightly run cargo bench --no-run`.
+On Linux, you can use [perf](https://perf.wiki.kernel.org/index.php/Main_Page) for profiling. First, compile the
+benchmarks with `rustup run nightly cargo bench --no-run`.
 
-Run the benchmark binary with `perf` (shown here filtering to one particular benchmark, which will make the results easier to read). `perf` is only available to the root user on most systems as it fiddles with event counters in your CPU, so use `sudo`. We need to run the actual benchmark binary, hence the path into `target`. You can see the actual full path with `rustup run nightly cargo bench -v`; it will print out the commands it runs. If you use the exact path that `bench` outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want to `cargo clean` so you have only one `benchmarks-` binary (they tend to accumulate).
+Run the benchmark binary with `perf` (shown here filtering to one particular benchmark, which will make the results
+easier to read). `perf` is only available to the root user on most systems as it fiddles with event counters in your
+CPU, so use `sudo`. We need to run the actual benchmark binary, hence the path into `target`. You can see the actual
+full path with `rustup run nightly cargo bench -v`; it will print out the commands it runs. If you use the exact path
+that `bench` outputs, make sure you get the one that's for the benchmarks, not the tests. You may also want
+to `cargo clean` so you have only one `benchmarks-` binary (they tend to accumulate).
 
 ```bash
 sudo perf record target/release/deps/benchmarks-* --bench decode_10mib_reuse
@@ -81,7 +125,10 @@ Then analyze the results, again with perf:
 sudo perf annotate -l
 ```
 
-You'll see a bunch of interleaved rust source and assembly like this. The section with `lib.rs:327` is telling us that 4.02% of samples saw the `movzbl` aka bit shift as the active instruction. However, this percentage is not as exact as it seems due to a phenomenon called *skid*. Basically, a consequence of how fancy modern CPUs are is that this sort of instruction profiling is inherently inaccurate, especially in branch-heavy code.
+You'll see a bunch of interleaved Rust source and assembly like this. The section with `lib.rs:327` is telling us that
+4.02% of samples saw the `movzbl` (a zero-extending byte move) as the active instruction. However, this percentage is
+not as exact as it seems due to a phenomenon called *skid*. Basically, a consequence of how fancy modern CPUs are is
+that this sort of instruction profiling is inherently inaccurate, especially in branch-heavy code.
 
 ```text
  lib.rs:322    0.70 :     10698:       mov    %rdi,%rax
@@ -103,10 +150,10 @@ You'll see a bunch of interleaved rust source and assembly like this. The sectio
     0.00 :        106ab:       je     1090e <base64::decode_config_buf::hbf68a45fefa299c1+0x46e>
 ```
 
-
 ## Fuzzing
 
-This uses [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz). See `fuzz/fuzzers` for the available fuzzing scripts. To run, use an invocation like these:
+This uses [cargo-fuzz](https://github.com/rust-fuzz/cargo-fuzz). See `fuzz/fuzzers` for the available fuzzing scripts.
+To run, use an invocation like one of these:
 
 ```bash
 cargo +nightly fuzz run roundtrip
@@ -115,7 +162,6 @@ cargo +nightly fuzz run roundtrip_random_config -- -max_len=10240
 cargo +nightly fuzz run decode_random
 ```
 
-
 ## License
 
 This project is dual-licensed under MIT and Apache 2.0.
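
The trailing-bits arithmetic in the new "canonical base64" README section can be checked with a few lines of std-only Rust. This is only a sketch for illustration: `token_count`, `unused_trailing_bits`, and `last_token_is_canonical` are hypothetical helper names, not part of this crate's API.

```rust
/// Number of 6-bit tokens (not counting `=` padding) needed for `input_len` bytes.
fn token_count(input_len: usize) -> usize {
    // Each token carries 6 bits, so round the bit count up to a whole token.
    (input_len * 8 + 5) / 6
}

/// Number of low bits in the final token that carry no input data.
fn unused_trailing_bits(input_len: usize) -> usize {
    token_count(input_len) * 6 - input_len * 8
}

/// True if the final token's 6-bit value leaves all of its unused bits at zero.
/// `last_token_value` is the decoded 6-bit value, not the ASCII symbol.
fn last_token_is_canonical(last_token_value: u8, input_len: usize) -> bool {
    let mask = (1u8 << unused_trailing_bits(input_len)) - 1;
    last_token_value & mask == 0
}
```

One input byte gives 2 tokens with 4 unused bits, two bytes give 3 tokens with 2 unused bits, and three bytes fill 4 tokens exactly. A decoder that wants canonical input only has to check the final token against that mask.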

diff --git a/RELEASE-NOTES.md b/RELEASE-NOTES.md
@@ -1,8 +1,12 @@
 # 0.20.0
 
-## Next
+## 0.20.0-beta.1
+
+### Breaking changes
 
 - Update MSRV to 1.57.0
+- Decoding can now either ignore padding, require correct padding, or require no padding. The default is to require correct padding.
+  - The `NO_PAD` config now requires that padding be absent when decoding.
 
 ## 0.20.0-alpha.1
 

diff --git a/fuzz/Cargo.toml b/fuzz/Cargo.toml
@@ -11,7 +11,8 @@ cargo-fuzz = true
 [dependencies]
 rand = "0.6.1"
 rand_pcg = "0.1.1"
-ring = "0.13.5"
+sha2 = "0.10.6"
+
 [dependencies.base64]
 path = ".."
 [dependencies.libfuzzer-sys]

diff --git a/fuzz/fuzzers/roundtrip_no_pad.rs b/fuzz/fuzzers/roundtrip_no_pad.rs
@@ -1,11 +1,14 @@
 #![no_main]
-#[macro_use] extern crate libfuzzer_sys;
+#[macro_use]
+extern crate libfuzzer_sys;
 extern crate base64;
 
-use base64::engine::fast_portable;
+use base64::engine::{self, fast_portable};
 
 fuzz_target!(|data: &[u8]| {
-    let config = fast_portable::FastPortableConfig::new().with_encode_padding(false);
+    let config = fast_portable::FastPortableConfig::new()
+        .with_encode_padding(false)
+        .with_decode_padding_mode(engine::DecodePaddingMode::RequireNone);
     let engine = fast_portable::FastPortable::from(&base64::alphabet::STANDARD, config);
 
     let encoded = base64::encode_engine(&data, &engine);

diff --git a/fuzz/fuzzers/utils.rs b/fuzz/fuzzers/utils.rs
@@ -1,18 +1,20 @@
 extern crate rand;
 extern crate rand_pcg;
-extern crate ring;
+extern crate sha2;
 
-use base64::{alphabet, engine::fast_portable};
+use base64::{alphabet, engine::{self, fast_portable}};
 use self::rand::{Rng, SeedableRng};
 use self::rand_pcg::Pcg32;
-use self::ring::digest;
+use self::sha2::Digest as _;
 
 pub fn random_engine(data: &[u8]) -> fast_portable::FastPortable {
     // use sha256 of data as rng seed so it's repeatable
-    let sha = digest::digest(&digest::SHA256, data);
+    let mut hasher = sha2::Sha256::new();
+    hasher.update(data);
+    let sha = hasher.finalize();
 
     let mut seed: [u8; 16] = [0; 16];
-    seed.copy_from_slice(&sha.as_ref()[0..16]);
+    seed.copy_from_slice(&sha.as_slice()[0..16]);
 
     let mut rng = Pcg32::from_seed(seed);
 
@@ -22,9 +24,16 @@ pub fn random_engine(data: &[u8]) -> fast_portable::FastPortable {
         alphabet::STANDARD
     };
 
+    let encode_padding = rng.gen();
+    let decode_padding = if encode_padding {
+        engine::DecodePaddingMode::RequireCanonical
+    } else {
+        engine::DecodePaddingMode::RequireNone
+    };
     let config = fast_portable::FastPortableConfig::new()
-        .with_encode_padding(rng.gen())
-        .with_decode_allow_trailing_bits(rng.gen());
+        .with_encode_padding(encode_padding)
+        .with_decode_allow_trailing_bits(rng.gen())
+        .with_decode_padding_mode(decode_padding);
 
     fast_portable::FastPortable::from(&alphabet, config)
 }
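The pairing above (padded encoding with `RequireCanonical`, unpadded encoding with `RequireNone`) can be illustrated without the crate. The following is a rough std-only sketch of what each policy accepts; `PaddingPolicy` and `padding_ok` are hypothetical names, and the `Ignore` arm is an assumption about what "ignore padding" in the release notes means, not the crate's actual behavior.

```rust
/// Hypothetical stand-in for the three decode padding behaviors in the release notes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PaddingPolicy {
    Ignore,
    RequireCanonical,
    RequireNone,
}

/// Checks only the trailing `=` run of `encoded`; alphabet validity is out of scope.
fn padding_ok(encoded: &str, policy: PaddingPolicy) -> bool {
    // Count the trailing run of `=` bytes.
    let pad = encoded.bytes().rev().take_while(|&b| b == b'=').count();
    let symbols = encoded.len() - pad;
    match policy {
        // Assumption: any amount of trailing padding is tolerated.
        PaddingPolicy::Ignore => true,
        PaddingPolicy::RequireNone => pad == 0,
        PaddingPolicy::RequireCanonical => {
            // Pad up to the next multiple of four; a lone symbol can never be valid.
            let needed = (4 - symbols % 4) % 4;
            needed != 3 && pad == needed
        }
    }
}
```

With this sketch, `"AA=="` passes `RequireCanonical` and fails `RequireNone`, while `"AA"` does the opposite, which is why the fuzzer has to pick the decode mode that matches its encode setting.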
diff --git a/src/decode.rs b/src/decode.rs
@@ -13,6 +13,7 @@ use std::error;
 #[derive(Clone, Debug, PartialEq, Eq)]
 pub enum DecodeError {
     /// An invalid byte was found in the input. The offset and offending byte are provided.
+    /// Padding characters (`=`) interspersed in the encoded form will be treated as invalid bytes.
     InvalidByte(usize, u8),
     /// The length of the input is invalid.
     /// A typical cause of this is stray trailing whitespace or other separator bytes.
@@ -22,9 +23,12 @@ pub enum DecodeError {
     InvalidLength,
     /// The last non-padding input symbol's encoded 6 bits have nonzero bits that will be discarded.
     /// This is indicative of corrupted or truncated Base64.
-    /// Unlike InvalidByte, which reports symbols that aren't in the alphabet, this error is for
+    /// Unlike `InvalidByte`, which reports symbols that aren't in the alphabet, this error is for
     /// symbols that are in the alphabet but represent nonsensical encodings.
     InvalidLastSymbol(usize, u8),
+    /// The nature of the padding was not as configured: absent or incorrect when it must be
+    /// canonical, or present when it must be absent, etc.
+    InvalidPadding,
 }
 
 impl fmt::Display for DecodeError {
@@ -35,6 +39,7 @@ impl fmt::Display for DecodeError {
             Self::InvalidLastSymbol(index, byte) => {
                 write!(f, "Invalid last symbol {}, offset {}.", byte, index)
             }
+            Self::InvalidPadding => write!(f, "Invalid padding"),
         }
     }
 }
@@ -46,6 +51,7 @@ impl error::Error for DecodeError {
             Self::InvalidByte(_, _) => "invalid byte",
             Self::InvalidLength => "invalid length",
             Self::InvalidLastSymbol(_, _) => "invalid last symbol",
+            Self::InvalidPadding => "invalid padding",
         }
     }
 
@@ -192,10 +198,12 @@ pub fn decode_engine_slice<E: Engine, T: AsRef<[u8]>>(
 #[cfg(test)]
 mod tests {
     use super::*;
-    use crate::{encode::encode_engine_string, tests::assert_encode_sanity};
-
-    use crate::engine::Config;
-    use crate::tests::random_engine;
+    use crate::{
+        alphabet,
+        encode::encode_engine_string,
+        engine::{fast_portable, fast_portable::FastPortable, Config},
+        tests::{assert_encode_sanity, random_engine},
+    };
     use rand::{
         distributions::{Distribution, Uniform},
         Rng, SeedableRng,
@@ -350,12 +358,13 @@ mod tests {
 
     #[test]
     fn decode_engine_estimation_works_for_various_lengths() {
+        let engine = FastPortable::from(&alphabet::STANDARD, fast_portable::NO_PAD);
         for num_prefix_quads in 0..100 {
             for suffix in &["AA", "AAA", "AAAA"] {
                 let mut prefix = "AAAA".repeat(num_prefix_quads);
                 prefix.push_str(suffix);
                 // make sure no overflow (and thus a panic) occurs
-                let res = decode_engine(prefix, &DEFAULT_ENGINE);
+                let res = decode_engine(prefix, &engine);
                 assert!(res.is_ok());
             }
         }
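
The new `InvalidByte` doc comment says that `=` characters interspersed in the encoded form are treated as invalid bytes. A minimal std-only sketch of that distinction follows; `first_interspersed_pad` is a hypothetical helper, and the offset it returns is not guaranteed to match the offset the crate reports.

```rust
/// Returns the index of the first `=` that is not part of the trailing padding run.
fn first_interspersed_pad(encoded: &[u8]) -> Option<usize> {
    // Padding is only legal as a trailing run, so find where that run starts.
    let trailing = encoded.iter().rev().take_while(|&&b| b == b'=').count();
    let body = &encoded[..encoded.len() - trailing];
    // Any `=` left in the body sits between real symbols.
    body.iter().position(|&b| b == b'=')
}
```

For example, `b"AA=="` yields `None` because its padding is purely trailing, while `b"A=AA"` yields `Some(1)`.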

diff --git a/src/encode.rs b/src/encode.rs
@@ -123,7 +123,7 @@ pub fn encode_engine_string<E: Engine, T: AsRef<[u8]>>(
 ///     &base64::engine::DEFAULT_ENGINE);
 ///
 /// // shorten our vec down to just what was written
-/// buf.resize(bytes_written, 0);
+/// buf.truncate(bytes_written);
 ///
 /// assert_eq!(s, base64::decode(&buf).unwrap().as_slice());
 /// ```
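
The doc example above now shrinks the buffer with `truncate` instead of `resize`. Both give the same result when shortening, but `truncate` states the intent directly and needs no fill value. A small std-only sketch of the pattern, where `fill_and_shrink` is a hypothetical stand-in for "write into an oversized buffer, then keep only what was written":

```rust
/// Copies as much of `data` as fits into a zeroed scratch buffer of `scratch_len`
/// bytes, then shortens the buffer to just the bytes that were written.
fn fill_and_shrink(data: &[u8], scratch_len: usize) -> Vec<u8> {
    let mut buf = vec![0u8; scratch_len];
    let written = data.len().min(buf.len());
    buf[..written].copy_from_slice(&data[..written]);
    // Drop the unused tail; unlike `resize`, no fill byte is required.
    buf.truncate(written);
    buf
}
```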