Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust: various unsafe traits/functions meant for gencode in public API #6627

Open
TethysSvensson opened this issue May 10, 2021 · 28 comments
Open

Comments

@TethysSvensson
Copy link
Contributor

The following safe code will segfault:

fn main() {
    let ptr: &&u8 = flatbuffers::follow_cast_ref(&[1, 2, 3, 4, 5, 6, 7, 8], 0);
    println!("{}", **ptr);
}
@CasperN
Copy link
Collaborator

CasperN commented May 10, 2021

Thanks for documenting these. I'm going to close the other issues and combine them with this one:

For most of these, since these are typically used within the Flatbuffers library functions or generated code and I think the usual usage is safe, I'm not going to make fixing this the highest priority. The safety model we're going for is that flatbuffers are constructed with a FlatBufferBuilder and read from the root_* variants -- those APIs must be safe. We could mark the above types/traits them as unsafe to be rigorous, and/or move them to a module like flatbuffers::for_gencode_only which will make their internal use more obvious (though obviously that'd still be sub optimal).

I do want to deprecate SafeSliceAccess, though. That one was an especially bad idea, and getting unaligned access is in the public API (sigh).

@CasperN
Copy link
Collaborator

CasperN commented May 10, 2021

Let's try to fix this for the 3.0 release #6636

@jorgecarleitao
Copy link

jorgecarleitao commented Oct 31, 2021

This is quite serious:

namespace MyGame.Sample;

enum Color:byte { Red = 0, Green, Blue = 2 }

is compiled to

// automatically generated by the FlatBuffers compiler, do not modify



use std::mem;
use std::cmp::Ordering;

extern crate flatbuffers;
use self::flatbuffers::{EndianScalar, Follow};

#[allow(unused_imports, dead_code)]
pub mod my_game {

  use std::mem;
  use std::cmp::Ordering;

  extern crate flatbuffers;
  use self::flatbuffers::{EndianScalar, Follow};
#[allow(unused_imports, dead_code)]
pub mod sample {

  use std::mem;
  use std::cmp::Ordering;

  extern crate flatbuffers;
  use self::flatbuffers::{EndianScalar, Follow};

#[deprecated(since = "2.0.0", note = "Use associated constants instead. This will no longer be generated in 2021.")]
pub const ENUM_MIN_COLOR: i8 = 0;
#[deprecated(since = "2.0.0", note = "Use associated constants instead. This will no longer be generated in 2021.")]
pub const ENUM_MAX_COLOR: i8 = 2;
#[deprecated(since = "2.0.0", note = "Use associated constants instead. This will no longer be generated in 2021.")]
#[allow(non_camel_case_types)]
pub const ENUM_VALUES_COLOR: [Color; 3] = [
  Color::Red,
  Color::Green,
  Color::Blue,
];

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
#[repr(transparent)]
pub struct Color(pub i8);
#[allow(non_upper_case_globals)]
impl Color {
  pub const Red: Self = Self(0);
  pub const Green: Self = Self(1);
  pub const Blue: Self = Self(2);

  pub const ENUM_MIN: i8 = 0;
  pub const ENUM_MAX: i8 = 2;
  pub const ENUM_VALUES: &'static [Self] = &[
    Self::Red,
    Self::Green,
    Self::Blue,
  ];
  /// Returns the variant's name or "" if unknown.
  pub fn variant_name(self) -> Option<&'static str> {
    match self {
      Self::Red => Some("Red"),
      Self::Green => Some("Green"),
      Self::Blue => Some("Blue"),
      _ => None,
    }
  }
}
// ...
impl<'a> flatbuffers::Follow<'a> for Color {
  type Inner = Self;
  #[inline]
  fn follow(buf: &'a [u8], loc: usize) -> Self::Inner {
    let b = unsafe {
      flatbuffers::read_scalar_at::<i8>(buf, loc)
    };
    Self(b)
  }
}

impl flatbuffers::Push for Color {
    type Output = Color;
    #[inline]
    fn push(&self, dst: &mut [u8], _rest: &[u8]) {
        unsafe { flatbuffers::emplace_scalar::<i8>(dst, self.0); }
    }
}
// ...

which allows both read and write out of bounds in safe code. Specifically, the following
tests do not pass miri:

mod generated;

#[cfg(test)]
mod test {
    use super::*;

    use flatbuffers::Follow;
    use flatbuffers::Push;

    #[test]
    fn write() {
        let color = generated::Color::Blue;
        color.push(&mut [], &[]);
    }

    #[test]
    fn read() {
        generated::Color::follow(&[1], 1);
    }
}

and constitute CWE-787 and CWE-125 respectively, ranking 1st and 3rd in the most dangerous vulnerabilities at https://cwe.mitre.org/top25/archive/2021/2021_cwe_top25.html.

@CasperN
Copy link
Collaborator

CasperN commented Oct 31, 2021

Yep a bunch of the traits, especially Follow, are unsafe and should be refactored as such. Using it outside the generated code is not intended and a great way of shooting yourself in the foot. @aardappel, I'm not going to be available to contribute for a while, but marking Follow and friends stuff unsafe is an easy first contribution if you find anyone willing.

@aardappel
Copy link
Collaborator

Not sure what rustaceans are around to take this on..
@rw

@marmeladema
Copy link
Contributor

Is this all about marking those traits and their methods unsafe? If so, I can probably take this.

@CasperN
Copy link
Collaborator

CasperN commented Nov 23, 2021

Yes and please do!

@marmeladema
Copy link
Contributor

I started by marking Follow::follow as unsafe in #6951

@rw
Copy link
Collaborator

rw commented Nov 23, 2021

@CasperN WDYT about making "follow" and "push" checked and/or typesafe, so that users can't read or write invalid data? I'd like to avoid turning everything in the library into unsafe when we could possibly solve the problem in a more Rust-y way.

@marmeladema
Copy link
Contributor

@rw @CasperN in the mcve provided here, the vulnerabilities come from read_scalar and emplace_scalar which are not doing any bound checks, probably for performance reasons.

One thing we could do is add a debug assert so that at least it would panic in debug mode. That doesn't help much with release mode though but I see this as a similar problem as integer over/under-flow which are disabled in release mode because the performance impact is too big.

I haven't looked into the generated code too much, but if we are confident it is sound and safe, instead of exposing those traits directly to user, we could generate a sealed trait to only allow the generated code to call push and follow.

@CasperN
Copy link
Collaborator

CasperN commented Dec 2, 2021

One thing we could do is add a debug assert so that at least it would panic in debug mode. That doesn't help much with release mode though but I see this as a similar problem as integer over/under-flow which are disabled in release mode because the performance impact is too big.

Sounds reasonable to me

I haven't looked into the generated code too much, but if we are confident it is sound and safe, instead of exposing those traits directly to user, we could generate a sealed trait to only allow the generated code to call push and follow.

I'd say we're fairly confident but there's always more we can do. Specifically the fuzzing story could be improved: cargo fuzz is a coverage guided fuzzer which, unlike quickcheck, mutates examples to optimize for code coverage, which is totally necessary given the number of branches in monster_test. We may even need custom mutators to make fuzzing efficient. Having that running continuously with ASAN would make me a lot more confident in our safety.

we could generate a sealed trait to only allow the generated code to call push and follow.

I thought about sealed traits too... I'm not sure it will work because the generated code is part of the client's crate, not our own. I guess we could call the trait GeneratedCodeOnlyVerySubtleSeriouslyDontDoItYourself or something, but I don't think there can be compiler enforcement.

@CasperN
Copy link
Collaborator

CasperN commented Dec 3, 2021

WDYT about making "follow" and "push" checked and/or typesafe, so that users can't read or write invalid data? I'd like to avoid turning everything in the library into unsafe when we could possibly solve the problem in a more Rust-y way.

I feel like we've gone back and forth on this discussion a few times. Originally, I advocated for bounds checks but there was pushback for performance reasons. Making follow and friends unsafe is the natural consequence to removing checks. I actually think this is okay, given the existence of the verifier.

When we were working on the verifier I think there were a few ways of thinking about reading data from a flatbuffer

  1. read it without checks, but give users the option to verify first, and opting out requires unsafe
  2. read it with checks and panic
  3. read it with checks and wrap in a result type
  4. (there were actually a few more even more sophisticated solutions, which I loved but decided were impractical1)

(2) was not great as the checks add unnecessary overhead to every field access, after verification. (3) was probably the most rust-y solution but there'd be a lot of result type propagation which was deemed unergonomic and also repeated access to the same field will incur the overhead of checks. Ultimately, we went with (1) because its the most similar to the C++ API, incurred the least complexity both for maintenance and usability, while solving most of the problem. See #6161 for the full discussion.

Footnotes

  1. We could have 2 kinds of tables, verified and unverified; and 3 accessors for each field in the unverified table; foo (which checks and panics), try_foo (result typed) and unchecked_foo (unsafe/fast). There's also ideas around of deep vs lazy verification of object trees, only verifying subtrees... We can get arbitrarily intricate here

@zamazan4ik
Copy link
Contributor

zamazan4ik commented Dec 7, 2021

Am I right that now reading an untrusted Flatbuffer input with Rust can lead to some segfaults and now there is no way to read values in some checked way? I mean some "Buffer verifier" functionality in the Rusty Flatbuffer implementation.

EDITED: Just read this #6269 and I am a little bit disappointed because seems like the verifier is implemented but the official Flatbuffer page disagrees with me.

@zamazan4ik
Copy link
Contributor

@CasperN @rw maybe you can clarify my question above? (sorry for the explicit ping)

@CasperN
Copy link
Collaborator

CasperN commented Dec 8, 2021

@zamazan4ik

Am I right that now reading an untrusted Flatbuffer input with Rust can lead to some segfaults and now there is no way to read values in some checked way?

No, because we do have a verifier and you need unsafe to opt out of it. Though if you do opt out I think you can get an out-of-bound read. Sorry about the misleading docs, my bad, I think I was last to touch them.

Fwiw, if popular demand in Rust is to add checks and take the performance hit, even with the verifier, I don't mind going in that direction.

@zamazan4ik
Copy link
Contributor

zamazan4ik commented Dec 8, 2021

No, because we do have a verifier and you need unsafe to opt out of it. Though if you do opt out I think you can get an out-of-bound read. Sorry about the misleading docs, my bad, I think I was last to touch them.

Thanks for the clarification. Would be awesome if you will fix the documentation too :)

Fwiw, if popular demand in Rust is to add checks and take the performance hit, even with the verifier, I don't mind going in that direction.

Not sure about other people's use cases. For me is enough that flatbuffers with the verifier don't segfault/panic.

@TheButlah
Copy link

TheButlah commented Aug 8, 2022

Fwiw, if popular demand in Rust is to add checks and take the performance hit, even with the verifier, I don't mind going in that direction.

Essentially, I would love to see flatbuffers as requiring the user to explicitly call a safe runtime checked cast to a verified table, or an unsafe zero-cost cast to a verified table. Then all the other accessors in the generated code get to assume the table was already verified and omit the runtime cost, because it is an invariant on the table type itself.

Looking at the api of Follow, i don't think there is a way for that low level offset-based api to be safe and not bounds checked. So yes I'm in the camp of "mark it unsafe and don't bounds check". The generated accessor code that calls into Follow will be marked safe and also will be zero cost, because those accessors would assume pre-verified tables, and my thought is that flatbuffers 3.0 would make it so that its impossible to get such a table without safely calling the verifier to promote the slice to the table via root(), or unsafely casting the slice to the table with root_unchecked().

@tustvold
Copy link
Contributor

tustvold commented Sep 9, 2022

I've submitted #7518 which I think eliminates any unsoundness, but I'm not intimately familiar with this crate and so would appreciate any additional eyes on it, assuming this is the path we want to take 😄

@TethysSvensson
Copy link
Contributor Author

@tustvold Trying to fix any soundness issue is probably a bit of a high bar. 😉

There are still bugs in the parser like #7007.

@tustvold
Copy link
Contributor

tustvold commented Sep 9, 2022

a bit of a high bar

Eh... Nothing like aiming high 😄 I think it would eliminate any known practical unsoundness issues, or am I missing something?

@TethysSvensson
Copy link
Contributor Author

I still think there are a lot of ways to bypass this. For instance this one:

In schema.fbs:

table Foo {
  x: uint32;
}

In main.rs:

mod schema_generated;

fn main() {
    #[rustfmt::skip]
    let root = flatbuffers::root::<schema_generated::Foo>(&[
        12, 0, 0, 0, // offset to root
        0, 0, // padding
        // <vtable>
        6, 0, // vtable size
        8, 0, // object size
        4, 0, // offset for field x
        // </vtable>
        // <object>
        6, 0, 0, 0, // (negative) offset for vtable
        2, 0, 0, 0,
        // </object>
        0xc3, 0x28, // extra stuff after object
    ])
    .unwrap();
    // re-interpret the uint32 field as a string field (without offset)
    // this calls from_utf8_unchecked internally
    let field = root._tab.get::<&str>(4, None).unwrap();

    // this should never be possible with a valid string
    assert!(std::str::from_utf8(field.as_bytes()).is_err());
}

@tustvold
Copy link
Contributor

tustvold commented Sep 9, 2022

Disappointing, I presumed the verifier was doing utf8 validation. Fortunately that is easy enough to add 😄

@TethysSvensson
Copy link
Contributor Author

@tustvold It is, but the schema does not contain a string -- and the get function allows re-interpreting the data.

@tustvold
Copy link
Contributor

tustvold commented Sep 9, 2022

Aah, awesome, was sure I'd missed some methods, thanks for pointing it. Will do another audit tomorrow morning 👍

@TethysSvensson
Copy link
Contributor Author

Personally I think a lot of these issues would be solved by doing late validation instead of using the verifier. That would also be faster, because you don't have to go over the data twice.

But even if you don't want this, I think re-implementing read_scalar_at as a safe function would help a lot with unsafe infecting everything.

@TethysSvensson
Copy link
Contributor Author

There is also the fact that safe_slice is inherently undefined behavior according to the Rust ABI, which e.g. requires u32 to be 4-byte aligned. This is an example of this undefined behavior:

schema.fbs:

table Foo {
  x: [uint32];
}

main.rs:

mod schema_generated;

static DATA: &[u8] = &[
    0, // A single byte of padding
    12, 0, 0, 0, // offset to root
    0, 0, // padding before vtable
    // <vtable>
    6, 0, // vtable size
    8, 0, // object size
    4, 0, // offset for field x
    // </vtable>
    // <object>
    6, 0, 0, 0, // (negative) offset for vtable
    4, 0, 0, 0, // offset to vector
    // </object>
    3, 0, 0, 0, // vector length
    1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, // vector data
];

fn main() {
    let root = flatbuffers::root::<schema_generated::Foo>(&DATA[1..]).unwrap();
    let slice: &[u32] = root.x().unwrap().safe_slice();
    println!("Value: {slice:?}");
    println!("Required alignment: {}", std::mem::align_of::<u32>());
    println!("Pointer: {:p}", slice.as_ptr());
    println!(
        "Correctly aligned? {}",
        slice.as_ptr() as usize % std::mem::align_of::<u32>() == 0
    );
}

@tustvold
Copy link
Contributor

I've updated #7518 to also address the alignment issues, unfortunately this does require a user-facing breaking change, namely we can't treat as many things as slices, but I think this is fine. There is something a little amusing that we can provide slices of structs but not primitives 😆 PTAL and let me know if I've missed anything

@TethysSvensson
Copy link
Contributor Author

I you need to mark init_from_table as unsafe as well in the generated code. For now that is all I can find, but I would not at all be surprised if more issues are hiding around.

tonyxuqqi added a commit to tonyxuqqi/tikv that referenced this issue Sep 18, 2023
The security issue is google/flatbuffers#6627

Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>
ti-chi-bot bot pushed a commit to tikv/tikv that referenced this issue Sep 20, 2023
…15628)

ref #15621

The security issue is google/flatbuffers#6627.
Upgrade flatbuffers from 2.1.2 to 23.5.26 to address it.

Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>
Signed-off-by: Qi Xu <tonyxuqqi@outlook.com>

Co-authored-by: Qi Xu <tonyxuqqi@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants