Fix Rust UB problems #6393

CasperN · 2021-01-08T17:22:12Z

This should fix #6359, #6203, #5854, #5825, and the last issue mentioned in #4916.

Motivation

Miri is a rust interpreter that detects bad memory activity, i.e. UB. I've fixed the detected problems, which turned out to be alignment issues, and added it to RustTest.sh. With the verifier and UB issues sorted (so far as a core Rust tool can detect) I think the library is now acceptably safe according to the common standards of the Rust community.

Details

I didn't run all tests through Miri. Its incredibly slow so I've disabled it for fuzzing and other expensive tests.
verifier_one_byte_errors_do_not_crash is disabled but I did run it manually since its quite important. That took like an hour on my weak little macbook air. Miri tests are only running on linux since it tests an abstract memory model and there shouldn't be any difference on other platforms.

Since Rust's Vec<u8> and [u8] only promises alignment 1, we have to copy out of the buffer and onto the stack to get aligned access. This is a very minor performance hit that's incurred when scalars are accessed. The alternative would be to introduce a nonstandard type such as Vec<u128> or even a custom vector that promises 16byte alignment. That's pretty inconvenient for us and users.

I had to rework the implementation of structs in rust to use an array of bytes rather than repr(C, align=4). This is because we return references to structs from fb-vectors, and these references are alignment 1. Please take a closer look at the new implementation.

Alignment checks from the verifier are now with respect to buffer[0] as per the Flatbuffers specification. They may not be aligned w.r.t. the system because of [u8]'s alignment. I kept this check in case a verified buffer in Rust is assumed aligned when passed to another language.

krojew · 2021-01-08T17:36:25Z

Memory copying in each struct accessor is a bit worrying, although I understand the motivation. We could end up in the same situation as in Java, where we have a performance trap with strings.

aardappel · 2021-01-08T19:32:34Z

I am pretty surprised Rust doesn't have a built-in way to have aligned buffers of u8. That is pretty fundamental for any serialized data, file formats, memory mapping etc. No Rust allocator will align at 1 byte granularity so it isn't even a practical issue.

There seems to be a lot in Rust that favors being pedantic over pragmatism, that is sad.

CasperN · 2021-01-08T20:02:28Z

No Rust allocator will align at 1 byte granularity so it isn't even a practical issue.

Rust by default uses jemalloc and it does actually take advantage of lower alignment requirements to reduce fragmentation. This is why we have users observing alignment errors (#6359).

I am pretty surprised Rust doesn't have a built-in way to have aligned buffers of u8

I mean you can get aligned raw pointers via std::alloc but its lower level than what most people prefer to use.

There'll probably be a way to parameterize Vec<u8> with an allocator that guarantees minimum 16byte alignment, and eventually maybe even a way for us to restrict allowable allocators via the type system. That doesn't exist yet but there's a working group talking about it.

https://github.com/rust-lang/wg-allocators/issues

The biggest issue is &[u8]. That's a very fundamental and widely used type and I don't see a good way for &[u8] promise a minimum alignment since you can take arbitrary subslices.

CasperN · 2021-01-10T01:12:52Z

Ok, looks like all tests pass, can I get LGTMs or 👍s?

rw · 2021-01-10T20:44:44Z

@CasperN Nice work!

What are the benchmarks like with these changes? I wonder if representing structs as arrays will cause the compiler to use the stack more than it should (instead of registers). And I'm curious about the effects of the copying that @krojew mentioned.

CasperN · 2021-01-10T22:29:57Z

@rw

What are the benchmarks like with these changes?

There's some performance regression on my machine, but its not too significant, within the "+/- interval."

Master:

running 16 tests
test create_byte_vector_100_naive       ... bench:         856 ns/iter (+/- 93) = 116 MB/s
test create_byte_vector_100_optimal     ... bench:          28 ns/iter (+/- 4) = 3571 MB/s
test create_canonical_buffer_then_reset ... bench:       1,096 ns/iter (+/- 170) = 277 MB/s
test create_string_10                   ... bench:          35 ns/iter (+/- 25) = 285 MB/s
test create_string_100                  ... bench:          35 ns/iter (+/- 7) = 2828 MB/s
test traverse_canonical_buffer          ... bench:         128 ns/iter (+/- 35) = 2375 MB/s

This branch:

running 16 tests
test create_byte_vector_100_naive       ... bench:         907 ns/iter (+/- 181) = 110 MB/s
test create_byte_vector_100_optimal     ... bench:          29 ns/iter (+/- 4) = 3448 MB/s
test create_canonical_buffer_then_reset ... bench:       1,120 ns/iter (+/- 173) = 267 MB/s
test create_string_10                   ... bench:          30 ns/iter (+/- 6) = 333 MB/s
test create_string_100                  ... bench:          36 ns/iter (+/- 6) = 2750 MB/s
test traverse_canonical_buffer          ... bench:         127 ns/iter (+/- 25) = 2362 MB/s

I wonder if representing structs as arrays will cause the compiler to use the stack more than it should (instead of registers).

Maybe, but I wouldn't look into it. The regression is very tolerable, given we're eliminating UB.

Anything to address before merging?

krojew · 2021-01-11T08:41:26Z

Looks good.

rw · 2021-01-11T23:40:03Z

Nice!

rw · 2021-01-11T23:45:28Z

@CasperN WDYT about us running miri for all tests sometimes?

CasperN · 2021-01-12T13:59:37Z

WDYT about us running miri for all tests sometimes?

It might be good idea but I don't know how long it takes, Miri seems around 5 orders of magnitude slower than normal rust.

eduardosm · 2021-01-15T18:21:48Z

rust/flatbuffers/src/endian_scalar.rs

 #[inline]
 pub fn emplace_scalar<T: EndianScalar>(s: &mut [u8], x: T) {
-    let sz = size_of::<T>();
-    let mut_ptr = (&mut s[..sz]).as_mut_ptr() as *mut T;
-    let val = x.to_little_endian();
+    let x_le = x.to_little_endian();
    unsafe {
-        *mut_ptr = val;
+        core::ptr::copy_nonoverlapping(
+            &x_le as *const T as *const u8,
+            s.as_mut_ptr() as *mut u8,
+            size_of::<T>()
+        );
    }
 }


This is missing checking the length of s and still allows reading uninitialized padding bytes of structs.

What do you mean by "still allows reading uninitialized padding bytes of structs"

struct Struct { x: u8, y: u16, }

This struct has one padding byte, which is uninitialized. Implementing EndianScalar for that struct would allow reading such uninitialized byte.

eduardosm · 2021-01-15T18:25:43Z

rust/flatbuffers/src/endian_scalar.rs

 #[inline]
 pub fn read_scalar<T: EndianScalar>(s: &[u8]) -> T {
-    let sz = size_of::<T>();
-
-    let p = (&s[..sz]).as_ptr() as *const T;
-    let x = unsafe { *p };
-
+    let mut mem = core::mem::MaybeUninit::<T>::uninit();
+    // Since [u8] has alignment 1, we copy it into T which may have higher alignment.
+    let x = unsafe {
+        core::ptr::copy_nonoverlapping(
+            s.as_ptr(),
+            mem.as_mut_ptr() as *mut u8,
+            size_of::<T>()
+        );
+        mem.assume_init()
+    };


This is missing checking the length of s and allows reading invalid representations of T (e.g., when T is an enum).

For length checks, we first verify the flatbuffer before accessing fields -- see #6161 for implementation. Granted, if you directly use the flatbuffers library, you can still misuse this function, but it should only be used by our generated code.

As for enums, I changed the representation in #6098 so Flatbuffers enums won't be represented by Rust enums, but unit structs of i8/u8/.../i64/u64 so all bit representations are valid.

Granted, if you directly use the flatbuffers library, you can still misuse this function, but it should only be used by our generated code.

Isn't the purpose of Rust to avoid UB even when (safe) functions are use incorrectly?

As for enums, I changed the representation in #6098 so Flatbuffers enums won't be represented by Rust enums, but unit structs of i8/u8/.../i64/u64 so all bit representations are valid.

There's still bool.

rw · 2021-01-15T20:55:20Z

@CasperN WDYT about us marking generated functions as unsafe, if they use unsafe? To @eduardosm 's point, it seems like users could potentially think that it's okay to call those functions themselves, when in general we don't want them to.

Edit: or traits as unsafe.

CasperN · 2021-01-15T21:20:07Z

WDYT about us marking generated functions as unsafe, if they use unsafe? To @eduardosm 's point, it seems like users could potentially think that it's okay to call those functions themselves, when in general we don't want them to.

Yea that seems like a simple solution to this

… problems fixed The major change of [flatbuffers 0.8.1](https://docs.rs/flatbuffers/0.8.1/flatbuffers/index.html) since 0.8.0 is google/flatbuffers#6393, which fixed some possible memory alignment issues. In this PR, the ipc/gen/*.rs files are generated by `regen.sh` as before, without any manual change. Closes #9176 from mqy/flatbuffers-0.8.1 Authored-by: mqy <meng.qingyou@gmail.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>

… problems fixed The major change of [flatbuffers 0.8.1](https://docs.rs/flatbuffers/0.8.1/flatbuffers/index.html) since 0.8.0 is google/flatbuffers#6393, which fixed some possible memory alignment issues. In this PR, the ipc/gen/*.rs files are generated by `regen.sh` as before, without any manual change. Closes apache#9176 from mqy/flatbuffers-0.8.1 Authored-by: mqy <meng.qingyou@gmail.com> Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>

Casper Neo added 2 commits January 7, 2021 22:35

Fix miri problems by assuming alignment is 1 in rust

4027c4e

Removed is_aligned fn from rust verifier.

bab3a74

github-actions bot added c++ codegen Involving generating code from schema rust labels Jan 8, 2021

Add back is_aligned, but make it w.r.t. buffer[0]

3fc7525

This was linked to issues Jan 8, 2021

Rust: run miri in test #6203

Closed

Tracking issue: Rust buffer verification #4916

Closed

Casper Neo added 2 commits January 8, 2021 14:33

touch unused variable

26bfb6d

touch unused variable

cbf0630

Casper Neo added 6 commits January 8, 2021 15:03

+nightly

2870b26

Move Rust miri testing into its own docker

4b708a4

fix bash

a635784

missing one endian conversion

b361742

fix endianness2

3a5dc2a

format stuff

5133f9c

CasperN requested a review from rw January 10, 2021 19:46

CasperN merged commit 408cf58 into google:master Jan 11, 2021

mqy mentioned this pull request Jan 12, 2021

ARROW-11222: [Rust] Catch up with flatbuffers 0.8.1 which had some UB problems fixed apache/arrow#9176

Closed

CasperN mentioned this pull request Jan 12, 2021

Run Rust tests with sanitizers #6401

Open

rockstar mentioned this pull request Jan 15, 2021

Rust: read_scalar(_at) is unsound #5825

Closed

eduardosm reviewed Jan 15, 2021

View reviewed changes

CasperN mentioned this pull request Apr 20, 2021

[Rust] FlatBufferBuilder::create_vector_direct reports undefined behavior when run through Miri #5854

Closed

CasperN deleted the miri2 branch August 24, 2021 13:46

asfimport mentioned this pull request Jan 19, 2021

[Rust] [Arrow] catch up with flatbuffers 0.8.1 apache/arrow#27126

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Rust UB problems #6393

Fix Rust UB problems #6393

CasperN commented Jan 8, 2021

krojew commented Jan 8, 2021

aardappel commented Jan 8, 2021

CasperN commented Jan 8, 2021

CasperN commented Jan 10, 2021

rw commented Jan 10, 2021

CasperN commented Jan 10, 2021

krojew commented Jan 11, 2021

rw commented Jan 11, 2021

rw commented Jan 11, 2021

CasperN commented Jan 12, 2021

eduardosm Jan 15, 2021

CasperN Jan 15, 2021

eduardosm Jan 15, 2021

eduardosm Jan 15, 2021

CasperN Jan 15, 2021

eduardosm Jan 15, 2021 •

edited

rw commented Jan 15, 2021 •

edited

CasperN commented Jan 15, 2021

Fix Rust UB problems #6393

Fix Rust UB problems #6393

Conversation

CasperN commented Jan 8, 2021

Motivation

Details

krojew commented Jan 8, 2021

aardappel commented Jan 8, 2021

CasperN commented Jan 8, 2021

CasperN commented Jan 10, 2021

rw commented Jan 10, 2021

CasperN commented Jan 10, 2021

krojew commented Jan 11, 2021

rw commented Jan 11, 2021

rw commented Jan 11, 2021

CasperN commented Jan 12, 2021

eduardosm Jan 15, 2021

Choose a reason for hiding this comment

CasperN Jan 15, 2021

Choose a reason for hiding this comment

eduardosm Jan 15, 2021

Choose a reason for hiding this comment

eduardosm Jan 15, 2021

Choose a reason for hiding this comment

CasperN Jan 15, 2021

Choose a reason for hiding this comment

eduardosm Jan 15, 2021 • edited

Choose a reason for hiding this comment

rw commented Jan 15, 2021 • edited

CasperN commented Jan 15, 2021

eduardosm Jan 15, 2021 •

edited

rw commented Jan 15, 2021 •

edited