Store pointer bytes in pointers for SB/provenance #272

saethlin · 2022-04-06T01:09:43Z

I'm opening this because I was made aware of #271.

I would be willing to consider a PR that makes this code friendlier to miri's raw pointer tagging

With these changes, this codebases passes Miri with raw pointer tagging.

In any case, if the original data structure can't be made compatible with raw pointer tagging, that is the opposite of a reason to get rid of it. That is what makes it interesting and useful for informing miri design and standard library design; see rust-lang/miri#1936.

I heartily agree. I think a lot of code which is not currently can be made amenable to Stacked Borrows (even with raw pointer tagging). The question is whether people are willing to write code in the patterns that it requires. In this case, the requirement is that pointer bytes be stored in a type which can carry provenance (that is, if you want to resurrect and dereference the pointer). There are two types which are useful for this: raw pointers and MaybeUninit. This PR implements the raw pointer approach.

Here is the MaybeUninit approach: https://github.com/saethlin/semver/blob/396400cd2eaccc82b6cf01f202fd08ef85099f7e/src/identifier.rs#L90-L120

I far prefer the MaybeUninit approach because I can see a way to implement it without writing platform-dependent code (I have no confidence in the 32-bit code in this PR). But it would bump this crate's MSRV up to 1.36, and I don't know if it's an option for this crate.

(yes I know the transmutes in the MaybeUninit implementation don't compile on 32-bit platforms, but that can be fixed I'm just being lazy with code that may not be interesting)

riking · 2022-04-14T04:47:47Z

src/identifier.rs

    // Rustc >=1.53 has intrinsics for counting zeros on a non-zeroable integer.
    // On many architectures these are more efficient than counting on ordinary
    // zeroable integers (bsf vs cttz). On rustc <1.53 without those intrinsics,
    // we count zeros in the u64 rather than the NonZeroU64.
-    #[cfg(no_nonzero_bitscan)]
-    let repr = repr.get();
+    let repr = repr.as_ptr() as usize;


This should probably be repr.as_usize because this is where you extract an integer value

riking · 2022-04-14T04:48:48Z

src/identifier.rs


    // SAFETY: the most significant bit of repr is known to be set, so the value
    // is not zero.
-    unsafe { NonZeroU64::new_unchecked(repr) }
+    Repr::from_ptr(unsafe { NonNull::new_unchecked(ptr.wrapping_sub(prev).wrapping_add(repr)) })


Pre-subtracting the offset has better behavior in debug mode on CHERI because you never produce a wildly out of range pointer

rust-lang/rust#96152

riking · 2022-04-14T04:50:13Z

src/identifier.rs

+        pub unsafe fn from_inline(bytes: u64) -> Repr {
+            Repr {
+                ptr: unsafe { NonNull::new_unchecked(bytes as usize as *mut u8) },
+                remainder: (bytes >> 32) as usize,


Due to the lack of as_u64, I don't see anywhere remainder is ever read. Is it actually a 64 bit value or is it just a isize?

It is read both in the is_inline method, and a pointer to it is produced by inline_as_str. As I said in my initial comment, all this juggling to support 32-bit platforms makes me uncomfortable and I think the MaybeUninit implementation would be simpler, but whether it is viable at all is up to the maintainer.

Oh, that was exceedingly unclear. Maybe that cast chain could be moved to an impl on Repr?

RalfJung · 2022-04-19T19:02:24Z

I heartily agree. I think a lot of code which is not currently can be made amenable to Stacked Borrows (even with raw pointer tagging). The question is whether people are willing to write code in the patterns that it requires. In this case, the requirement is that pointer bytes be stored in a type which can carry provenance (that is, if you want to resurrect and dereference the pointer).

FWIW, this requirement is not something that a future aliasing model is likely to change. It is, in fact, entirely orthogonal to aliasing questions -- this is about whether we support transmuting pointers to integers and back. IOW, the old code would fail even with -Zmiri-strict-provenance -Zmiri-disable-stacked-borrows. Pointer-integer transmutation is a major source of ambiguity in memory models and it is unclear to what extent they can be supported without putting a high burden on optimizations in all code (even code that performs no such transmutation).

dtolnay · 2022-04-30T22:36:56Z

src/identifier.rs

+    #[cfg(target_pointer_width = "32")]
+    #[derive(PartialEq, Eq, Clone, Copy)]
+    pub struct Repr {
+        ptr: NonNull<u8>,
+        remainder: usize,


This is going to result in the front and back 4 bytes being randomly swapped depending on what order these two fields get placed in the struct layout.

fn main() { let prerelease = semver::Prerelease::new("asdfjkl").unwrap(); println!("{}", prerelease); println!("{:?}", prerelease.as_str()); }

jklasdf "jkl\0asdf"

Could we add this one to the list?

dtolnay · 2022-04-30T22:59:38Z

I tried adding a #[repr(C)] to control the field order I commented above, but the test suite still fails on 32-bit regardless of what order I put those two fields. Could you take a look? You can reproduce with cargo miri test --target i686-unknown-linux-gnu.

---- test_new stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `5`,
 right: `1`', tests/test_identifier.rs:17:9

saethlin · 2022-05-01T01:05:18Z

So the answer is that I'm a dumdum and didn't actually run the tests on i686-unknown-linux-gnu. I actually wrote 2 bugs, but I also fixed the field order so that you could (if you really wanted) do the to_u64 conversion with a transmute.

We also learned in rust-lang/rust#96538 that ptr.wrapping_sub(ptr as usize) can be miscompiled currently, so I switched to a sequence of operations that we are pretty sure doesn't miscompile.

dtolnay · 2022-05-01T02:10:53Z

src/identifier.rs

+        #[cfg(target_pointer_width = "32")]
+        pub unsafe fn from_inline(bytes: u64) -> Repr {
+            Repr {
+                ptr: unsafe { NonNull::new_unchecked(bytes as usize as *mut u8) },


I think this assumes the target is little endian. On big endian, the bytes at the lower address in memory are more significant, so the less significant half of bytes would be holding the end of the inline string, which would all be 0 bytes for strings length 4 or shorter -- see the call site here:

semver/src/identifier.rs

Lines 174 to 180 in 0b38170

let mut bytes = [0u8; 8];

// SAFETY: string is big enough to read len bytes, bytes is big

// enough to write len bytes, and they do not overlap.

unsafe { ptr::copy_nonoverlapping(string.as_ptr(), bytes.as_mut_ptr(), len) };

// SAFETY: it's nonzero because the input string was at least 1

// byte of ASCII and did not contain \0.

unsafe { Repr::from_inline(u64::from_ne_bytes(bytes)) }

--target mips64-unknown-linux-gnuabi64 works pretty well for little-endian testing.

dtolnay

I put up my own implementation in #273, which does not involve any architecture-specific conditional compilation (target pointer width or endianness).

Would appreciate review if any of you get a chance.

dtolnay · 2022-05-02T01:51:09Z

Closing in favor of #273.

Thanks anyway for the PR, and for everyone's review comments! This was helpful to inform my approach in the other PR.

Store pointer bytes in pointers for SB/provenance

08066aa

riking reviewed Apr 14, 2022

View reviewed changes

dtolnay requested changes Apr 30, 2022

View reviewed changes

Fix on 32 bit targets, bit-pack using a difference

0b38170

dtolnay requested changes May 1, 2022

View reviewed changes

dtolnay reviewed May 1, 2022

View reviewed changes

dtolnay closed this May 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Store pointer bytes in pointers for SB/provenance #272

Store pointer bytes in pointers for SB/provenance #272

saethlin commented Apr 6, 2022

riking Apr 14, 2022

riking Apr 14, 2022

saethlin Apr 18, 2022

riking Apr 14, 2022

saethlin Apr 18, 2022

riking Apr 18, 2022

RalfJung commented Apr 19, 2022

dtolnay Apr 30, 2022

dtolnay May 6, 2022

dtolnay commented Apr 30, 2022

saethlin commented May 1, 2022

dtolnay May 1, 2022

RalfJung May 1, 2022

dtolnay left a comment

dtolnay commented May 2, 2022

	let mut bytes = [0u8; 8];
	// SAFETY: string is big enough to read len bytes, bytes is big
	// enough to write len bytes, and they do not overlap.
	unsafe { ptr::copy_nonoverlapping(string.as_ptr(), bytes.as_mut_ptr(), len) };
	// SAFETY: it's nonzero because the input string was at least 1
	// byte of ASCII and did not contain \0.
	unsafe { Repr::from_inline(u64::from_ne_bytes(bytes)) }

Store pointer bytes in pointers for SB/provenance #272

Store pointer bytes in pointers for SB/provenance #272

Conversation

saethlin commented Apr 6, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RalfJung commented Apr 19, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dtolnay commented Apr 30, 2022

saethlin commented May 1, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dtolnay left a comment

Choose a reason for hiding this comment

dtolnay commented May 2, 2022