Warn when transmuting raw pointers #11734

Closed
joshlf opened this issue Oct 29, 2023 · 15 comments
Labels
A-lint Area: New lints


@joshlf

joshlf commented Oct 29, 2023

What it does

Warns when calling mem::transmute or transmute_copy where the source or destination types contain raw pointers.

Advantage

Transmuting is an operation which may not preserve provenance of raw pointers, and so represents a subtle footgun.

Drawbacks

There is not always an obvious alternative for Clippy to suggest. The suggestion may simply be something to the effect of "be careful about provenance."

In some specific cases, there may be an applicable suggestion, such as the example given below, in which raw pointers are being directly transmuted (as opposed to types which recursively contain raw pointers).

Example

```rust
use std::mem;

let bar: *const Bar = get_bar();
let foo: *const Foo = unsafe { mem::transmute(bar) };
```

Could be written as:

```rust
let bar: *const Bar = get_bar();
let foo: *const Foo = bar as *const Foo;
```
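A self-contained version of the example, as a minimal sketch: `Foo`, `Bar`, and `get_bar` are hypothetical stand-ins for the placeholders above (here `get_bar` takes a reference so there is something to point at). It also shows `<*const T>::cast`, which performs the same pointer-to-pointer conversion without `unsafe`.

```rust
use std::mem;

// Hypothetical stand-ins for the placeholders in the issue.
#[allow(dead_code)]
struct Bar(u32);
#[allow(dead_code)]
struct Foo(u32);

// Hypothetical: returns a pointer to a caller-provided `Bar`.
fn get_bar(bar: &Bar) -> *const Bar {
    bar
}

fn main() {
    let storage = Bar(7);
    let bar: *const Bar = get_bar(&storage);

    // Three spellings of the same pointer-to-pointer conversion.
    let via_transmute: *const Foo = unsafe { mem::transmute(bar) };
    let via_as = bar as *const Foo;
    let via_cast = bar.cast::<Foo>();

    // All three have the same address (and, per the discussion, the same provenance).
    assert_eq!(via_transmute, via_as);
    assert_eq!(via_as, via_cast);
    println!("{:p}", via_cast);
}
```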
@joshlf joshlf added the A-lint Area: New lints label Oct 29, 2023
@joshlf
Author

joshlf commented Oct 29, 2023

cc @RalfJung, who might have opinions about when to fire the lint and what to suggest as an alternative.

@RalfJung
Member

RalfJung commented Oct 29, 2023 via email

@taiki-e
Member

taiki-e commented Oct 30, 2023

A lint for this already exists, although it is allowed by default: https://rust-lang.github.io/rust-clippy/master/index.html#/transmute_ptr_to_ptr
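For readers who want that lint enforced, a minimal sketch of opting in at the crate level, assuming the code is checked with `cargo clippy` (plain `rustc` accepts the `clippy::` lint name but does not act on it). The function `reinterpret` is a hypothetical example of the kind of code the lint targets.

```rust
// Opt in to the allow-by-default Clippy lint for the whole crate.
#![warn(clippy::transmute_ptr_to_ptr)]

use std::mem;

// Hypothetical example: a pointer-to-pointer transmute that
// `cargo clippy` is expected to flag under the lint above.
fn reinterpret(p: *const u32) -> *const i32 {
    unsafe { mem::transmute::<*const u32, *const i32>(p) }
}

fn main() {
    let x: u32 = 5;
    let p = reinterpret(&x);
    // Same address as the `unsafe`-free `cast` spelling.
    assert_eq!(p, (&x as *const u32).cast::<i32>());
    println!("{}", unsafe { *p });
}
```

The same opt-in can also live in `Cargo.toml` under a `[lints.clippy]` table (`transmute_ptr_to_ptr = "warn"`), which Cargo supports since Rust 1.74.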

@joshlf
Author

joshlf commented Nov 2, 2023

> At least under our current models, transmuting between pointer types does preserve provenance. Transmuting from *mut T to *mut U is equivalent to a cast. I don't see that changing. Problems arise when transmuting between integers and pointers; transmuting between usize and *mut T is not equivalent to a cast.

Ah I see, I had misunderstood.

What about transmuting references-to-pointers? I.e., does transmuting &*mut T to &*mut U (or doing &*(t as *mut T as *mut U)) preserve provenance?

Your response confirms that zerocopy's transmute! macro will preserve provenance where raw pointers overlap in the source and destination types. I'd also like to know about provenance preservation for the transmute_ref! and transmute_mut! macros, which perform &T -> &U and &mut T -> &mut U respectively.
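As a rough illustration of the `&mut T -> &mut U` case, here is a minimal sketch that reinterprets `&mut u32` as `&mut [u8; 4]`, once with a pointer cast and once with `mem::transmute`. The helper names are hypothetical and this is not zerocopy's implementation; the conversion is only sound here because the sizes match, `[u8; 4]` has no stricter alignment than `u32`, and every byte pattern is valid for both types.

```rust
use std::mem;

/// Hypothetical helper: reinterpret `&mut u32` as `&mut [u8; 4]`
/// via a pointer-to-pointer cast followed by a reborrow.
fn as_bytes_mut(x: &mut u32) -> &mut [u8; 4] {
    unsafe { &mut *(x as *mut u32 as *mut [u8; 4]) }
}

/// Hypothetical helper: the same conversion as a reference-to-reference
/// transmute. The output is a reference, so its provenance is carried over.
fn as_bytes_mut_transmute(x: &mut u32) -> &mut [u8; 4] {
    unsafe { mem::transmute::<&mut u32, &mut [u8; 4]>(x) }
}

fn main() {
    let mut x: u32 = 0;
    // Writing through the new reference is visible in `x`.
    as_bytes_mut(&mut x).copy_from_slice(&[0xFF; 4]);
    assert_eq!(x, u32::MAX);

    // Clearing one byte leaves exactly 24 bits set, whatever the endianness.
    as_bytes_mut_transmute(&mut x)[0] = 0;
    assert_eq!(x.count_ones(), 24);
    println!("{x:#x}");
}
```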

@joshlf
Author

joshlf commented Nov 2, 2023

> A lint for this already exists, although it is allowed by default: https://rust-lang.github.io/rust-clippy/master/index.html#/transmute_ptr_to_ptr

TIL, thanks!

Sounds like this can be closed.

@joshlf joshlf closed this as not planned Nov 2, 2023
@RalfJung
Member

RalfJung commented Nov 2, 2023

> What about transmuting references-to-pointers? I.e., does transmuting &*mut T to &*mut U (or doing &*(t as *mut T as *mut U)) preserve provenance?

&*mut T to &*mut U is transmuting a reference to a reference?

Provenance is always preserved wherever the output has pointer or reference type. (And inside MaybeUninit, and possibly inside unions in general.) It is lost only in integers (and, of course, in padding, where nothing is preserved).
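To make the integer case concrete, a minimal sketch assuming the exposed-provenance semantics of `as` casts: a pointer-to-`usize` `as` cast exposes the pointer's provenance, so casting the integer back with `as` can yield a usable pointer, whereas `transmute::<usize, *const u32>` yields a pointer with the right address but no provenance. The function names are hypothetical. Since Rust 1.84, strict-provenance methods such as `<*const T>::addr` and `with_addr` are also available on stable.

```rust
use std::mem;

// Hypothetical: round trip through an integer using `as` casts.
fn roundtrip_with_as(p: *const u32) -> *const u32 {
    let addr = p as usize; // exposes the provenance of `p`
    addr as *const u32 // may pick that exposed provenance back up
}

// Hypothetical: the same round trip using `transmute` in both directions.
fn roundtrip_with_transmute_no_provenance(p: *const u32) -> *const u32 {
    // Pointer-to-integer: the result is a plain integer, provenance dropped.
    let addr: usize = unsafe { mem::transmute::<*const u32, usize>(p) };
    // Integer-to-pointer: right address, but no provenance.
    unsafe { mem::transmute::<usize, *const u32>(addr) }
}

fn main() {
    let x: u32 = 42;
    let p: *const u32 = &x;

    let q = roundtrip_with_as(p);
    assert_eq!(unsafe { *q }, 42); // OK under exposed provenance

    let r = roundtrip_with_transmute_no_provenance(p);
    assert_eq!(r as usize, p as usize); // same address...
    // ...but `unsafe { *r }` would be UB: `r` carries no provenance.
}
```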

@joshlf
Author

joshlf commented Nov 2, 2023

> > What about transmuting references-to-pointers? I.e., does transmuting &*mut T to &*mut U (or doing &*(t as *mut T as *mut U)) preserve provenance?

> &*mut T to &*mut U is transmuting a reference to a reference?

Sorry, I meant "transmuting T to U, where T and U are references-to-pointers"

> Provenance is always preserved wherever the output has pointer or reference type. (And inside MaybeUninit, and possibly inside unions in general.) It is lost only in integers (and, of course, in padding, where nothing is preserved).

Okay gotcha, so the fact that you're going T -> U vs &T -> &U vs &&T -> &&U (or other sorts of nested indirection, like &Foo<T> -> Foo<U> in struct Foo<T> { a: u8, t: T }) doesn't affect the outcome that provenance is preserved?

@RalfJung
Member

RalfJung commented Nov 2, 2023

Well, T -> U is different: if T = *const u32 and U = usize, then provenance is not preserved since the output is an integer. And Foo<T> has a u8 as its first field (assuming repr(C)) on which provenance is not preserved either.

The general model is: whenever you are making a typed copy (such as assignment, passing arguments to a function, receiving the return value of a function), then the source memory is "de-serialized" into a "high-level representation" (like this one), and that high-level representation is then "serialized" into the target memory. This (a) means there is UB if the source memory does not represent any high-level value, such as trying to de-serialize 0x10 as a bool, and (b) it means that all data in padding is lost, since it is not part of the high-level representation, and (c) it means that provenance is lost if it's not in a pointer (or a union, details here are not final yet), since pointers (and some or maybe all unions) are the only high-level values that carry provenance.

transmute from T to U basically works by

  • de-serializing the source memory at type T
  • serializing that high-level value to some intermediate storage
  • de-serializing that storage at type U
  • and serializing whatever we get there into the destination

So when you transmute to/from integers, the data goes through a Value::Int, and there's no provenance in Value::Int, so provenance is lost.
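The de-serialize/serialize model above can be sketched as a toy interpreter. Everything here is hypothetical and deliberately simplified (8-byte little-endian values, only two types, and a pointer keeps provenance only if all of its bytes agree); it is not MiniRust's actual definition, only an illustration of why a round trip through `Value::Int` loses provenance while a pointer-to-pointer transmute keeps it.

```rust
#![allow(dead_code)]
// Toy model only: all names are hypothetical.
const SIZE: usize = 8;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Provenance(u32); // e.g. an allocation ID

#[derive(Clone, Copy, Debug, PartialEq)]
enum AbstractByte {
    Uninit,
    Init(u8, Option<Provenance>),
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Value {
    Int(u64),
    Ptr { addr: u64, prov: Option<Provenance> },
}

#[derive(Clone, Copy, Debug)]
enum Ty {
    Usize,
    RawPtr,
}

/// De-serialize bytes at type `ty`; `None` means "no high-level value" (UB).
fn decode(ty: Ty, bytes: &[AbstractByte]) -> Option<Value> {
    if bytes.len() != SIZE {
        return None;
    }
    let mut raw = [0u8; SIZE];
    let mut provs = [None::<Provenance>; SIZE];
    for (i, b) in bytes.iter().enumerate() {
        match *b {
            AbstractByte::Uninit => return None,
            AbstractByte::Init(v, p) => {
                raw[i] = v;
                provs[i] = p;
            }
        }
    }
    let addr = u64::from_le_bytes(raw);
    match ty {
        // Integers have nowhere to store provenance, so it is dropped here.
        Ty::Usize => Some(Value::Int(addr)),
        // Pointers keep provenance if every byte carries the same one.
        Ty::RawPtr => {
            let first = provs[0];
            let prov = if provs.iter().all(|p| *p == first) { first } else { None };
            Some(Value::Ptr { addr, prov })
        }
    }
}

/// Serialize a high-level value back into abstract bytes.
fn encode(val: Value) -> Vec<AbstractByte> {
    let (addr, prov) = match val {
        Value::Int(a) => (a, None),
        Value::Ptr { addr, prov } => (addr, prov),
    };
    addr.to_le_bytes().iter().map(|&b| AbstractByte::Init(b, prov)).collect()
}

/// Toy transmute: decode at `from`, encode, decode at `to`, encode.
fn transmute(bytes: &[AbstractByte], from: Ty, to: Ty) -> Option<Vec<AbstractByte>> {
    let intermediate = encode(decode(from, bytes)?);
    Some(encode(decode(to, &intermediate)?))
}

fn main() {
    let ptr = encode(Value::Ptr { addr: 0x1000, prov: Some(Provenance(1)) });

    // Pointer to pointer: provenance survives.
    let kept = transmute(&ptr, Ty::RawPtr, Ty::RawPtr).unwrap();
    println!("{:?}", decode(Ty::RawPtr, &kept));

    // Pointer to integer: reading the result back as a pointer shows no provenance.
    let lost = transmute(&ptr, Ty::RawPtr, Ty::Usize).unwrap();
    println!("{:?}", decode(Ty::RawPtr, &lost));
}
```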

@joshlf
Copy link
Author

joshlf commented Nov 2, 2023

> Well, T -> U is different: if T = *const u32 and U = usize, then provenance is not preserved since the output is an integer. And Foo<T> has a u8 as its first field (assuming repr(C)) on which provenance is not preserved either.

Yeah, I specifically meant "T -> U where T and U's raw pointers are at the same byte ranges".

> The general model is: whenever you are making a typed copy (such as assignment, passing arguments to a function, receiving the return value of a function), then the source memory is "de-serialized" into a "high-level representation" (like this one), and that high-level representation is then "serialized" into the target memory. This (a) means there is UB if the source memory does not represent any high-level value, such as trying to de-serialize 0x10 as a bool, and (b) it means that all data in padding is lost, since it is not part of the high-level representation, and (c) it means that provenance is lost if it's not in a pointer (or a union, details here are not final yet), since pointers (and some or maybe all unions) are the only high-level values that carry provenance.

I think I understand this model. IIUC, if it is the case that T -> U preserves provenance, then &T -> &U preserves provenance, &mut T -> &mut U preserves provenance, &&T -> &&U preserves provenance, etc? And the same is true if T and U are nested arbitrarily deeply inside reference/pointer/box indirection, in struct fields, etc, so long as the T and U live at the same byte ranges in the source and destination types?

@RalfJung
Member

RalfJung commented Nov 2, 2023

> I think I understand this model. IIUC, if it is the case that T -> U preserves provenance, then &T -> &U preserves provenance, &mut T -> &mut U preserves provenance, &&T -> &&U preserves provenance, etc? And the same is true if T and U are nested arbitrarily deeply inside reference/pointer/box indirection, in struct fields, etc, so long as the T and U live at the same byte ranges in the source and destination types?

That sounds right, albeit expressed in somewhat complicated ways. Transmutation is a shallow operation, so transmuting &T to &U doesn't care about the T and U at all, it just gives you a pointer at a different type. Transmutation never cares about what is "behind the pointer". So even &*const T to &usize preserves provenance of the outer pointer (and does nothing at all with any other pointer).
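A small sketch of the "shallow" point: transmuting a reference changes only the pointee type, not the address and not the memory behind it. The example assumes `&u32 -> &[u8; 4]`, which is sound because the sizes match, `[u8; 4]` has alignment 1, and any four bytes are a valid `[u8; 4]`.

```rust
use std::mem;

fn main() {
    let x: u32 = 0x0102_0304;
    let r: &u32 = &x;

    // Transmute the reference itself: `&u32` -> `&[u8; 4]`.
    let bytes: &[u8; 4] = unsafe { mem::transmute::<&u32, &[u8; 4]>(r) };

    // Same address: only the pointee type changed.
    assert_eq!(bytes as *const [u8; 4] as usize, r as *const u32 as usize);
    // The memory behind the pointer is untouched.
    assert_eq!(*bytes, x.to_ne_bytes());
    println!("{:?}", bytes);
}
```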

Maybe you're concerned with what safe code does with the resulting value after the transmute happens? That's a much broader question, but then you have to consider the safety invariants as well obviously, not just whether provenance is preserved.

@joshlf
Author

joshlf commented Dec 5, 2023

> > I think I understand this model. IIUC, if it is the case that T -> U preserves provenance, then &T -> &U preserves provenance, &mut T -> &mut U preserves provenance, &&T -> &&U preserves provenance, etc? And the same is true if T and U are nested arbitrarily deeply inside reference/pointer/box indirection, in struct fields, etc, so long as the T and U live at the same byte ranges in the source and destination types?

> That sounds right, albeit expressed in somewhat complicated ways. Transmutation is a shallow operation, so transmuting &T to &U doesn't care about the T and U at all, it just gives you a pointer at a different type. Transmutation never cares about what is "behind the pointer". So even &*const T to &usize preserves provenance of the outer pointer (and does nothing at all with any other pointer).

What I'm more worried about is whether provenance on the inner pointer is preserved. E.g., if I perform *const *const T -> *const *const U, the referent type is converted from *const T to *const U. I understand that you're saying that, in this example, the provenance of the outer type is preserved (the resulting *const *const U has the same provenance as the original *const *const T), but is the provenance of the inner/referent type also preserved (does the resulting *const U have the same provenance as the original *const T)?

@RalfJung
Member

RalfJung commented Dec 5, 2023

The provenance of the inner pointer lives in memory. Transmuting does not change the contents of memory. So no, there's no way this can affect the provenance of the inner pointer, just like transmuting &bool to &u8 does not change the value stored in memory behind that reference. (I'm surprised by the question; your model of provenance must be quite different from mine to make this even a question. To be fair, it's not like we have good docs on this; I don't know if reading through the memory interface spec in minirust helps, but it might.)

> the resulting *const U

There's no "resulting *const U". There's a resulting *const *const U. To get a *const U you have to load from memory and what you get then depends solely on the memory contents, not the pointer through which you are watching the memory contents. Those memory contents might change between the time you do the transmute and the time you do the load, so this is really not very connected to the transmute at all. The "pointer through which you are watching" only affects whether you are allowed to load; if it has the wrong provenance the load is UB. But if the load is permitted, then the Abstract Bytes you see are the same regardless of which pointer is used to do the access.
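A minimal sketch of the `*const *const T -> *const *const U` case: transmuting the outer pointer does not touch the inner pointer, which sits in memory (here, in the local `inner`). Loading through the transmuted outer pointer reads those same bytes, so the loaded pointer has the inner pointer's address and, under the model described above, its provenance. The choice of `u32`/`i32` is just for illustration.

```rust
use std::mem;

fn main() {
    let value: u32 = 99;
    // Inner pointer: its bytes (address and provenance) live in the local `inner`.
    let inner: *const u32 = &value;
    // Outer pointer: points at the memory holding `inner`.
    let outer: *const *const u32 = &inner;

    // Transmute only the outer pointer's type.
    let outer2: *const *const i32 =
        unsafe { mem::transmute::<*const *const u32, *const *const i32>(outer) };

    // Loading through `outer2` reads the bytes that hold `inner`.
    let loaded: *const i32 = unsafe { *outer2 };
    assert_eq!(loaded as usize, inner as usize);
    // The loaded pointer still points at `value`, so dereferencing is fine.
    assert_eq!(unsafe { *loaded }, 99);
}
```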

@joshlf
Author

joshlf commented Dec 6, 2023

Okay that clarifies it for me, thanks!

> (I'm surprised by the question; your model of provenance must be quite different from mine to make this even a question. To be fair, it's not like we have good docs on this; I don't know if reading through the memory interface spec in minirust helps, but it might.)

I think it's less about my mental model being different and more about it being less complete. In particular, it sounds like your model is that provenance lives inside of a pointer regardless of whether you're accessing that pointer as a local variable or via some kind of indirection. I didn't think that this wasn't the case, but rather I didn't have any belief about that fact one way or the other.

Generally speaking, when it comes to the semantics of unsafe code, I've learned to assume that my mental model might be slightly wrong in various ways. Correspondingly, I've learned that if I make a logical deduction about what code will be sound on the basis of my mental model, there's a chance that that deduction will lead me to an incorrect conclusion. If you are 100% confident in your model, then you can confidently conclude that *const *const T -> *const *const U doesn't affect the provenance which lives inside of the referent pointer. But I can imagine a bunch of ways that provenance might work differently than I expect, and all of these would make that conclusion invalid:

  • Maybe the provenance of all reachable pointers lives in the local variable (i.e., the *const *const T contains its own provenance and also the provenance of the referent *const T)
  • Maybe the provenance in *const T lives in a different place within the pointer (say, inside the first byte) than the provenance in *const U (say, inside the last byte)
  • Maybe the provenance in *const T has a different representation than the provenance in *const U, and you need to do an as cast in order to convert between these representations

In practice I know that none of these are true, but my point is just to illustrate that as someone who is trying to language-lawyer existing documentation written by other people, it can be dangerous to assume that you know anything at all 😛

@RalfJung
Member

RalfJung commented Dec 7, 2023

> In particular, it sounds like your model is that provenance lives inside of a pointer regardless of whether you're accessing that pointer as a local variable or via some kind of indirection.

Yes indeed. Local variables are also just places in memory, the indirection is managed automatically by the Abstract Machine. (See the definition of StackFrame here.)

> I didn't think that this wasn't the case, but rather I didn't have any belief about that fact one way or the other.

That's fair. To be clear, I was not criticizing you, just expressing my surprise. It's a constant reminder how little we actually clearly document about provenance currently.

@joshlf
Author

joshlf commented Dec 7, 2023

> > I didn't think that this wasn't the case, but rather I didn't have any belief about that fact one way or the other.

> That's fair. To be clear, I was not criticizing you, just expressing my surprise.

No worries, I didn't take it as such 🙂

> It's a constant reminder how little we actually clearly document about provenance currently.

Yeah, and this is unfortunately something that demands an absurd level of precision and verbosity compared to "normal" module/type/function/etc documentation.

3 participants