Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(c2rust-analyze) Support ptr-to-ptr casts between safely transmutable types, for now limited to same-sized integers #839

Merged
merged 35 commits into from Jun 9, 2023

Conversation

kkysen
Copy link
Contributor

@kkysen kkysen commented Feb 16, 2023

This introduces the concept of equivalent/compatible/safely transmutable types:

/// Determine if `from` can be safely transmuted to `to`,
/// which is defined as `*(from as *const To)` being a safe operation,
/// where `from: *const From` and assuming `*from` already was safe.
///
/// Note that this is one-way, and is slightly different from [`core::mem::transmute`],
/// and more similar to [`core::mem::transmute_copy`].
///
/// This forms a non-symmetric (one-way) equivalence relation, named `~` below.
/// Formally, `A ~ B` iff `*a` and `*(a as *const B)` are safe, where `a: *const A`.
///
/// However, safe transmutability is difficult to check completely,
/// so this function only checks a subset of it,
/// with these formal rules for all types `A`, `B`:
///
/// * `A = B => A ~ B`
/// * `A ~ B => *A ~ *B`
/// * `A ~ B => [A] ~ B`
/// * `A ~ B => [A; N] ~ B`, where `const N: usize`
/// * `A ~ B => [A] ~ [B; N]`, where `const N: usize`
/// * `A ~ B => [A; N] ~ [B]`, where `const N: usize`
/// * `uN ~ iN`, `iN ~ uN`, where `N` is an integer width
///
/// Thus, [`true`] means it is definitely transmutable,
/// while [`false`] means it may not be transmutable.
pub fn is_transmutable_to<'tcx>(from: Ty<'tcx>, to: Ty<'tcx>) -> bool {

Thus, we can now allow ptr-to-ptr casts between safely transmutable pointee types, whereas previously they were only allowed for equal types. In particular, this enables support for string casts, which are produced by c2rust transpile as b"" as *const u8 as *const core::ffi::c_char, where c_char = i8. Thus, this fixes #840.

New tests are added in string_casts.rs to cover various ptr casts, though some of them crash in the rewriter due to having implicitly inserted MIR statements like implicit &raws, which are inserted with addr_of!s. Thus, for some of these (where it works), there are versions with explicit addr_of!s that succeed end-to-end.

@kkysen kkysen force-pushed the kkysen/analyze-string-casts branch 3 times, most recently from fa98935 to 88babe2 Compare February 16, 2023 19:30
Copy link
Contributor

@fw-immunant fw-immunant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kkysen kkysen force-pushed the kkysen/analyze-string-casts branch 2 times, most recently from a7f06c4 to 19d0dac Compare February 16, 2023 22:12
Base automatically changed from kkysen/analyze-strings-casts-tests to master February 16, 2023 22:33
aneksteind
aneksteind previously approved these changes Feb 16, 2023
@kkysen
Copy link
Contributor Author

kkysen commented Feb 17, 2023

@aneksteind, I realized I missed the recursive case here, where a ~ b => *a ~ *b for all a, b where ~ means transmutable, so I've added that now. I'm not sure if this is used anywhere in lighttpd, but it makes the definition more sound and easier to reason about and isn't a large change.

…le types, for now limited to same-sized integers.

This introduces the concept of equivalent/compatible/safely transmutable types.
This forms an equivalence class among types, as the safe transmutability must be mutual
(i.e. transmutable in both directions; no prefix-transmutability).

Thus, we can now allow ptr-to-ptr casts between safely transmutable pointee types,
whereas previously they were only allowed for equal types.
Equal types could have their `PointerId`s unified as they had the same structure,
which is still of safely transmutability types,
which are safely transmutability because they have the same structure/layout.

As safe transmutability is difficult to check abstractly for any two types,
for now we limit it to commonly transmuted types that we know are definitely transmutable:
same-sized integer types (with potentially different signedness).

Thus, this enables support for string casts like
`b"" as *const u8 as *const core::ffi::c_char`, where `c_char = i8`,
which fixes #840.

Note that the above cast (#833) is still not supported due to the string literal `b""` (#837),
but the cast itself (in `string_casts.rs` in `fn cast_only`) works.
…ible/safetly transmutable types, not identical.
…e-way, now allowing for arrays and slices to decay.

This expands the definition of safe transmutability to be one-way.
That is, it checks if `*T as *U` is safe, rather than also `*U as *T`.

Thus, we can now allow for casts decaying
pointers to arrays and slices to pointers to their element type.

`do_unify` is modified to also be one-way,
which it was already in all call sites.

New tests are also added to `string_casts.rs`
for all the types of ptr-to-ptr casts.

Out of the full string cast, `b"" as *const u8 as *const core::ffi::c_char`,
this adds support for the `as *const u8` (from `&[u8; _]`),
so only support for the string literal itself remains.
… expanded defintion of safe transmutability.
@spernsteiner
Copy link
Contributor

It seems odd to declare that usize and *mut u8 are not safely transmutable, when you can safely convert between them using as, and that conversion behaves exactly the same as mem::transmute. But I agree this conversion shouldn't be supported in this PR - usize has nowhere to attach a PointerId, which is basically the same problem we have with general use of void*. Maybe we need a different name for this property?

Also, it would be good to clearly distinguish between the precise logical property, which may be difficult or impossible to check, and the conservative approximation of that property that we use in the implementation. For example, in the trivial functions PR, the precise property is "calls to this function have no effect on pointer permissions in the caller", and the approximation is "there are no raw pointers reachable through the types in this function's signature".

The casts allowed by this PR will require some additional library support during rewriting - there's no way to safely convert &str to &[i8] using only the standard library.

@kkysen
Copy link
Contributor Author

kkysen commented Feb 22, 2023

It seems odd to declare that usize and *mut u8 are not safely transmutable, when you can safely convert between them using as, and that conversion behaves exactly the same as mem::transmute. But I agree this conversion shouldn't be supported in this PR - usize has nowhere to attach a PointerId, which is basically the same problem we have with general use of void*. Maybe we need a different name for this property?

Ptr-to-int transmutes implicitly drop provenance (still being decided) and int-to-ptr transmutes are just UB. They're not safe, and the language I used for this is safe transmutability, so I think we're good here.

See rust-lang/unsafe-code-guidelines#286.

Also, it would be good to clearly distinguish between the precise logical property, which may be difficult or impossible to check, and the conservative approximation of that property that we use in the implementation. For example, in the trivial functions PR, the precise property is "calls to this function have no effect on pointer permissions in the caller", and the approximation is "there are no raw pointers reachable through the types in this function's signature".

I thought I did? Let me check. I at least did in the follow-up PR to this one.

I checked. It's in the documentation here.

The casts allowed by this PR will require some additional library support during rewriting - there's no way to safely convert &str to &[i8] using only the standard library.

Wdym by this? This PR doesn't allow that cast, nor anything to do with &str.

@spernsteiner
Copy link
Contributor

It's in the documentation here.

Ah, got it - the official definition is the bit about whether *(a as *const B) and *(b as *const A) are safe.

Ptr-to-int transmutes implicitly drop provenance (still being decided) and int-to-ptr transmutes are just UB. They're not safe

I haven't read the full thread you linked, but I saw the bit at the end about ptr-to-int transmute dropping provenance, which I believe is the same behavior you get from the safe cast my_ptr as usize. Do you happen to know whether they intend to make mem::transmute::<usize, *mut u8> UB, contrary to the safe cast my_usize as *mut u8?

I guess this is a bit of a moot point, since "safely transmutable" is effectively defined in terms of "whatever the Rust devs decide is well-defined".

Wdym by this? This PR doesn't allow that cast, nor anything to do with &str.

Sorry, I forgot the example was using b"" rather than "". But I think the same point applies. The example b"" as *const u8 as *const core::ffi::c_char will get rewritten into something involving a cast from &[u8] to &[i8]. This operation is well-defined, but std doesn't provide a safe function for performing it (as far as I know). So we'll eventually need to write or import a library that does provide such a function so we can do the necessary rewriting.

@kkysen
Copy link
Contributor Author

kkysen commented Feb 23, 2023

Ah, got it - the official definition is the bit about whether *(a as *const B) and *(b as *const A) are safe.

Yeah.

I haven't read the full thread you linked, but I saw the bit at the end about ptr-to-int transmute dropping provenance, which I believe is the same behavior you get from the safe cast my_ptr as usize. Do you happen to know whether they intend to make mem::transmute::<usize, *mut u8> UB, contrary to the safe cast my_usize as *mut u8?

I guess this is a bit of a moot point, since "safely transmutable" is effectively defined in terms of "whatever the Rust devs decide is well-defined".

The ptr-to-int transmute drops provenance, corresponding to ptr.addr(). That's different from as usize, which is ptr.expose_addr() and preserves provenance. That's why it can't be round tripped back into a pointer, since it has no provence. If it has no provenance, turning it into a pointer is UB since the pointer needs provenance.

See core::ptr::invalid's implementation, which uses transmute precisely because it creates an invalid pointer.

Sorry, I forgot the example was using b"" rather than "". But I think the same point applies. The example b"" as *const u8 as *const core::ffi::c_char will get rewritten into something involving a cast from &[u8] to &[i8]. This operation is well-defined, but std doesn't provide a safe function for performing it (as far as I know). So we'll eventually need to write or import a library that does provide such a function so we can do the necessary rewriting.

It is in core. You can convert safely through CStr:

@kkysen
Copy link
Contributor Author

kkysen commented May 25, 2023

@spernsteiner, could you take a look at the new separated CastKind handling (4bbb306) if that and the reversion in 720c2fb addresses your comment here: #839 (comment)?

Also, regarding #909 (comment), I'm not sure why #919 is causing that crash related to FIXED. Even without #839 at all, this function

pub fn cast_array_to_ptr_explicit(s: &[u8; 0]) {
    std::ptr::addr_of!(*s) as *const u8;
}

causes master to crash with that same error:

thread 'rustc' panicked at 'building TypeDesc for FIXED pointer requires a related pointee type', c2rust-analyze/src/type_desc.rs:75:5

@spernsteiner
Copy link
Contributor

spernsteiner commented May 26, 2023

I'll investigate that panic next week and also take a look at this PR.

spernsteiner added a commit that referenced this pull request Jun 5, 2023
@kkysen
Copy link
Contributor Author

kkysen commented Jun 5, 2023

@spernsteiner, I've merged master into this PR after you just merged #945, so that crash is fixed now.

c2rust-analyze/src/dataflow/type_check.rs Show resolved Hide resolved
c2rust-analyze/src/dataflow/type_check.rs Show resolved Hide resolved
c2rust-analyze/src/dataflow/type_check.rs Outdated Show resolved Hide resolved
c2rust-analyze/src/util.rs Outdated Show resolved Hide resolved
c2rust-analyze/src/util.rs Outdated Show resolved Hide resolved
c2rust-analyze/src/util.rs Show resolved Hide resolved
c2rust-analyze/src/util.rs Outdated Show resolved Hide resolved
…to use well-defined instead of safe and to use "implies" instead of "and".
…rom `{pl,rv}_lty`, as they are interchangeable.
…han `N > 0` to avoid ZSTs, as then the rule would be unsound.
… it's only sound for non-empty slices, but we can't check that at compile-time.
@kkysen kkysen force-pushed the kkysen/analyze-string-casts branch from 6b3bdd8 to 9b43bd6 Compare June 8, 2023 20:33
@kkysen kkysen requested a review from spernsteiner June 8, 2023 20:34
Copy link
Contributor

@spernsteiner spernsteiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good once the conflict + CI is fixed

c2rust-analyze/src/util.rs Outdated Show resolved Hide resolved
@kkysen
Copy link
Contributor Author

kkysen commented Jun 9, 2023

@spernsteiner, given #934 is all approved now, if you plan to merge it tomorrow, I'll wait until that merges to rebase/merge this before merging it into master, as I also want to check if #934 fixes the implicit &raw issue (#909).

@kkysen
Copy link
Contributor Author

kkysen commented Jun 9, 2023

@spernsteiner, given #934 is all approved now, if you plan to merge it tomorrow, I'll wait until that merges to rebase/merge this before merging it into master, as I also want to check if #934 fixes the implicit &raw issue (#909).

Actually, there seem to be issues with #934 running on the string_casts.rs tests, and I think it's issues with #934, so I'm going to merge this PR first and then you can take a look at those issues before merging #934.

@kkysen kkysen merged commit d0fa6ea into master Jun 9, 2023
9 checks passed
@kkysen kkysen deleted the kkysen/analyze-string-casts branch June 9, 2023 08:19
kkysen added a commit that referenced this pull request Jun 12, 2023
…tring literals in `string_casts.rs` and `lighttpd-minimal`.
kkysen added a commit that referenced this pull request Jun 12, 2023
…iled string literals in `string_casts.rs` and `lighttpd-minimal`.
kkysen added a commit that referenced this pull request Jun 12, 2023
…955)

With #839 (adding string cast support) and #934 (fixing #909) now
merged, we can now enable transpiled string literals in
`string_casts.rs` and `lighttpd-minimal`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(c2rust-analyze) Handle compatible ptr-to-ptr casts
4 participants