Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: Represent void* types as &[MaybeUninit<u8>] #2065

Closed
ryancerium opened this issue Sep 26, 2022 · 19 comments
Closed

Feature request: Represent void* types as &[MaybeUninit<u8>] #2065

ryancerium opened this issue Sep 26, 2022 · 19 comments

Comments

@ryancerium
Copy link
Contributor

ryancerium commented Sep 26, 2022

Motivation

In 0.40.0 const void* was represented as a &[u8] and a void * was represented as a mut& [u8]. This was an improvement over using separate core::ffi::c_void and cbattribute parameters where you could pass in the wrong size for the buffer. That might be a feature in some cases.

With a couple of helper functions the single parameter was easy to use:

unsafe fn as_u8_slice<T: Sized>(t: &T) -> &[u8] {
    std::slice::from_raw_parts(t as *const T as *const u8, std::mem::size_of::<T>())
}

unsafe fn as_u8_slice_mut<T: Sized>(t: &mut T) -> &mut [u8] {
    std::slice::from_raw_parts_mut(t as *mut T as *mut u8, std::mem::size_of::<T>())
}

If the metadata for the Win32 parameter is optional, the Rust projection should be an Option<&[u8]>.

If the Rust buffer parameter is None, then 0 should be passed to the cbattribute Win32 function parameter.

Drawbacks

It requires unsafe to convert a &T to a &[u8]. It also requires unsafe to convert it to a c_void and get the size of the object, which might be more error prone.

Rationale and alternatives

Current use of c_void and a separate size parameter isn't very Rusty and allows for simple copy/paste mistakes.

Edit: The padding bytes in an object are uninitialized, so it's not safe to convert the object into a slice. A & [MaybeUninit<u8>] is safe however.

Additional context

0.40.0 had this feature and it worked quite well I thought.

@ryancerium ryancerium added the enhancement New feature or request label Sep 26, 2022
@kennykerr
Copy link
Collaborator

That transformation was removed in #2046 in response to #2044.

@kennykerr
Copy link
Collaborator

Happy to consider a compromise solution. It's certainly useful when it works. 😊

@kennykerr kennykerr added question Further information is requested and removed enhancement New feature or request labels Sep 26, 2022
@ryancerium
Copy link
Contributor Author

In #2044 I think the chief complaint is this:

There is no API guarantee that &FILE_BASIC_INFO is punnable to &[u8] without a blessed way of converting between the two

Is it possible to make a sound API for that? Are my little one-liners even sound?

Within the confines of safe Rust, any &T has to be a reference to a valid T, which will have std::mem::size_of::<T>() initialized bytes. In unsafe Rust you could do something stupid like 0usize as *mut c_void as *mut FILE_BASIC_INFO, which is no different than C++'s const T& t = *((T*) 0);. Don't do stupid things, Sparky.

I'm not sure about arrays of objects though; I haven't run into them in my small window manager yet.

@ryancerium
Copy link
Contributor Author

ryancerium commented Sep 26, 2022

@rylev What are the sound options for code that's something like:

let mut file_basic_info: FILE_BASIC_INFO = Default::default();
SetFileInformationByHandle(
    handle,
    FileBasicInfo,
    as_u8_slice(&file_basic_info));

@wesleywiser
Copy link

wesleywiser commented Sep 27, 2022

Within the confines of safe Rust, any &T has to be a reference to a valid T, which will have std::mem::size_of::() initialized bytes.

Unfortunately, that's not always true. T may contain padding bytes which are uninitialized and so the conversion from &T to &[u8; std::mem::sizeof::<T>()] will read uninitialized memory which is UB. Some Rust libraries like bytemuck or typic can help you perform this kind of conversion in a safe & sound way. In the future, this will probably be built into Rust via the Safe Transmute Project.

@ryancerium
Copy link
Contributor Author

Does std::slice::from_raw_parts read the memory? I thought it just made a slice (pointer+size) but never dereferenced the memory. The Win32 implementation would (presumably) cast the slice pointer to a C++ T*, which necessarily has the same layout as the Rust &T, right? So nobody would in practice be dereferencing the padding bytes.

Is passing in a manually created pointer+size defined behavior?

@wesleywiser
Copy link

Yeah, my answer was a bit imprecise. std::slice::from_raw_parts doesn't dereference the memory but I think this still runs afoul of the following:

It is the programmer's responsibility when writing unsafe code to ensure that any safe code interacting with the unsafe code cannot trigger these behaviors.
...
Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
...

  • An integer (i*/u*), floating point value (f*), or raw pointer obtained from uninitialized memory, or uninitialized memory in a str.
    ...
  • A reference or Box that is dangling, unaligned, or points to an invalid value.
    ...

The data the slice points to (which is the same rationale as the reference case) is invalid because it contains an integer obtained from uninitialized memory.

There's related discussion on the bytemuck crate here: Lokathor/bytemuck#86

@ryancerium
Copy link
Contributor Author

Do slices contain the data, or point to the data? I always thought they were a pointer+size, which meant the subscript operation is a pointer offset operation, which would be UB to access the padding bytes, but not UB at long as they're never read or written. Since the Win32 function isn't modifying the padding bytes, it's modifying the initialized data bytes for the struct in question, it could be ok.

I'm still curious if there's any UB with passing in a Rust pointer. That seems more dangerous to me, even if there's no UB.

I'm not knowledgeable about this stuff at all, I'm really just trying to understand when UB does occur and when it could occur but only if you take that next step. Thanks for your help and patience explaining it to me.

@rylev
Copy link
Contributor

rylev commented Sep 28, 2022

The slice is a pointer plus a length, but the docs for from_raw_parts make it clear that the memory the pointer points to must be valid for reads. I'm unsure if this holds too if you're creating a &[MaybeUninit<u8>] though...

@wesleywiser
Copy link

I believe performing a conversion from &T to &[MaybeUninit<u8>] would be ok but you still can't read the uninitialized bytes without invoking UB (reading any MaybeUninit<u8> values that correspond to initialized memory in T is fine). The docs for MaybeUninit say:

The compiler, in general, assumes that a variable is properly initialized according to the requirements of the variable’s type. For example, a variable of reference type must be aligned and non-null. This is an invariant that must always be upheld, even in unsafe code. As a consequence, zero-initializing a variable of reference type causes instantaneous undefined behavior, no matter whether that reference ever gets used to access memory

@ryancerium
Copy link
Contributor Author

The documentation for MaybeUninit specifically uses uninitialized arrays as examples, so a [MaybeUninit<u8>] in and of itself is fine: https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#initializing-an-array-element-by-element

There are nightly APIs, MaybeUninit::as_bytes() and MaybeUninit::as_bytes_mut(), that do something very close to what I wrote above, and someone in the tracking issue at least thought about its effect on padding bytes.

Windows-rs could use those functions (well, custom equivalents until they're stabilized) to convert a mut T into a mut MaybeUninit<T> into a &mut [MaybeUnint<u8>] to pass into Win32 functions.

let mut file_basic_info: MaybeUninit<FILE_BASIC_INFO> = MaybeUninit::new(FILE_BASIC_INFO::default());
SetFileInformationByHandle(
    handle,
    FileBasicInfo,
    file_basic_info.as_bytes_mut());
let mut file_basic_info = file_basic_info.assume_init();

So now the new request in this Issue is to use &[MaybeUninit<u8>] and &mut [MaybeUninit<u8>] instead of separate core::ffi::c_void and cbattribute parameters. This would need to be documented like crazy and used in examples extensively so that people know the blessed way of passing in arbitrary things to Win32 void* parameters.

@ryancerium ryancerium changed the title Feature request: Represent void* types as &[u8] again Feature request: Represent void* types as ~&[u8] again~ &[MaybeUninit<u8>] Sep 29, 2022
@ryancerium ryancerium changed the title Feature request: Represent void* types as ~&[u8] again~ &[MaybeUninit<u8>] Feature request: Represent void* types as &[MaybeUninit<u8>] Sep 29, 2022
@kennykerr
Copy link
Collaborator

Sounds like this has potential but depends on experimental APIs, so we may have to postpone until those APIs are stabilized.

@kennykerr kennykerr added enhancement New feature or request blocked and removed question Further information is requested enhancement New feature or request labels Sep 30, 2022
@ryancerium
Copy link
Contributor Author

Nah, it's an easy conversion to write once you know the proper incantation:

trait AsVoidPointer
where
    Self: Sized,
{
    unsafe fn as_const_void_pointer(&self) -> &[core::mem::MaybeUninit<u8>] {
        std::slice::from_raw_parts(
            self as *const Self as *const core::mem::MaybeUninit<u8>,
            std::mem::size_of::<Self>(),
        )
    }

    unsafe fn as_mut_void_pointer(&mut self) -> &mut [core::mem::MaybeUninit<u8>] {
        std::slice::from_raw_parts_mut(
            self as *mut Self as *mut core::mem::MaybeUninit<u8>,
            std::mem::size_of::<Self>(),
        )
    }
}

impl AsVoidPointer for RECT {}
impl AsVoidPointer for FILE_BASIC_INFO {}

I'm not sure how a blanket impl like this example will impact compile times though. Bike shedding the name of the trait and functions would be the hardest part I think, because they're genuinely important.

@kennykerr
Copy link
Collaborator

Sure, but I'd rather not introduce a temporary fix when something may be on its way from the std library. Rather spend the effort getting it stabilized.

@ryancerium
Copy link
Contributor Author

ryancerium commented Oct 4, 2022

The MaybeUninit track requires the Win32 struct to be necessarily wrapped in a MaybeUninit. This makes it harder to pass in function parameters that are references since the MaybeUninit takes ownership of the struct.

// Passing in a parameter is easy-peasy with the trait function defined above
//
pub fn dwm_get_window_attribute_extended_frame_bounds2(
    hwnd: HWND,
    rect: &mut RECT,
) -> windows::core::Result<()> {
    unsafe {
        DwmGetWindowAttribute(hwnd, DWMWA_EXTENDED_FRAME_BOUNDS, rect.as_void_pointer())
    }
}


// Using the MaybeUninit<RECT> to get a &[MaybeUninit<u8>] is cumbersome,
// and requires two extra copies of the data I think
//
pub fn dwm_get_window_attribute_extended_frame_bounds3(
    hwnd: HWND,
    rect: &mut RECT,
) -> windows::core::Result<()> {
    unsafe {
        let r = MaybeUninit::new(rect.clone());
        let result = DwmGetWindowAttribute(hwnd, DWMWA_EXTENDED_FRAME_BOUNDS, r.as_bytes_mut());
        *rect = r.assume_init();
        result
    }
}

@wesleywiser
Copy link

wesleywiser commented Oct 4, 2022

I'm definitely not a Windows API expert so I could very well be mistaken, but I think writing the code like this would be more idiomatic as well as resolve your concern about the double copying:

pub fn dwm_get_window_attribute_extended_frame_bounds3(
    hwnd: HWND
) -> windows::core::Result<RECT> {
    unsafe {
        let r = MaybeUninit::<RECT>::uninit();
        DwmGetWindowAttribute(hwnd, DWMWA_EXTENDED_FRAME_BOUNDS, r.as_bytes_mut())?;
        Ok(r.assume_init())
    }
}

How do you feel about that?

@ryancerium
Copy link
Contributor Author

This is the vastly superior strategy for my toy sample, yes. But now the &mut RECT isn't a parameter, which it may well need to be in the grander scheme of application programming. And even so, it's simpler to skip the MaybeUninit:

pub fn dwm_get_window_attribute_extended_frame_bounds3(
    hwnd: HWND
) -> windows::core::Result<RECT> {
    unsafe {
        let r = RECT::default();
        DwmGetWindowAttribute(hwnd, DWMWA_EXTENDED_FRAME_BOUNDS, r.as_void_pointer())?;
        Ok(r)
    }
}

@kennykerr
Copy link
Collaborator

Thanks for the feedback! This is definitely something to keep in mind as we continue to improve windows-rs.

@wesleywiser and I were chatting about this. While this is an interesting problem, it is also one of the grey areas of writing unsafe code that is still being figured out and best to be overly cautious for now. So, I'm reluctant to jump ahead with a solution when it works just fine right now. We can let this concept bake in the language and std library before jumping in with extra support. I'm going to close this issue for now and we can revisit in future, but feel free to keep the conversation going.

@ryancerium
Copy link
Contributor Author

Yeah, passing in arbitrary objects as void * is somewhere to tread cautiously. Hopefully the community helps create a best-practice for FFI in this area.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants