Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HLOCAL and HGLOBAL typedefs #1296

Closed
AArnott opened this issue Oct 5, 2022 · 18 comments
Closed

Add HLOCAL and HGLOBAL typedefs #1296

AArnott opened this issue Oct 5, 2022 · 18 comments
Assignees
Labels
enhancement New feature or request usability Touch-up to improve the user experience for a language projection

Comments

@AArnott
Copy link
Member

AArnott commented Oct 5, 2022

Consider LocalAlloc, which takes two parameters (one being a pointer-sized integer) and returns a pointer. In the metadata, both pointer-sized values are represented merely as native int (well, one is native uint, but that's irrelevant). As a result, a projection cannot discern between what should be expressed as an integer vs. a pointer. In the LocalAlloc case, the optimal declaration would be

void* LocalAlloc(LOCAL_ALLOC_FLAGS, nuint);

LocalFree is another example, where both the parameter and the return type are actually supposed to be pointers.

And the info we want is in the headers. For example in the headers, SIZE_T is often used for a pointer-sized integer, whereas HLOCAL is used where we want a pointer. HLOCAL is a typedef of HGLOBAL which is a typedef of HANDLE, which ... is undefined except that it's pointer-sized and should generally be considered an opaque value.

So we need two things:

  1. A mechanism in the metadata to distinguish between pointers and pointer-sized integers.
  2. A place in this repo (maybe a .rsp file) in which to indicate which typedefs to express as pointers instead of as pointer-sized integers.

Consider maybe creating 3 classifications: pointers, pointer-sized integers, and pointer-sized opaque values. in C#, we have distinct types we can use for all 3 of these.

See also #983, which may be considered a very specific form of this problem.

@mikebattista mikebattista added enhancement New feature or request usability Touch-up to improve the user experience for a language projection labels Oct 5, 2022
@kennykerr
Copy link
Contributor

LocalAlloc seems like a very obscure example. It doesn't return a pointer because in some cases the resulting "memory object" needs to be accessed via LocalLock and thus is not merely a pointer (and may not be a pointer at all). The fact that LocalAlloc returns IntPtr seems appropriate, although it might make more sense to do as the Windows headers do and actually have an HLOCAL type that binds these functions together.

Contrast that with the far more common (and efficient) HeapAlloc function which returns an actual pointer.

@AArnott
Copy link
Member Author

AArnott commented Oct 8, 2022

Thank you for your insights, @kennykerr. Using HLOCAL in this case may work out, although maybe I'd want to special case it in C# so that it was particularly easy to convert to or use as a pointer since that's probably(?) the majority case.

@ChrisDenton
Copy link
Contributor

ChrisDenton commented Feb 15, 2023

I think it's important to better distinguish between pointers and pointer-sized integers. The pointer vs. integer distinction matters to languages such as Rust. And the metadata can't always guess if something is "really" a pointer or not.

Take a struct like OVERLAPPED. The hEvent field in the takes a HANDLE which is in the metadata is defined as a native int, unlike the C/C++ headers that define HANDLE as void*. The ReadFileEx function has this to say about the hEvent field:

The ReadFileEx function ignores the OVERLAPPED structure's hEvent member. An application is free to use that member for its own purposes in the context of a ReadFileEx call. ReadFileEx signals completion of its read operation by calling, or queuing a call to, the completion routine pointed to by lpCompletionRoutine, so it does not need an event handle.

So hEvent is really a union of a HANDLE or a pointer or some other pointer-sized thing, which is just another way of saying void*. This is not really that uncommon in callback functions or the structs that are passed to them.

I think the best thing to do is to follow the C/C++ headers more accurately.

@riverar
Copy link
Collaborator

riverar commented Feb 15, 2023

Handles are not pointers so this looks correct though.

@ChrisDenton
Copy link
Contributor

Handles are not pointers so this looks correct though.

That has a couple of problems:

  • In the SDK headers, a HANDLE is used interchangeably as both an actual HANDLE and as a HANDLE_OR_POINTER type (obv. not an actual type). I outlined one such case above. Either the metadata should use void* to cover both cases or alternatively we would need to audit all the places HANDLE is used.
  • In C/C++ code using the standard SDK headers, they're being stored and passed as pointers. Whereas in Rust native int is being (correctly) mapped to isize (a pointer-sized integer). This is an issue for cross-language LTO. For example, LLVM will see a lot of int->pointer->int->pointer->int casts.

@riverar
Copy link
Collaborator

riverar commented Feb 15, 2023

A similar discussion on HANDLE provenance occurred here microsoft/windows-rs#1643

tl;dr: HANDLE in the headers are void* as there was no other native integer type to use for the same effect. They are not pointers. Whether or not a user stuffs a integer, pointer, char, etc. into a handle, like hEvent above, is not relevant to metadata. Also, whether or not a Windows API implementation is using valid pointers as HANDLEs is also not relevant to metadata or developers.

I'm not aware of any cases where a returned handle is designed to be used as a pointer by the developer.

@ChrisDenton
Copy link
Contributor

A similar discussion on HANDLE provenance occurred here microsoft/windows-rs#1643

And also here and here (where I'm coming from).

HANDLE in the headers are void* as there was no other native integer type to use for the same effect. They are not pointers.

But, whatever the reason for it, they are stored and passed as pointers in C++ code that uses the headers.

I'm not aware of any cases where a returned handle is designed to be used as a pointer by the developer.

That's all very well and good but existing C++ libraries do and some parts of the docs even encourage it.

@riverar
Copy link
Collaborator

riverar commented Feb 15, 2023

My understanding is that the call for handle strict provenance research required folks to go back to the handle source and figure out what the value truly represents. Or in other words, tease apart the intent from the implementation. I did a bunch of that work, scouring through Windows 1.0 manuals and notes, code, etc. and couldn't find a case where a handle returned by Windows APIs was meant to be used as a true pointer to a chunk of memory. The manuals really did document this as an opaque integer. So I'm in mind, the provenance of handles, as an integer type, for the Win32 API surface has been effectively established/settled.

That said, I recognize--and you point out--there are cases where Windows APIs have handle-typed inputs that can be used to smuggle pointers around. In those cases, I believe we have to determine whether or not the intent of that parameter is to pass pointers around.

So looking at OVERLAPPED specifically... The hEvent field needs to be set to a valid event handle for ReadFile, WriteFile, etc. to signal completion. Not a pointer to anything. ReadFileEx, WriteFileEx, etc. alternatively rely on a completion routine, which means the hEvent is unused and may be uninitialized memory. Still not expected to be a pointer to anything.

The sticking point appears to be that the docs also indicate that apps are free to use that member for whatever they want. Ugh. Metadata could just slap a pointer type on that field and call it day. But in a hypothetical system where pointer provenence is validated (e.g. CHERI), problems will arise in scenarios where the field is used correctly by devs to hold normal event handles!

I'm not sure there's a clear path forward here.

We could...

  1. Add two distinct definitions of the OVERLAPPED structure--one with native integer hEvent for devs that use ReadFile/WriteFile/etc. and one with a pointer type for devs that use ReadFileEx/WriteFileEx/etc. But this feels like a slippery slope--developers can stuff pointers into any non-pointer parameter/field, so why are treating this one as special?

  2. Leave it as a native integer, as per its original documented intent. The API, when it reads this field, requires it to be a valid event handle/non-pointer. When the API doesn't read this field, its type is application defined. Developers will have to cast away pointer information when storing their pointers here, which is lame.

  3. Change it to a pointer type. It may more accurately reflect how the structure field is used in some circles using ReadFileEx/WriteFileEx but comes at the expense of impacting others just using ReadFile. (Now they need to cast non-pointer event handles as pointers!)

@riverar
Copy link
Collaborator

riverar commented Feb 15, 2023

Or

  1. Continue the HANDLEs are pointers lie and change all handles to be pointers. This effectively kicks the provenance can down the road, so to speak. It would be difficult to run an app that relies on this metadata on any sort of provenance-validating system. And you'd really upset Aria (@Gankra). 😅

    It may align metadata closer with the C headers, I admit, but one of the goals of this metadata project is to augment the existing C headers with additional information on how to correctly and safely use those APIs.

@AArnott
Copy link
Member Author

AArnott commented Feb 16, 2023

change all handles to be pointers

There actually is one handle typedef in the metadata that is strictly 32-bit, IIRC. So we shouldn't change all of them, regardless.
I'm in favor of treating all handle types as opaque values (C# IntPtr as the field type within the typedef). If some handles are actually used (and perhaps even documented in text) as pointers, and that leads to some casting to be required, I'm ok with that. Although if the normative documentation or headers guarantee a handle is just a pointer, the metadata representing this somehow may be useful to some projections.
I'd also like pointer-sized integers to be represented as such (C# nint), and real pointers as actual pointers.

There's been a lot of talk about specific APIs in this discussion, and TBH I haven't studied it all. My main interest is that the metadata have a prescribed way by which an API can be distinguished between pointer sized ints and pointers, and that the bulk of the APIs be represented accurately. After that, specific APIs can be switched from one representation to another by individual issues and discussions.

@KalleOlaviNiemitalo
Copy link

There actually is one handle typedef in the metadata that is strictly 32-bit, IIRC.

Is it MSIHANDLE? This one is a bit special in that Windows can expect an application to parse it from a string; see the CommandLine column of MsiEmbeddedChainer Table.

HINSTANCE and HMODULE are documented to be base addresses; but LoadLibraryEx can also set some of the least significant bits to indicate how the library was loaded.

@AArnott
Copy link
Member Author

AArnott commented Feb 21, 2023

Is it MSIHANDLE?

Yes, that's the one.

@riverar
Copy link
Collaborator

riverar commented Feb 21, 2023

Consider LocalAlloc, which takes two parameters (one being a pointer-sized integer) and returns a pointer.

LocalAlloc returns an opaque handle because the heap allocated memory can wiggle around (see LMEM_MOVEABLE flag). Users must pass this handle to LocalLock to obtain a pointer to the first byte of the memory block.

Edit: Just realized I just repeated what Kenny said above #1296 (comment). Ignore me.

Are there any actionable tasks/issues in this thread?

@AArnott
Copy link
Member Author

AArnott commented Feb 22, 2023

From my issue description:

Consider maybe creating 3 classifications: pointers, pointer-sized integers, and pointer-sized opaque values. in C#, we have distinct types we can use for all 3 of these.

I'd like to get a clear idea of what the metadata does or may be able to offer for these. Here's my draft:

Classification Metadata representation
Pointer SomeStruct*
Pointer-sized integers native int (aka IntPtr)
Pointer-sized opaque values typedef structs (e.g. HANDLE)

LocalAlloc, LocalLock and LocalFree (and maybe some other of the Local* methods) currently use native int in place of the HLOCAL typedef used in the docs and header files. That would be in violation of the 3rd row above.
Can this be fixed?

@riverar
Copy link
Collaborator

riverar commented Feb 22, 2023

That looks fair to me; so we need to change the LocalXxx APIs to return a HANDLE. Probably worth covering GlobalXxx too.

@AArnott
Copy link
Member Author

AArnott commented Feb 22, 2023

HANDLE? Who not HLOCAL as per the header files? Since the handle should be freed via LocalFree, having its own typedef will avoid folks freeing it with CloseHandle when they should use LocalFree.

@riverar
Copy link
Collaborator

riverar commented Feb 22, 2023

HLOCAL indeed. Will make sure you're tagged on review.

@workingjubilee
Copy link

I feel it's worth noting, for the various kinds of smuggling data:

  • It's still valid to shuffle integers around on CHERI-style architectures, then offset them on an already-valid pointer, in the "relative addressing" style.
  • In fact, CHERI architectures can use a "hybrid" ABI that allows doing this to all reads and writes in a section of a program's address space that were compiled to use "legacy pointers", implicitly offsetting them on a "full" capability held in a particular register.
  • It's also valid to smuggle data in the low bits of most CHERI pointers today, especially if known-valid due to alignment. The currently-used 129-bit capability pointer has an exponent, like a floating-point number does, so their bounds can be deliberately slightly imprecise. It both helps legacy software, and without this compression, a 64-bit address space might need capability pointers of as much as 257 bits!

@mikebattista mikebattista assigned mikebattista and unassigned chenss3 Mar 5, 2023
@mikebattista mikebattista changed the title Metadata should distinguish between pointer-sized integers and pointers Add HLOCAL and HGLOBAL typedefs Mar 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request usability Touch-up to improve the user experience for a language projection
Projects
None yet
Development

No branches or pull requests

8 participants