[strict provenance] Provide a way to "create allocations at a fixed address" #98593
Comments
Until such a function exists, one can use |
One thing I'd like to mention at the start is that hardware addresses should not be modeled as "owned" by anything inside Rust. They're generally volatile, and generally what you read from them might change without Rust doing anything on its end. At best we should think of them as Rust sharing the address with some external force. Because of this, I think a name like "alloc" would be a misstep. People have the notion/understanding that when you alloc some memory you own it while it's live, and then you (or some drop glue) explicitly do a "free", and it's dead. That's not really how hardware access works at all, so for a new situation we should pick a new name. Particularly, lots of existing and well-working Rust code doesn't bother with a "free" step at all when using hardware pointers. You just make up a pointer that you know is good, you read or write, and then you forget the pointer. Someone else might even be holding a pointer to that same location while you're doing all this, and it's all fine. If a person wants to build an ownership-style API on top of this using the borrow checker they can do that (there are several such examples), but that's separate from the base requirements of the abstract machine interacting with the hardware. |
Perhaps directly labelling these addresses as volatile might be the best way to go? This would also help for FFI as well, since you could make the distinction between an address for something that is "owned" inside the Rust code until passed back via FFI, or something that could concurrently be modified outside of Rust, by hardware or another thread. |
@Lokathor what you say also confirms my thoughts above that we want to use the same provenance for all calls to this function, and not generate a fresh one. That provenance is assumed to be exposed from the start, so it is basically public and can change any time control is given to other code (including via another thread). Using volatile accesses is a separate concern; Rust assumes for all pointers that if you do 2 non-atomic loads immediately after one another, they will give the same result -- there is no good way to opt-out of that assumption just for a specific provenance; this needs a more explicit marker at the relevant access. (I.e., making it volatile.)
There is no such thing as a volatile address. Being volatile is the property of an access. There are plenty of discussions around the semantics of volatile, their interaction with reference types, and so on; I would like to keep them out of here. This thread is solely about the name, signature, and provenance interaction of the function that is intended to replace those casts. |
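To make the "volatile is a property of the access" point concrete, here is a minimal sketch. The names `REG_ADDR`, `reg_ptr`, `read_twice_plain` and `read_twice_volatile` are hypothetical, and the address is only a placeholder. The pointer is created the same way in both cases; only the access differs.

```rust
use core::ptr;

/// Hypothetical MMIO register address, for illustration only.
pub const REG_ADDR: usize = 0x4000_0000;

/// Makes up the pointer from the hard-coded address (today: an int2ptr cast).
pub fn reg_ptr() -> *mut u32 {
    REG_ADDR as *mut u32
}

/// Two plain loads: the compiler may assume both observe the same value
/// and merge them into a single load.
///
/// # Safety
/// `p` must be valid for reads and properly aligned.
pub unsafe fn read_twice_plain(p: *const u32) -> (u32, u32) {
    unsafe { (p.read(), p.read()) }
}

/// Two volatile loads: each one is performed, in order, as written.
///
/// # Safety
/// `p` must be valid for reads and properly aligned.
pub unsafe fn read_twice_volatile(p: *const u32) -> (u32, u32) {
    unsafe { (ptr::read_volatile(p), ptr::read_volatile(p)) }
}
```

On a hosted target, both functions can be exercised by passing a pointer to an ordinary local variable instead of `reg_ptr()`; dereferencing the placeholder address is only meaningful on hardware that actually maps it.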
This comment was marked as off-topic.
There are plenty of cases where you want access at a particular address that has nothing to do with volatile. For example, you might write a kernel and know that the firmware put some data structure with important info at a hard-coded address. So, this API certainly should not be tied to volatile in any way. Just like ownership/borrowing, APIs for convenient (and maybe even safe) volatile access can be built on top of this lower-level primitive. That's why I marked the previous posts about volatile as off-topic. |
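As a rough sketch of the firmware case, the code below reads a table with an ordinary, non-volatile load. Everything here is an assumption made for illustration: the `BootInfo` layout, `BOOT_MAGIC`, `BOOT_INFO_ADDR` and `read_boot_info` do not come from any real platform, and the int2ptr cast stands in for whatever function this issue ends up with.

```rust
use core::mem::align_of;

/// Hypothetical table that firmware is assumed to leave in memory.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BootInfo {
    pub magic: u32,
    pub mem_kib: u32,
}

/// Hypothetical magic value and address; neither comes from a real platform.
pub const BOOT_MAGIC: u32 = 0xB007_1AF0;
pub const BOOT_INFO_ADDR: usize = 0x0009_0000;

/// Reads the table with an ordinary (non-volatile) load.
///
/// # Safety
/// A non-zero, aligned `addr` must point to readable memory holding a
/// `BootInfo` that nothing else mutates concurrently.
pub unsafe fn read_boot_info(addr: usize) -> Option<BootInfo> {
    // Reject the null address and misaligned addresses up front.
    if addr == 0 || addr % align_of::<BootInfo>() != 0 {
        return None;
    }
    let p = addr as *const BootInfo; // int2ptr cast, as in existing code
    let info = unsafe { p.read() };
    // Only accept the table if the magic value matches.
    (info.magic == BOOT_MAGIC).then_some(info)
}
```

A kernel would call `read_boot_info(BOOT_INFO_ADDR)` once early in boot. Nothing about the read is volatile, which is why the primitive itself should not be tied to volatile.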
I'll raise this at the next embedded WG meeting (tues 28th 20:00 CEST) to see if anyone else has thoughts.
I'd think so, we will often have a pointer to a large struct or array or quite possibly an entire memory region that is only for MMIO with fixed addresses (e.g. on Cortex-M, 0x4000_0000 through 0x5FFF_FFFF is all peripheral MMIO). |
And then would you want it to be UB to use that pointer outside the given range? Or what would the exact consequence be of setting a particular size? |
If it didn't take a size, would you assume a single word provenance or the whole memory, like an escaped pointer? I was imagining you'd want to be able to restrict it to the smallest usable provenance, which is generally known statically or easily bounded to a memory region. If there's not really any advantage to the compiler to knowing the possible valid range of the pointer, then I suppose just taking the address and giving it a whole-memory provenance is a simpler API. My guess for a typical embedded use-case is you'd create one of these pointers for each instance of a peripheral on your chip, so each has a non-overlapping size of say <100 words, and probably create them on-demand each time you accessed the peripheral. |
It would be allowed to access all memory that is outside of what the Abstract Machine knows about (
This means multiple calls to that function can overlap anyway, so a size does not give any disjointness guarantees. |
A size/span restriction does not seem useful. |
Right, I should have read the issue more carefully. I agree then, it doesn't seem like there'd be any benefit to giving a size, even if it is known. For naming, I'm also not entirely sure about saying |
Thinking a bit about the use cases here, just to make sure I understand:
You mention specifically "for these pointers created from hard-coded addresses", would the latter two runtime-y cases also follow/benefit from this API? If all three are relevant, I'd propose |
"external address" seems like a good term, |
Yeah I considered "extern", too -- but the term is awfully overloaded.
This is specifically to replace current uses of |
That makes sense for the third point! My second item is more for implementing an operating system, where you might want to map a page, zero it, then pass it on to "userspace". If you were implementing ASLR for example, you'd (more or less) be generating the base mapped address from an RNG, which would probably have come from some kind of integer rather than pointer. That being said, the OS itself probably sees the physical memory address the same as case one. Once the OS establishes the pointer, and passes it to userspace (where userspace is in case three), you're right that userspace doesn't have to re-establish provenance, I guess. |
"foreign" maybe? so there's "program memory" and "foreign memory". The function could be |
@jamesmunns from the kernel perspective this is memory outside of what Rust knows about, so if there are int2ptr casts there then using this function probably makes sense. However, note that if you e.g. implement a global allocator, then the memory returned by that is considered to be "known to Rust" and thus must not be accessed with that "external"/"foreign" provenance. |
But what if I want to perform |
inline asm provenance creation rules should be the same as FFI provenance creation rules (whatever those are). I don't know if that's fully decided yet, but in all other situations inline asm works "basically like FFI with a weird calling convention", and I see no reason to break from that convention. |
one other case to consider is where the program picks an address, and then memmaps to that address, using something like

    let addr: *mut u8 = 0x12340000 as *mut u8; // pick an arbitrary page-aligned address
    let addr = mmap(addr as *mut c_void, 0x10000, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_PRIVATE, fd, 0);

imho |
That sounds pretty reasonable to me. |
In the embedded code I've encountered (STM32 mainly), this is the common pattern.

    unsafe {
        // NOTE(unsafe) this reference will only be used for atomic writes with no side effects.
        let rcc = &(*RCC::ptr());
        // Enable clock.
        $GPIOX::enable(rcc);
        $GPIOX::reset(rcc);
    }

There will be an underlying crate that is mostly generated from the MCU vendor's SVD files (often with some patch set with corrections on top). This maps each MCU peripheral to a specific memory address:

    impl GPIOA {
        pub const PTR: *const gpioa::RegisterBlock = 0x4002_0000 as *const _;
        pub const fn ptr() -> *const gpioa::RegisterBlock {
            Self::PTR
        }
    }

The pointer will be into some block, probably "Peripherals" below starting

For the proposed provenance API, are we saying that we will declare the entire Peripheral block on startup and then take additional offsets from inside that block for the individual pointers? Given the current embedded landscape, I believe that having an API to declare provenance for an individual pointer would be less of an upheaval than going via a block. (As an aside, I'm also not keen on the name |
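One way to picture a per-pointer API for generated crates is a single helper that every peripheral constant goes through. This is only a sketch: `external_ptr` is a hypothetical name, not the function proposed in this issue, and `RegisterBlock` is a cut-down stand-in for a generated register block. Today the helper can only be a plain cast.

```rust
/// Hypothetical helper: the one place where an address becomes a pointer.
/// Today it is just the cast that existing crates already use.
pub const fn external_ptr<T>(addr: usize) -> *const T {
    addr as *const T
}

/// Cut-down stand-in for a generated register block.
#[repr(C)]
pub struct RegisterBlock {
    pub moder: u32,
    pub otyper: u32,
}

pub struct GPIOA;

impl GPIOA {
    /// Each peripheral declares its own pointer; no enclosing block is needed.
    pub const PTR: *const RegisterBlock = external_ptr(0x4002_0000);

    pub const fn ptr() -> *const RegisterBlock {
        Self::PTR
    }
}
```

With this shape, a code generator would only need to change the body of one function if a dedicated primitive were added later, while `GPIOA::ptr()` callers stay untouched.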
Yes that was my intent. We need to keep that code working for backwards-compatibility anyway. |
Makes sense. Why do we need a It could enable better optimizations, since the compiler knows |
It was mostly meant as an explicit bit of documentation, i.e., the programmer making their intent more clear. |
Especially if it is "the programmer making their intent more clear" I think there is use for the size field. Since as far as AM operations are concerned, the intrinsic is already superfluous since |
By that standard, why do we have any of the other functions ( The idea is to provide ways to do all these things while following Strict Provenance. And that means not using
Yes, I want to say that. You cannot access a Rust allocation, exposed or not, with the "foreign" provenance. |
@RalfJung , can you imagine a situation when difference between |
It's not unavoidable though. 🤷
Forbidding data races is okay. Forbidding faults is in general not okay, as it would prevent legitimate uses of faults such as pageable kernel heap memory. That actually has an additional constraint, which is that it can be safe to access some memory at time A or at time C, but not at time B (between A and C) because time B is one at which paging is not permitted. In short, for kernels and similarly low-level code, one wants to be able to reorder ordinary non-volatile loads and stores across most operations, but not across arbitrary function calls, because these might make the memory temporarily inaccessible. |
Another point to keep in mind: Accessing freed memory is UB. However, the memory allocator (almost by definition) needs to do exactly that! |
Sorry, I should have been more precise -- by "fault" I meant something that the program can actually observe. If the kernel or some other entity does some magic in the background, that is fine. But, crucially, all memory accesses must succeed and not have any side-effect besides reading/writing the given memory location. Otherwise, the compiler would not even be allowed to swap the order of two adjacent reads.
They will have to add suitable barriers or other annotations then. Rust is allowed to reorder some reads around arbitrary function calls. (In fact even C is allowed to do that, if the read-from memory is private to the current function. Rust is just allowed to do it in more situations due to its aliasing restrictions.) Anyway this is off-topic for this thread, since we are not talking about memory accesses here, only about how to get the pointer to some particular kind of memory. This has zero impact on the optimizations that Rust can do when it does not know where the pointer comes from, so everything that the compiler is allowed to do for arbitrary references / raw pointers, it is also allowed to do for references / raw pointers derived from |
I don't know that that means "alloc" is an inappropriate word, though. We describe |
on the name bikeshedding, what about something like I'm ambivalent on taking a size or not. on one hand, generally it's known in some form when you're working with MMIO (there's usually a block of the memory space allocated to the device even if some or most of it is unusable), but what size should you use in the case of not knowing? (as unlikely as that seems to me) the point is that this memory is outside of the AM and thus isn't allowed to overlap it, right? would using |
Would this allow creating allocations at address zero? |
I think this will always be UB. Creating the allocation is probably fine, but actually using |
Indeed, Rust assumes that there is never an allocation at 0, so there can't be an exposed provenance that is valid for that address, either. |
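A small sketch of how a wrapper could encode the "never at address 0" rule in its return type. `external_nonnull` is a hypothetical name, and the check only rules out the null address; it says nothing about whether the memory behind a non-zero address is actually usable.

```rust
use core::ptr::NonNull;

/// Hypothetical helper: refuses address zero, accepts anything else.
pub fn external_nonnull<T>(addr: usize) -> Option<NonNull<T>> {
    // `NonNull::new` returns `None` exactly when the pointer is null.
    NonNull::new(addr as *mut T)
}
```

Callers then have to handle the `None` case explicitly instead of silently building a pointer that Rust assumes can never be valid.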
Only non- |
The general conclusion in this thread regarding the original question seems to be: why have a Therefore, having a dedicated function for this case of accessing an 'external' address could help users of stable Rust away from |
Currently, when using EDIT: as one example: it would break all my tests in |
Personally I think "each call returns a unique provenance of a given size" makes the most sense. It covers the most use cases with the least API surface area. Whole memory regions, splitting memory regions, "magical" allocators, etc. I'm not sure how useful it is to have "The One Unknowable Everything-Else Provenance" (i.e. no size parameter, single provenance for all "external"/non-Rust addresses). Feels like that's only a very slightly smaller hammer than On that note, what about memory regions that can be relocated at runtime? Is this something that can be at all expressed under strict provenance (other than for discarding and re-creating the existing provenances)? I thought of |
Allocated objects in Rust cannot move. So such relocations (in general) have to be modeled like |
Supposing that this function did take a size, what would the semantics of a pointer created with size zero be? Would calling |
Doesn't Rust disallow memory allocations of size 0 (which is why Vec/Box/etc have special cases for ZSTs)? I would assume the same to apply here |
https://doc.rust-lang.org/nightly/std/alloc/trait.GlobalAlloc.html#safety-1 says it's UB to request allocations of size 0, which is different from creating an invalid pointer, which is okay to use for accesses of size 0. So my assumption is that "Rust allocations cannot be zero-sized" would hold here? |
Heap allocations of size zero cannot be created with our global allocator. However, stack variables and global |
heap allocations of size zero can be created with some implementations of |
If only some native allocators support it then rust overall must assume it's not supported. |
yes, except that Rust can't declare that no zero-sized heap allocations are possible, they just can't be done with Rust's standard methods of allocating memory. |
I'm asking, because if so, it would make sense to implement |
ptr::invalid is a reasonable implementation for a zero-sized allocation (that cannot be freed).
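For illustration, here is what a zero-sized "allocation" that is never freed can look like with stable APIs. This sketch swaps in `NonNull::dangling()` (a non-null, well-aligned pointer) in place of `ptr::invalid`, and `empty_region` is a hypothetical name.

```rust
use core::ptr::NonNull;
use core::slice;

/// Builds an empty slice from a dangling, well-aligned, non-null pointer.
pub fn empty_region<T: 'static>() -> &'static [T] {
    // SAFETY: a length of zero means no memory is ever read, and
    // `NonNull::dangling()` is non-null and aligned for `T`.
    unsafe { slice::from_raw_parts(NonNull::<T>::dangling().as_ptr(), 0) }
}
```

Because the length is zero, no memory behind the pointer is touched, so no real allocation has to exist at that address.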
This issue is part of the Strict Provenance Experiment - #95228
On some platforms it makes sense to just take a hard-coded address, cast it to a pointer, and start working with that pointer -- because there are external environment assumptions that say that certain things are provided at certain addresses.

This is perfectly fine for Rust as long as that memory is entirely disjoint from all the memory that Rust understands (`static` globals, stack allocations, heap allocations). Basically we can think of there being a single hard-coded provenance for "all the memory that is disjoint from the Abstract Machine", and that is the provenance we would like to use for these pointers created from hard-coded addresses. These restrictions make this operation a lot easier to specify than `from_exposed_addr`. (Remember: `from_exposed_addr` is outside of Strict Provenance. The goal of this issue is to provide a way to write such code while following Strict Provenance.)

In the spirit of the Strict Provenance APIs, that means we probably want a function that does this, and that we can attach suitable documentation to. There are some open questions for the syntax and semantics of that function:

- `make_alloc`, `assume_alloc`, `hard_coded_alloc`? I am leaning towards something with "alloc" because this function is basically like an allocator, except that you tell it at which address to allocate and you have to promise that that is Okay To Do.
- `usize` for the address and return `*mut T`. Should it also take a size, saying how large this assumed allocation is?

Cc @Lokathor who keeps mentioning this usecase every time I want to ban int2ptr casts. ;)

Tagging WG-embedded since that's where this kind of stuff mostly happens (AFAIK)