Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No sound way to zero-copy from socketaddr_in #323

Open
jeff-hiner opened this issue Aug 4, 2022 · 6 comments
Open

No sound way to zero-copy from socketaddr_in #323

jeff-hiner opened this issue Aug 4, 2022 · 6 comments

Comments

@jeff-hiner
Copy link

This is most painful on Windows, where SOCKADDR_STORAGE is 132 bytes long.

Consider the following naive example code, where the SOCKADDR pointer here is a pointer to an ipv4/ipv6 address obtained via syscall:

unsafe fn sock_addr_to_ip(ptr: *const SOCKADDR, length: i32) -> Option<IpAddr> {
    let sa = socket2::SockAddr::new(*(ptr as *const SOCKADDR_STORAGE), length);
    sa.as_socket().map(|s| s.ip())
}

The issue is that SOCKADDR_STORAGE is larger than SOCKADDR, the pointer is only guaranteed to be valid through length bytes, and so casting it to *const SOCKADDR_STORAGE and dereferencing immediately introduces UB.

The correct code looks more like this:

unsafe fn sock_addr_to_ip(ptr: *const SOCKADDR, length: i32) -> Option<IpAddr> {
    assert!(length as usize <= std::mem::size_of::<SOCKADDR_STORAGE>());
    let mut storage = std::mem::MaybeUninit::<SOCKADDR_STORAGE>::uninit();
    std::ptr::copy_nonoverlapping(ptr, &mut storage as *mut _ as *mut _, length as _);
    let sa = socket2::SockAddr::new(storage.assume_init(), length);
    sa.as_socket().map(|s| s.ip())
}

Since the backing storage for SockAddr is owning, we have to make a copy. It's incredibly awkward and a bit slower. And that's just so we can safely invoke the nice parsing logic in as_socket().

To fix this we'd probably have to add a SockAddrRef<'a> type that doesn't own its backing storage. If that's not immediately feasible, providing a constructor that performs this unsafe copy inside socket2 would also solve the immediate issue.

@Thomasdezeeuw
Copy link
Collaborator

This is most painful on Windows, where SOCKADDR_STORAGE is 132 bytes long.

On Unix it's a 128 bytes, so I don't really see much of a difference...

Consider the following naive example code, where the SOCKADDR pointer here is a pointer to an ipv4/ipv6 address obtained via syscall:

unsafe fn sock_addr_to_ip(ptr: *const SOCKADDR, length: i32) -> Option<IpAddr> {
    let sa = socket2::SockAddr::new(*(ptr as *const SOCKADDR_STORAGE), length);
    sa.as_socket().map(|s| s.ip())
}

The issue is that SOCKADDR_STORAGE is larger than SOCKADDR, the pointer is only guaranteed to be valid through length bytes, and so casting it to *const SOCKADDR_STORAGE and dereferencing immediately introduces UB.

The correct code looks more like this:

unsafe fn sock_addr_to_ip(ptr: *const SOCKADDR, length: i32) -> Option<IpAddr> {
    assert!(length as usize <= std::mem::size_of::<SOCKADDR_STORAGE>());
    let mut storage = std::mem::MaybeUninit::<SOCKADDR_STORAGE>::uninit();
    std::ptr::copy_nonoverlapping(ptr, &mut storage as *mut _ as *mut _, length as _);
    let sa = socket2::SockAddr::new(storage.assume_init(), length);
    sa.as_socket().map(|s| s.ip())
}

Since the backing storage for SockAddr is owning, we have to make a copy. It's incredibly awkward and a bit slower. And that's just so we can safely invoke the nice parsing logic in as_socket().

Have you tried SockAddr::init? It's goal is to be used in system calls where the OS initialises it, e.g. see the accept implementation.

To fix this we'd probably have to add a SockAddrRef<'a> type that doesn't own its backing storage. If that's not immediately feasible, providing a constructor that performs this unsafe copy inside socket2 would also solve the immediate issue.

How would that fix it? you still need a pointer/reference to SOCKADDR_STORAGE to support all kinds of addresses, so you still wouldn't be able to use SOCKADDR. Maybe, and this is a big maybe, it can be done with a raw pointer and on-demand dereference while avoid UB, but I'm not sure.

@jeff-hiner
Copy link
Author

Have you tried SockAddr::init? It's goal is to be used in system calls where the OS initialises it, e.g. see the accept implementation.

In this case I'm getting a linked list of minimally-sized sockaddr_in / sockaddr_in6 populated by an OS call. I don't pass it pointers to structs, as with accept. (The OS call on Windows is GetAdaptersAddresses, if you're curious. It gives you this.)

How would that fix it? you still need a pointer/reference to SOCKADDR_STORAGE to support all kinds of addresses, so you still wouldn't be able to use SOCKADDR. Maybe, and this is a big maybe, it can be done with a raw pointer and on-demand dereference while avoid UB, but I'm not sure.

That's more or less what I'm suggesting, yes.

As a point of order, SOCKADDR_STORAGE / sockaddr_storage is a implementation-specific struct guaranteed to be large enough to hold ANY sockaddr union-ish struct on the system: sockaddr_in, sockaddr_in6, sockaddr_dl, etc. It's purpose is allocating a statically-sized object, passing *mut sockaddr_storage to arbitrary C code and letting it populate arbitrary AF_* types, with the guarantee that no OOB accesses will occur.

I'm going to switch to non-shouty-case from here, because the discussion applies broadly to POSIX as well.

To reiterate, sockaddr_in is only 16 bytes, and for any const access in C FFI it's usable as-is. You can pass its pointer directly to POSIX syscalls. In our case we only need a sockaddr_storage because we've declared our SockAddr as owning, and sockaddr_storage is the "least common denominator" for size. In our implementations we don't actually need a pointer to a full sockaddr_storage, we just need something we can on-demand coerce to a *const sockaddr that follows the C rules. If you start with e.g. a 28 byte sockaddr_in6 struct from arbitrary C code you should be able to store its pointer and length and socket2 should be able to just use it without having to extend or copy. Here are a few approaches:

  1. If SockAddrRef is generic over the length L, you can store a reference &'a[u8; L] instead of sockaddr_storage and do (unsafe) casts from a u8 buffer of length L to const* sockaddr for something like as_ptr. Note that this is of course UB if the type you want to convert to is larger than L, and you do have to verify at some point that it's properly aligned, but this matches what C expects from sockaddr *.
  2. You can internally store &'a[u8] instead of the sockaddr_storage struct. This removes the need for generics, but means you'd defer the length checks to where you call a requiring fn, and you have to do them at runtime.
  3. You can make SockAddrRef generic over a union subtype T that "looks like" a sockaddr. You would call SockAddrRef::new on a pointer and max length (or &'a[u8]), and the return is Result<SockAddrRef<'a, T>, E> with the alignment and size checked. In this case T would be generally a sockaddr_in or sockaddr_in6, with the type mapped from the original bytes. The E return is an enum over "definitely misaligned", "not enough bytes", and "I don't recognize this family". This approach is heavy on generics, and it does mean having to match in the new fn against AF_INET and the like. But out the other end comes the ability to statically say for a type "yes this is absolutely a valid *const sockaddr_in6" and "yes I can give you back a guaranteed safe *const sockaddr.

Making SockAddr itself generic over e.g. sockaddr_in, sockaddr_in6 makes the problem easy from an implementation standpoint, but means you can't parse arbitrary bytes as variable-length sockaddr. So I don't think that's going to help.

@Thomasdezeeuw
Copy link
Collaborator

In this case I'm getting a linked list of minimally-sized sockaddr_in / sockaddr_in6 populated by an OS call. I don't pass it pointers to structs, as with accept. (The OS call on Windows is GetAdaptersAddresses, if you're curious. It gives you this.)

Let's take a step back then. What are you trying to do with the sockaddr_in your getting from the OS? Are you trying to create std::net::SocketAddr and continue using that? Or are you trying to use socket2 functions (which need socket::SockAddr)?

How would that fix it? you still need a pointer/reference to SOCKADDR_STORAGE to support all kinds of addresses, so you still wouldn't be able to use SOCKADDR. Maybe, and this is a big maybe, it can be done with a raw pointer and on-demand dereference while avoid UB, but I'm not sure.

That's more or less what I'm suggesting, yes.

As a point of order, SOCKADDR_STORAGE / sockaddr_storage is a implementation-specific struct guaranteed to be large enough to hold ANY sockaddr union-ish struct on the system: sockaddr_in, sockaddr_in6, sockaddr_dl, etc. It's purpose is allocating a statically-sized object, passing *mut sockaddr_storage to arbitrary C code and letting it populate arbitrary AF_* types, with the guarantee that no OOB accesses will occur.

I'm going to switch to non-shouty-case from here, because the discussion applies broadly to POSIX as well.

To reiterate, sockaddr_in is only 16 bytes, and for any const access in C FFI it's usable as-is. You can pass its pointer directly to POSIX syscalls. In our case we only need a sockaddr_storage because we've declared our SockAddr as owning, and sockaddr_storage is the "least common denominator" for size. In our implementations we don't actually need a pointer to a full sockaddr_storage, we just need something we can on-demand coerce to a *const sockaddr that follows the C rules. If you start with e.g. a 28 byte sockaddr_in6 struct from arbitrary C code you should be able to store its pointer and length and socket2 should be able to just use it without having to extend or copy. Here are a few approaches:

1. If SockAddrRef is generic over the length L, you can store a reference `&'a[u8; L]` instead of `sockaddr_storage` and do (unsafe) casts from a u8 buffer of length L to `const* sockaddr` for something like `as_ptr`. Note that this is of course UB if the type you want to convert to is larger than L, and you do have to verify at some point that it's properly aligned, but this matches what C expects from `sockaddr *`.

2. You can internally store `&'a[u8]` instead of the `sockaddr_storage` struct. This removes the need for generics, but means you'd defer the length checks to where you call a requiring fn, and you have to do them at runtime.

3. You can make SockAddrRef generic over a union subtype T that "looks like" a `sockaddr`. You would call `SockAddrRef::new` on a pointer and max length (or `&'a[u8]`), and the return is `Result<SockAddrRef<'a, T>, E>` with the alignment and size checked. In this case `T` would be generally a `sockaddr_in` or `sockaddr_in6`, with the type mapped from the original bytes. The `E` return is an enum over "definitely misaligned", "not enough bytes", and "I don't recognize this family". This approach is heavy on generics, and it does mean having to match in the `new` fn against AF_INET and the like. But out the other end comes the ability to statically say for a type "yes this is absolutely a valid `*const sockaddr_in6`" and "yes I can give you back a guaranteed safe `*const sockaddr`.

Making SockAddr itself generic over e.g. sockaddr_in, sockaddr_in6 makes the problem easy from an implementation standpoint, but means you can't parse arbitrary bytes as variable-length sockaddr. So I don't think that's going to help.

I don't think it needs to be that complicated I think the following will do:

struct SocketAddrRef<'a> {
    storage: *const (), // Can be one of the socket address storage types (`sockaddr_in`, `sockaddr_in6`, `sockaddr_storage`, etc.)
    len: usize, // Must be `>= size_of::<sa_family_t>()`.
    _phantom: PhantomData<&'a ()>,
}

impl<'a> SocketAddrRef<'a> {
    /// # Safety
    ///
    /// Caller must ensure the pointer and length are correct and valid for the duration of `'a`.
    unsafe fn new<'a>(storage: *const (), len: usize) -> SocketAddrRef<'a> {
        debug_assert!(len >= size_of::<sa_family_t>());
        SocketAddrRef { storage, len, _phantom: PhantomData }
    }
}

@jeff-hiner
Copy link
Author

jeff-hiner commented Aug 8, 2022

Yeah, I think that would work. Would you suggest then implementing as_socket as a runtime match against the family, and a debug_assert against the length?

As to what we're trying to do, we have some unsafe code that's trying to extract IP addresses from those FFI structs. The legacy code was doing a transmute directly into std::net::SocketAddrV4 / SocketAddrV6 which was a bad idea that happened to work, but recent changes in the struct are forcing us to finally fix it. There's not yet a fn in stable Rust to say "give me a std::net::SocketAddr built from this sockaddr_in". Interestingly there's an implementation in nix that would work, but of course this is cross-platform code that has to work on Windows. We're already using socket2 elsewhere for some raw socket stuff and it appears to be almost what we need, so I'm reaching for it here.

@Thomasdezeeuw
Copy link
Collaborator

Yeah, I think that would work. Would you suggest then implementing as_socket as a runtime match against the family, and a debug_assert against the length?

Yes, something like the following.

match unsafe { std::ptr::read(self.storage as *const libc::sa_family_t) {
    AF_INET6 => // ...
}

But note that ptr::read does have various requires for proper reading, so we'll need to document those carefully.

As to what we're trying to do, we have some unsafe code that's trying to extract IP addresses from those FFI structs. The legacy code was doing a transmute directly into std::net::SocketAddrV4 / SocketAddrV6 which was a bad idea that happened to work, but recent changes in the struct are forcing us to finally fix it. There's not yet a fn in stable Rust to say "give me a std::net::SocketAddr built from this sockaddr_in". Interestingly there's an implementation in nix that would work, but of course this is cross-platform code that has to work on Windows. We're already using socket2 elsewhere for some raw socket stuff and it appears to be almost what we need, so I'm reaching for it here.

We went through the same problem (#122). I'm wondering where SockAddrRef is a good type or if sockaddr_in -> std::net::SocketAddr helper function would be easier. Because if we create a new SockAddrRef type people might expect it to be useful beyond converting it to another, which would likely result in a lot of duplicate API (between it and SockAddr).

@jeff-hiner
Copy link
Author

Yeah I think more I was trying to get folks thinking about a non-owned SockAddr with an external-backed pointer to something not a full length sockaddr_storage, and what that would look like. The impls already there are incredibly useful, they are almost what's needed, and honestly the functionality could probably be moved to a trait impl over both the owned and non-owned version (or impl the appropriate Derefs). It's just that the "owned sockaddr_storage" is too tight a bound for a lot of FFI code, and "pass me a *const sockaddr_storage" was a major footgun for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants