Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustified_non_exhaustive_enum() can still cause UB at runtime? #1763

Closed
strohel opened this issue Apr 19, 2020 · 7 comments
Closed

rustified_non_exhaustive_enum() can still cause UB at runtime? #1763

strohel opened this issue Apr 19, 2020 · 7 comments

Comments

@strohel
Copy link
Contributor

strohel commented Apr 19, 2020

I'm researching what bindgen::Builder options to use for my -sys crate.

The rustified_enum() is clearly marked as potentially causing undefined behavior as described in rust-lang/rust#36927 and #667:

Use this with caution, you probably want to use the non_exhaustive flavor of rust enums instead of this one. Take a look at rust-lang/rust#36927 for more information.

The newer rustified_non_exhaustive_enum(), introduced in #1554, that does the same but marks the enum as #[non_exhaustive] is not marked as potentially dangerous, and is even mentioned as a solution in rustified_enum() docs (above).

rustified_non_exhaustive_enum() indeed seems to be used in the wild with assumption the UB cannot occur, e.g. in cholcombe973/libparted-sys#6.

However, I've been reading the RFC 2008 that introduces #[non_exhaustive] by @clarfon, and it doesn't seem to imply any such guarantees. Quite the contrary, it even mentions that compiler may (continue to) remove the wildcard arm (emphasis mine):

Outside the crate that defines the enum, users should be required to add a wildcard arm to ensure forward-compatibility, like so:
And it should not be marked as dead code, even if the compiler does mark it as dead and remove it.

It seems that #[non_exhaustive] is a compile-time check that doesn't have any effect on generated code, i.e. that there is still a chance of UB even with rustified_non_exhaustive_enum()? Or maybe there is some piece I'm missing that makes it safe (compiler being more cautious than RFC)?

For clarity I'm talking about the case where a C library returns an int value that doesn't map to any variants of the enum as known at compile time. I'll be happy to submit a PR to make docstrings clear once we answer the question here.

@clarfonthey
Copy link

clarfonthey commented Apr 19, 2020

If the C library can return values that are not valid under the enum type, that's not the fault of non_exhaustive, just the fault of bindgen, IMHO.

This should definitely be documented somewhere, but I think that a better way of representing "rusty" enums would be using a union, i.e.:

#[repr(C, u32)]
enum RustyEnum {
    VariantOne,
    VariantTwo,
}

union ActualType {
    rusty: RustyEnum,
    catchAll: u32,
}

Then, while you can pass the enums into a function via ActualType { rusty: /* ... */ }, you would have to do a match on the enum itself to get values returned by it.

That said, I think that this is really kind of an extreme case, and if your enum has unrepresented values being returned, the API should be fixed-- it's possible to get UB from external APIs easily and this is just another case of extra care being required to avoid it.

@strohel
Copy link
Contributor Author

strohel commented Apr 19, 2020

Thanks for quick reply, @clarfon. I feel obliged to give a bit more context given I've tagged you out of nowhere:

bindgen allows multiple modes of representing C enums [1], default being a plain number type + a set of constants:

pub const XML_ATTRIBUTE_CDATA: xmlAttributeType = 1;
pub const XML_ATTRIBUTE_ID: xmlAttributeType = 2;
pub type xmlAttributeType = u32;

Which is safe in a way that the author of a higher-level Rust wrapper library creates a Rust enum manually and must ensure the numeric values are safely converted to it.

Another mode rustified_enum() is a useful shortcut when the author of the -sys crate can ensure the C library can only return values representable in Rust (possibly by bundling reviewed C code with the crate itself). It looks like this:

#[repr(u32)]
pub enum xmlAttributeType {
    XML_ATTRIBUTE_CDATA = 1,
    XML_ATTRIBUTE_ID = 2,
}

This issue is about a rather subtle thing: the rustified_non_exhaustive_enum() mode, that sticks #[non_exhaustive] to the enum above, is mentioned as a possible mitigation of the UB arising from C library returning non-representable value -- my concern is that it might not be a correct recommendation.

[1] your suggestion being a possible addition to it, but I don't have necessary knowledge/experience to consider it.

@clarfonthey
Copy link

Yeah, you're definitely right in the fact that it's still UB, #[non_exhaustive] or not.

This definitely should be mentioned in the nomicon at least.

@emilio
Copy link
Contributor

emilio commented Apr 21, 2020

Yeah, this was documented after a bunch of discussion in #1757, but let me know if I'm missing something.

TLDR: yeah, non_exhaustive can still be useful for some use cases, but it does not prevent UB.

@emilio emilio closed this as completed Apr 21, 2020
@emilio
Copy link
Contributor

emilio commented Apr 21, 2020

Oh, sorry, btw, when I wrote it github was ~down so my second comment didn't get posted..

More improvements to the documentation / book here would most definitely be welcome :)

@strohel
Copy link
Contributor Author

strohel commented Apr 21, 2020

Okays I'll submit a little documentation improvement later.

@strohel
Copy link
Contributor Author

strohel commented Apr 22, 2020

Okays I'll submit a little documentation improvement later.

Oh, my bad, I was all the time looking just at documentation as published on docs.rs and didn't realize the docs were already made very clear in master by @pheki in f568d09 -- that's perfect, sorry for the noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants