Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Niches of Cell and others still not hidden #87341

Closed
QuineDot opened this issue Jul 21, 2021 · 18 comments · Fixed by #99011
Closed

Niches of Cell and others still not hidden #87341

QuineDot opened this issue Jul 21, 2021 · 18 comments · Fixed by #99011
Assignees
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@QuineDot
Copy link

Original conversation can be found in this IRLO thread, including a request from @RalfJung to file a bug report.

PR 68491 was accepted to hide niches of types that are in an UnsafeCell. However, experiments show that niches are still exposed within Cell, Mutex, and others.

Consider this code, adapted from a playground by @SkiFire13. It currently runs on the playground without panicking and optimizes down to a ret in the Godbolt explorer.

    assert_eq!( 8, mem::size_of::<       UnsafeCell<&()> >()); //  8
    assert_eq!(16, mem::size_of::<Option<UnsafeCell<&()>>>()); // 16 (✗ niche opt)
    assert_eq!( 8, mem::size_of::<             Cell<&()> >()); //  8
    assert_eq!( 8, mem::size_of::<Option<      Cell<&()>>>()); //  8 (✓ niche opt)
    assert_eq!(16, mem::size_of::<          RefCell<&()> >()); // 16
    assert_eq!(24, mem::size_of::<Option<   RefCell<&()>>>()); // 24 (✗ niche opt)
    assert_eq!(24, mem::size_of::<           RwLock<&()> >()); // 24
    assert_eq!(24, mem::size_of::<Option<    RwLock<&()>>>()); // 24 (✓ niche opt)
    assert_eq!(24, mem::size_of::<            Mutex<&()> >()); // 24
    assert_eq!(24, mem::size_of::<Option<     Mutex<&()>>>()); // 24 (✓ niche opt)

I would have expected Option<Cell<&()>> to not exhibit the niche-exploiting size optimization. The RwLock and Mutex cases may or may not be problematic, depending on their platform-specific implementations. But Cell in particular is a #[repr(transparent)] wrapper around UnsafeCell.

Here is another playground by @SkiFire13 from which they observed:

Even more interesting, looks like the current behaviour of a type that contains an UnsafeCell is to allow niche optimizations only if it doesn't contains something else before the UnsafeCell, and that something is not a ZST.

This indicates that the behavior is not Cell-specific. And although there has been some discussion about revealing the niches for Cell in particular, the discussion in the PR and in this hackmd seem to indicate that the niches being exploited is not intentional, even for Cell. The behavior can be observed back until Rust 1.43 when the PR landed.

Meta

Current playground (all versions) and Godbolt on all versions I tried from 1.43 forward.

@QuineDot QuineDot added the C-bug Category: This is a bug. label Jul 21, 2021
@SkiFire13
Copy link
Contributor

(Tiny correction to give proper credits: the first playground link you posted (the one with Mutex ecc ecc) is pretty much the one from @CAD97 from here. My playground (the one with custom types) is also adapted from that one.)

@RalfJung
Copy link
Member

Cc @rust-lang/lang @eddyb

@RalfJung RalfJung added I-nominated T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Jul 21, 2021
@RustyYato
Copy link
Contributor

RustyYato commented Jul 21, 2021

Note: Mutex and RwLock are probably using the niche in Box on some platforms. (Raw os locking primitives need to be boxed on some platforms). Otherwise we'd run into this bug

#68206

@eddyb
Copy link
Member

eddyb commented Jul 24, 2021

I guess this logic introduced by #68491 isn't useful unless Cell also had #[repr(no_niche)]:

let niche = if def.repr.hide_niche() {
None
} else {
Niche::from_scalar(dl, Size::ZERO, scalar.clone())
};

The issue is that UnsafeCell's niche was removed, but it was only done without touching the abi field, so it still looks like there's an available niche, it's just that the "cached computation" is missing it. And Cell being a newtype means the niche is rediscovered from abi being Scalar with its validity range having a hole in it (at 0).

What needs to happen for #[repr(no_niche)] (tho tbh I would kind of prefer doing it for the UnsafeCell lang item, I don't remember if there was a reason to add a new #[repr] instead) is what we do for unions, i.e. resetting validity ranges to the full size for the bitwidth of the scalar:

// Normalize scalar_unit to the maximal valid range
let field_abi = match &field.abi {
Abi::Scalar(x) => Abi::Scalar(scalar_unit(x.value)),
Abi::ScalarPair(x, y) => {
Abi::ScalarPair(scalar_unit(x.value), scalar_unit(y.value))
}
Abi::Vector { element: x, count } => {
Abi::Vector { element: scalar_unit(x.value), count: *count }
}
Abi::Uninhabited | Abi::Aggregate { .. } => {
Abi::Aggregate { sized: true }
}
};

However, that might impact e.g. miri negatively, as it may make it seem like Cell<&T> could be null. Then again, IIRC miri's such validation has to traverse the fields, so it would see the &T field inside the UnsafeCell anyway.

@RalfJung
Copy link
Member

However, that might impact e.g. miri negatively, as it may make it seem like Cell<&T> could be null. Then again, IIRC miri's such validation has to traverse the fields, so it would see the &T field inside the UnsafeCell anyway.

Miri should be fine -- it both checks the "validity range" in abi::Scalar, and recursively traverses the fields to look at their types.

@joshtriplett
Copy link
Member

Can we get some clarification on what the question being asked of @rust-lang/lang is here? Is the desire here to fix the semantics (making all of the cell types not have niches)? Or is there a question about the correct semantics? I see in the internals thread several questions about what transmutes are valid (e.g. between Option<NonZeroU32> and u32), but this issue doesn't seem to be referencing those bits of the internals thread.

Could we get a specific ask to go with this nomination?

@pnkfelix
Copy link
Member

pnkfelix commented Aug 3, 2021

It sounds like I should have explicitly included testing of Cell when I wrote PR #68491. At least, based on all the dialogue I can find, we were all operating under the assumption that removing the niche from UnsafeCell would imply it would also be removed from Cell (unless some action were taken to remove it from Cell as well), while the tests given on this issue demonstrate that the assumption did not hold.

However, these experimental results do concern me a little bit. Do they imply that under the current implementation, any wrapper type around UnsafeCell is also going to re-expose the niche? (Or just transparent wrapper types around UnsafeCell?)

Anyway, I'm pretty sure we are justified to make changes to hide any niche under Cell et al. There may be questions of how we deploy that change; its not something we can easily issue a future-incompatibility warning for. And I'm not sure it belongs under an edition boundary. Maybe best to go review how the original deployment for UnsafeCell was handled...

@SkiFire13
Copy link
Contributor

Do they imply that under the current implementation, any wrapper type around UnsafeCell is also going to re-expose the niche? (Or just transparent wrapper types around UnsafeCell?)

Any wrapper that has a non-zero-sized field before the UnsafeCell. #[repr(transparent)] is not required. See this playground (also linked in the first post)

@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2021

Can we get some clarification on what the question being asked of @rust-lang/lang is here?

Basically, the question is: should Cell hide its niches? And what about other user-defined UnsafeCell wrappers?

I think the intention was that yes the niche should be hidden for all UnsafeCell wrappers. But I wanted to confirm with t-lang to be sure.

@pnkfelix
Copy link
Member

pnkfelix commented Aug 4, 2021

Do they imply that under the current implementation, any wrapper type around UnsafeCell is also going to re-expose the niche? (Or just transparent wrapper types around UnsafeCell?)

Any wrapper that has a non-zero-sized field before the UnsafeCell. #[repr(transparent)] is not required. See this playground (also linked in the first post)

I was pretty confused by this response, because I took it to mean that the niche is only reexposed when there is another field before the UnsafeCell.

After reading the linked playground, I think that interpretation was backwards. I.e., I think the intended statement of the current behavior is that: Any struct holding an UnsafeCell that has a non-zero-sized field before the UnsafeCell will not re-expose any niche within the UnsafeCell.

(Which sounds pretty weird, in terms of why that ends up being the case where the rule is enforced...)

Anyway, I'm personally in agreement with @RalfJung that the intention was to hide the niche for all occurrences of UnsafeCell in another data type. All of the dialogue I can find supports that interpretation. It looks like I just wildly missed the mark in my implementation.

@SkiFire13
Copy link
Contributor

I was pretty confused by this response, because I took it to mean that the niche is only reexposed when there is another field before the UnsafeCell.

After reading the linked playground, I think that interpretation was backwards. I.e., I think the intended statement of the current behavior is that: Any struct holding an UnsafeCell that has a non-zero-sized field before the UnsafeCell will not re-expose any niche within the UnsafeCell.

You're right, I wrote it backwards. Your final interpretation is the right one.

@eddyb
Copy link
Member

eddyb commented Aug 5, 2021

Anyway, I'm personally in agreement with @RalfJung that the intention was to hide the niche for all occurrences of UnsafeCell in another data type.

My assumption (in #87341 (comment)) was that the initial implementation "just" has a bug, where it's hiding niches but not invalid values (which give rise to niches in the first place).

There may be questions of how we deploy that change; its not something we can easily issue a future-incompatibility warning for. And I'm not sure it belongs under an edition boundary. Maybe best to go review how the original deployment for UnsafeCell was handled...

Did we guarantee anything around Cell? Feels like a crater run might be enough.

@Mark-Simulacrum
Copy link
Member

The lang team discussed this last week and came to the conclusion that:

  • Fixing repr(no_niche) to transitively apply seems like a clear win, we should do this immediately
  • It may be possible that some types that contain UnsafeCell may still be able to allow niches in the UnsafeCell, such as Cell, with the appropriate codegen. But this is future design work and need not block fixing potential gaps in the current implementation.

It was unclear during the meeting whether there is actually a problem with the Mutex/RwLock types, which we believe at least on Linux are currently boxed, so the niche is not arising from the cell (but rather the box), which should be fine. But this doesn't change the fact that we should fix the no_niche repr to do the right thing.

@SkiFire13
Copy link
Contributor

I wonder, do we really need a #[repr(no_niche)]? We already have a #[repr(transparent)] type that disables niches: MaybeUninit. Can't we reuse it? Something like:

use core::mem::MaybeUninit;

#[repr(transparent)]
struct NoNiche<T>(MaybeUninit<T>);

impl<T> NoNiche<T> {
    pub fn new(value: T) -> Self {
        Self(MaybeUninit::new(value))
    }
    pub fn into_inner(self) -> T {
        // Safety: we never store an uninitialized MaybeUninit
        unsafe { self.0.assume_init() }
    }
    pub fn get(&self) -> &T {
        // Safety: we never store an uninitialized MaybeUninit
        unsafe { self.0.assume_init_ref() }
    }
    pub fn get_mut(&mut self) -> &mut T {
        // Safety: we never store an uninitialized MaybeUninit
        unsafe { self.0.assume_init_mut() }
    }
}

@eddyb
Copy link
Member

eddyb commented Aug 30, 2021

I mean, do we even need anything at all, other than the fact that UnsafeCell is known to the compiler?

Users might be fine with getting this effect, if they ever need it for some other reason, through either UnsafeCell or MaybeUninit.

@dtolnay
Copy link
Member

dtolnay commented May 22, 2022

I am labeling this issue I-unsound because inserting niches into types that we've agreed are not supposed to have niches is definitely unsound territory. This bug is causing surprising soundness issues in the ecosystem:

Additionally, there is no good way for libraries to work around this bug. Using UnsafeCell<MaybeUninit<T>> does not work in general because MaybeUninit does not support ?Sized as needed in crossbeam's AtomicCell.

Example of an observable/consequential data race in what should be correct code:

// [dependencies]
// crossbeam-utils = "=0.8.8"

use crossbeam_utils::atomic::AtomicCell;
use std::num::NonZeroU128;
use std::thread;

enum Enum {
    NeverConstructed,
    Cell(AtomicCell<NonZeroU128>),
}

static STATIC: Enum = Enum::Cell(AtomicCell::new(match NonZeroU128::new(1) {
    Some(nonzero) => nonzero,
    None => unreachable!(),
}));

fn main() {
    thread::spawn(|| {
        let cell = match &STATIC {
            Enum::NeverConstructed => unreachable!(),
            Enum::Cell(cell) => cell,
        };
        let x = NonZeroU128::new(0xFFFFFFFF_FFFFFFFF_00000000_00000000).unwrap();
        let y = NonZeroU128::new(0x00000000_00000000_FFFFFFFF_FFFFFFFF).unwrap();
        loop {
            cell.store(x);
            cell.store(y);
        }
    });

    loop {
        if let Enum::NeverConstructed = STATIC {
            unreachable!(":(");
        }
    }
}

@dtolnay dtolnay added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label May 22, 2022
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label May 22, 2022
@oli-obk oli-obk self-assigned this May 22, 2022
Amanieu added a commit to Amanieu/atomic-rs that referenced this issue May 24, 2022
UnsafeCell unfortunately lets the niches of the inner type "leak
through" to the outer type, which can cause unsoundness.

Fixes #29
@apiraino
Copy link
Contributor

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-high

@rustbot rustbot added P-high High priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels May 26, 2022
@eddyb
Copy link
Member

eddyb commented May 27, 2022

I described the correct fix in #87341 (comment) but didn't realize nobody picked it up last year, whoops.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jul 12, 2022
`UnsafeCell` blocks niches inside its nested type from being available outside

fixes rust-lang#87341

This implements the plan by `@eddyb` in rust-lang#87341 (comment)

Somewhat related PR (not strictly necessary, but that cleanup made this PR simpler): rust-lang#94527
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jul 13, 2022
`UnsafeCell` blocks niches inside its nested type from being available outside

fixes rust-lang#87341

This implements the plan by ``@eddyb`` in rust-lang#87341 (comment)

Somewhat related PR (not strictly necessary, but that cleanup made this PR simpler): rust-lang#94527
@bors bors closed this as completed in 1e7d04b Jul 13, 2022
flip1995 pushed a commit to flip1995/rust-clippy that referenced this issue Jul 14, 2022
`UnsafeCell` blocks niches inside its nested type from being available outside

fixes #87341

This implements the plan by `@eddyb` in rust-lang/rust#87341 (comment)

Somewhat related PR (not strictly necessary, but that cleanup made this PR simpler): #94527
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
`UnsafeCell` blocks niches inside its nested type from being available outside

fixes #87341

This implements the plan by `@eddyb` in rust-lang/rust#87341 (comment)

Somewhat related PR (not strictly necessary, but that cleanup made this PR simpler): #94527
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness P-high High priority T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.