Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Two-layer Option is folded and may lose information. #81

Open
ilslv opened this issue Nov 16, 2022 · 10 comments
Open

bug: Two-layer Option is folded and may lose information. #81

ilslv opened this issue Nov 16, 2022 · 10 comments

Comments

@ilslv
Copy link

ilslv commented Nov 16, 2022

Because of recursive impl of RefCnt on Option, it's valid to construct ArcSwapAny::<Option<Option<Arc<i32>>>> which doesn't see a difference between None and Some(None).

playground

use std::sync::Arc;

use arc_swap::ArcSwapAny;

fn main() {
    let x = ArcSwapAny::<Option<Option<Arc<i32>>>>::new(None);
    assert_eq!(*x.load(), None);
    
    x.store(Some(Some(Arc::new(1))));
    assert_eq!(x.load().as_ref().unwrap().as_deref(), Some(&1));
    
    x.store(Some(None));
    // thread 'main' panicked at 'assertion failed: `(left == right)`
    //   left: `None`,
    //  right: `Some(None)`', src/main.rs:13:5
    assert_eq!(*x.load(), Some(None));
}

Least disruptive solution would be just to implement RefCnt on Arc<T>, Option<Arc<T>> and mention in the docs, that to implement this trait on your own Arc, you'll have to use custom Option.

@vorner
Copy link
Owner

vorner commented Nov 16, 2022

This does sound like a bug, yes.

But can you explain how this is unsound? As in, is there an undefined behavior involved anywhere?

@ilslv
Copy link
Author

ilslv commented Nov 16, 2022

@vorner

I use sound here in means defined by David Tolnay:

Soundness is a statement about whether all possible uses of a library or language feature uphold the intended invariants. In other words it describes functionality that cannot be misused, neither by mistake nor maliciously.

I'm don't see how just using this library can lead to safety issues. But theoretically manual impl of unsafe methods in RefCnt may operate under some assumptions that aren't met under this bug.

@vorner
Copy link
Owner

vorner commented Nov 16, 2022

Yeh, that statement from him implicitly contains „Cannot be misused in safe rust“ ‒ or at least, that's how I read the whole post and how most of the Rust folks define soundness.

Unsound is something you can make to do undefined behavior without using any unsafe. You suggest you'd need to implement unsafe methods to cause it, which would mean this isn't unsound. Having some wrongly documented assumptions would, of course, be a bug/problem.

So, in that light (unless you can show me a code not using unsafe that not only misbehaves, but causes UB), I'll rename the bug accordingly.

I'll of course look into what to do about it. Your suggestion for a fix looks a bit semver-breaking, but if we are lucky, I think I had some mechanism in place which could work.

@vorner vorner changed the title Unsound behaviour when using raw ArcSwapAny bug: Two-layer Option is folded and may lose information. Nov 16, 2022
@ilslv
Copy link
Author

ilslv commented Nov 16, 2022

@vorner

I've found pretty sketchy, but working solution: playground. But this would definitely increase MSRV.


Unsound is something you can make to do undefined behavior without using any unsafe.

To be fair I kinda disagree with that description. David Tolnay says nothing about UB, only talking about invariants, this includes logical invariants. And this crate exposes API that can be misused in a way, that clearly breaks those logical invariants. I think it's very important to draw distinction between implementation details issues (bugs) and leaky abstractions (unsoundness).

Quoted from kornel:

Unsound means there's a loophole or an edge case that breaks safety or correctness guarantees.

@vorner
Copy link
Owner

vorner commented Nov 16, 2022

That solution is interesting, I'll keep that at hand.

I'm still thinking if I can actually make the nested Option work, instead of forbidding it. Not sure what it would be good for and not sure if it's possible (well, I suspect it is not, which irks me), but will give it a try first.

@ilslv
Copy link
Author

ilslv commented Nov 17, 2022

@vorner

I'm still thinking if I can actually make the nested Option work, instead of forbidding it.

I don't think this is possible. Single Option works, because pointer inside Arc is guaranteed to be different from nullptr. So we can use than one value of the possible pointer values to correspond to a None.
When we start nesting those Options, in addition to None we can encode Some(None), Some(Some(None)), Some(Some(Some(None))) etc. There aren't enough unused pointer values to encode those.

Not sure what it would be good for

I've tried to interpret Option<Option<Arc<T>>> like this, when encountered this issue:

  1. None means that computation of Value never succeeded yet
  2. Some(None) means that a single thread/async task tries to compute Value
  3. Some(Some(T)) represents last successful computation of T

We just swap None with Some(None) or Some(Some(T)) with Some(None), try to compute first/new T and replace with it in case computation succeeded or with originally swapped value in case of an error.
Right solution here would be to move one of outer Options inside Arc: Option<Arc<Option<T>>>

@vorner
Copy link
Owner

vorner commented Nov 17, 2022

There aren't enough unused pointer values to encode those.

Actually, I'm starting to think there are 😈. The pointers that come from Arc and Rc are necessarily aligned (because the pointee holds not only the actual value, but also the ref counts and these are usize). That leaves a lot of these unaligned values free. I think I'm even abusing that fact somewhere already. When the (innermost) None starts, it could start at eg. 13, then add 2 at every layer. This can be easily unpacked.

Another option would be, to somehow make a reserved value in the code (eg. a static variable) for each layer, so that each None and Some(None) would have it's own address.

I need to first think it through and try it, to see if there's impact on performance and to think if this could break someone's code or not. And if I would break something inside the implementation.

@ilslv
Copy link
Author

ilslv commented Nov 17, 2022

Actually, I'm starting to think there are 😈. The pointers that come from Arc and Rc are necessarily aligned (because the pointee holds not only the actual value, but also the ref counts and these are usize). That leaves a lot of these unaligned values free. I think I'm even abusing that fact somewhere already. When the (innermost) None starts, it could start at eg. 13, then add 2 at every layer. This can be easily unpacked.

Sound scary, but doable 😈

Another option would be, to somehow make a reserved value in the code (eg. a static variable) for each layer, so that each None and Some(None) would have it's own address.

I guess this would require Default bound or storing MaybeUninit<T> inside Arc, but not impossible.

@vorner
Copy link
Owner

vorner commented Nov 18, 2022

I'm not sure how soon I'll get around to the experiments. So I'll write down few notes that could help if someone is to try this out:

The unaligned addresses trick

  • There are few of these already taken internally, for some flags. It would have to start with higher ones.
  • There probably are asserts somewhere that check addresses returned from the RefCnt are aligned. These would have to be moved to a different place (probably into the Rc/Arc implementation), because now RefCnt could return these.
  • The scary part is interacting with some unknown downstream implementation that also does some similar trick. Needs a bit of thinking if that could cause trouble and how.
  • There's an assumption that two different types never return the same address. This is used by the depth paying by another thread/helping out. This assumption would get broken here, because None of different types would have the same address. But maybe this would be OK, if the thing tried to pay a debt of the wrong type but wouldn't actually touch the ref counts ‒ that needs checking.

The registration of address

Actually, I don't think I need any Default or MaybeUninit. I don't need a value of the type. I need a unique address and that address could be of something completely else. AFAIK pointers can point to the wrong type (needs checking) as long as they are never used as pointers (dereferenced/for pointer arithmetic).

What I worry about, though, is ‒ even if I could have some kind of static Marker<T>(u8) = Marker(0); // Make sure it isn't ZST inside the generic implementation (not sure about that), I'm not entirely sure different translation units necessarily have the same instance. That is, this would give me different address for different types, but does it mean that the same type-marker is always at the same address? Again, needs a bit of investigation.

@vorner
Copy link
Owner

vorner commented Dec 26, 2022

I'm digging through the code. So far, I've come to a conclusion it is possible using the unaligned address trick and a plan how it could be done. So far I haven't come up with an exact implementation that wouldn't be somewhat ugly and entangled with the rest.

The observations:

  • The debt slots (both the fast ones and the fallback helping slot) have a lot of assumptions about behavior of the values put into them.
  • All the „special values“ (None, Some(None)) are things without an associated reference count.
  • So, they don't need to and should not ever reach the debt slots. That can be ensured in the load operation ‒ it is possible to abort the fast path as soon as we see a special value, before even trying to allocate the slot. We can't stop processing there yet (there wouldn't be a proper SeqCst operation involved and the library promises one), so we need to do one such as part of the slow processing afterwards somewhere (this is a bit vague as the plan goes, where exactly).
  • As the special value never reaches the slots, all the pay-related routines can work as previously, because they never encounter it.

Now, there's also the compare-and-swap operation, which I haven't started taking apart yet. But considering that one is only lock-free, not wait-free (eg. it can loop and wait for the time when stuff doesn't change under its hands), it should be possible and likely even easy.

Not sure when I manage to find more time to actually implement it (again, folks are welcome to give it a try, based on these notes or using their own approach).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants