Fix a use-after-free race in SpinLatch::set, and release 1.9.3 #934
@@ -1,3 +1,7 @@
+# Release rayon-core 1.9.3 (2022-05-13)
I've set the release date for this Friday, so affected folks have a little time to test it.
 // Ensure the registry stays alive while we notify it.
 // Otherwise, it would be possible that we set the spin
 // latch and the other thread sees it and exits, causing
 // the registry to be deallocated, all before we get a
 // chance to invoke `registry.notify_worker_latch_is_set`.
 cross_registry = Arc::clone(self.registry);
-&cross_registry
+&*cross_registry
This is amazing but also terrible as a single character seems to decide if there is a race or not.
Also, this isn't criticism of the patch at all, but it could be a question for the folks at rust-lang: could things like that ever be made more explicit? CC @joshtriplett
To be fair, we're also playing with fire to have a `&self` that may not actually live for the whole call. There's a lot about this kind of problem in rust-lang/rust#55005. I have contemplated whether `Latch::set` should use `*const Self` instead, but at some point we'd still be calling atomic `&self` methods with the same core problem.
Also, the problem was really on the other branch, borrowing `self.registry`. The `cross_registry` clone is locally kept alive just fine, and I could have even left this line alone and let deref coercion convert to `&Registry`, but I chose to make that more explicit.
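A minimal sketch of the two spellings discussed above, assuming a hypothetical stand-in `Registry` struct and helper names (`registry_name`, `both_spellings`) that are not part of rayon-core. With a `&Registry` type annotation, `&cross_registry` is converted by deref coercion and ends up as the same pointer that `&*cross_registry` produces explicitly. Without the annotation, `&cross_registry` would simply be a `&Arc<Registry>`.

```rust
use std::sync::Arc;

// Hypothetical stand-in for rayon-core's `Registry`.
struct Registry {
    name: &'static str,
}

// Takes the inner value by reference, as a method on `Registry` would.
fn registry_name(registry: &Registry) -> &'static str {
    registry.name
}

// Builds a local `Arc` and borrows the inner value with both spellings.
fn both_spellings() -> (&'static str, bool) {
    let cross_registry: Arc<Registry> = Arc::new(Registry { name: "pool" });

    // Deref coercion: `&Arc<Registry>` becomes `&Registry` because of the annotation.
    let coerced: &Registry = &cross_registry;
    // Explicit deref: `*cross_registry` is the `Registry`, then we re-borrow it.
    let explicit: &Registry = &*cross_registry;

    (registry_name(coerced), std::ptr::eq(coerced, explicit))
}

fn main() {
    let (name, same_pointer) = both_spellings();
    println!("name = {name}, same pointer = {same_pointer}");
}
```

Both references point into the heap allocation owned by the `Arc`, not at the local `cross_registry` handle itself.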
Curious about the difference between `Arc::clone(&registry)` and `registry.clone()`; they look like the same thing.
I believe they are the same except that the new way of writing makes clear that the […]
Yes, the clones mean the same thing. That's a way to be explicit that we're cloning the […]
Only the first commit is necessary for the fix, and the clone change is just my preference.
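As a small illustration of the point that the two clone spellings mean the same thing, here is a sketch with a hypothetical stand-in `Registry` and a made-up helper `clone_both_ways`. Both forms produce a new handle to the same allocation and bump the strong count; neither copies the `Registry` itself (which does not even implement `Clone` here).

```rust
use std::sync::Arc;

// Hypothetical stand-in for rayon-core's `Registry`; note it is not `Clone`.
struct Registry {
    id: u32,
}

// Clones the handle both ways and returns the strong count while both clones live.
fn clone_both_ways(registry: &Arc<Registry>) -> usize {
    let a = Arc::clone(registry); // explicit: visibly a reference-count bump
    let b = registry.clone(); // method syntax: resolves to the same `Clone for Arc` impl

    // Both clones point at the same allocation as the original handle.
    assert!(Arc::ptr_eq(&a, registry));
    assert!(Arc::ptr_eq(&b, registry));

    Arc::strong_count(registry)
}

fn main() {
    let registry = Arc::new(Registry { id: 7 });
    println!("count with two clones alive: {}", clone_both_ways(&registry));
    println!("count after they are dropped: {}", Arc::strong_count(&registry));
    println!("registry id: {}", registry.id);
}
```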
`SpinLatch<'r>` borrows `&'r Arc<Registry>` from the `WorkerThread` where it is created. When we `set`, we're careful to make sure that the `Registry` remains alive while we do the inner `set` and then `notify_worker_latch_is_set`. We knew from past bugs that the `SpinLatch` could be invalidated between set and notify, but the `&Arc` could also be invalidated if the target thread sees the set and exits (dropping its `WorkerThread`) before the notification. That's a fairly long race, but preemption could make that happen.

The inner `Registry` will still be alive, since the current thread is part of that pool, so we can hold that reference directly.

Fixes #913
Fixes #929
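The shape of the fix described above can be sketched roughly as follows. This is a simplified illustration, not rayon-core's actual code: the struct layouts, the `flag` and `notifications` fields, and the `demo` helper are assumptions made for the example. Only `SpinLatch`, `WorkerThread`, `Registry`, `cross_registry`, and `notify_worker_latch_is_set` are names taken from the discussion. Because the sketch is entirely safe Rust, the borrow checker rules out the dangling handle here. It only shows which pointer each branch ends up holding before the flag is set.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical, heavily simplified stand-ins for the real types.
struct Registry {
    notifications: AtomicUsize,
}

impl Registry {
    fn notify_worker_latch_is_set(&self) {
        self.notifications.fetch_add(1, Ordering::SeqCst);
    }
}

struct WorkerThread {
    registry: Arc<Registry>,
}

struct SpinLatch<'r> {
    flag: AtomicBool,
    registry: &'r Arc<Registry>, // borrowed from the waiting WorkerThread
    cross: bool,
}

impl<'r> SpinLatch<'r> {
    fn set(&self) {
        let cross_registry;
        let registry: &Registry = if self.cross {
            // Setter belongs to another pool: take our own strong reference
            // so the Registry outlives the notification.
            cross_registry = Arc::clone(self.registry);
            &*cross_registry
        } else {
            // Same pool: the current thread already keeps the Registry alive,
            // so point at the Registry itself rather than at the worker's
            // `Arc` handle, which may go away once the flag is visible.
            &**self.registry
        };
        self.flag.store(true, Ordering::SeqCst);
        registry.notify_worker_latch_is_set();
    }
}

// Sets a latch once and reports (flag value, notification count).
fn demo(cross: bool) -> (bool, usize) {
    let pool = Arc::new(Registry { notifications: AtomicUsize::new(0) });
    let worker = WorkerThread { registry: Arc::clone(&pool) };
    let latch = SpinLatch {
        flag: AtomicBool::new(false),
        registry: &worker.registry,
        cross,
    };
    latch.set();
    (
        latch.flag.load(Ordering::SeqCst),
        pool.notifications.load(Ordering::SeqCst),
    )
}

fn main() {
    println!("same-pool set: {:?}", demo(false));
    println!("cross-pool set: {:?}", demo(true));
}
```

The key detail is that both branches resolve to a `&Registry` before the store to `flag`, so nothing after the store needs to read through the `Arc` handle owned by the waiting thread.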