Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a use-after-free race in SpinLatch::set, and release 1.9.3 #934

Merged
merged 3 commits into from May 13, 2022

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented May 11, 2022

SpinLatch<'r> borrows &'r Arc<Registry> from the WorkerThread where it is created. When we set, we're careful to make sure that the Registry remains alive while we do the inner set and then notify_worker_latch_is_set. We knew from past bugs that the SpinLatch could be invalidated between set and notify, but the &Arc could also be invalidated if the target thread sees the set and exits (dropping its WorkerThread) before the notification. That's a fairly long race, but preemption could make that happen.

The inner Registry will still be alive, since the current thread is part of that pool, so we can hold that reference directly.

Fixes #913
Fixes #929

@@ -1,3 +1,7 @@
# Release rayon-core 1.9.3 (2022-05-13)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've set the release date for this Friday, so affected folks have a little time to test it.

// Ensure the registry stays alive while we notify it.
// Otherwise, it would be possible that we set the spin
// latch and the other thread sees it and exits, causing
// the registry to be deallocated, all before we get a
// chance to invoke `registry.notify_worker_latch_is_set`.
cross_registry = Arc::clone(self.registry);
&cross_registry
&*cross_registry
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is amazing but also terrible as a single character seems to decide if there is a race or not.

Also this isn't criticism on the patch at all but could be a question for the folks at rust-lang if things like that could ever be made more explicit. CC @joshtriplett

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be fair, we're also playing with fire to have a &self that may not actually live for the whole call. There's a lot about this kind of problem in rust-lang/rust#55005. I have contemplated whether Latch::set should use *const Self instead, but at some point we'd still be calling atomic &self methods with the same core problem.

Copy link
Member Author

@cuviper cuviper May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the problem was really on the other branch, borrowing self.registry. The cross_registry clone is locally kept alive just fine, and I could have even left this line alone and let deref-coersion convert to &Registry, but I chose to make that more explicit.

Copy link

@scuwan scuwan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious about the difference between Arc:: Clone (& Registry) and Registry.clone (), they look the same thing

@Byron
Copy link

Byron commented May 12, 2022

I believe they are the same except that the new way of writing makes clear that the Arc is cloned and not the thing inside of the Arc. There was a moment of confusion for me as well as I thought it's related to the fix itself, but it probably is merely a flyby refactor.

@cuviper
Copy link
Member Author

cuviper commented May 12, 2022

Yes, the clones mean the same thing. That's a way to be explicit that we're cloning the Arc, not the data inside, and also not accidentally cloning the &Arc reference (since &T is always Clone + Copy).

Only the first commit is necessary for the fix, and the clone change is just my preference.

@cuviper
Copy link
Member Author

cuviper commented May 13, 2022

Both #913 and #929 have confirmed the fix, let's go!

bors r+

@bors bors bot merged commit 19bf115 into rayon-rs:master May 13, 2022
@cuviper cuviper deleted the latch-registry branch February 25, 2023 17:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rayon thread pool coredump Crash at: "index out of bounds" in sleep mod
3 participants