How to reliably detect reentrancy #65

pitdicker · 2019-10-21T09:59:25Z

Your RFC states "It is an error to reentrantly initialize the cell from f. Doing so results in a panic." Currently sync:OnceCell does not give that guarantee. I tried to see how a solution would look like, but haven't found something easy yet.

Option 1 was to use a ReentrantMutex, and an atomic. Suppose the atomic says an initializer is running. If we are able to lock the mutex, this must be a case of reentrant initialization. Otherwise the mutex will block, which is what we want in case another thread is busy initializing. See my not working experiment in https://github.com/pitdicker/once_cell/commit/bd4a940c94b41ee2b6f595854ea2565e0a76a73e. parking_lot does not expose a const {Raw}ReentrantMutex::new.

Option 2 is using some sort of thread id, which is stored in the OnceCell when the initializer starts to run. When another initializer sees the OnceCell is in the process of being initialized, it can compare the stored thread id to its own, and choose between blocking or panicking. Interestingly parking_lot::ReentrantMutex also needs some sort of thread id to function. I don't yet have working code though.

The disadvantage of both options is it would add quite some complexity, and 4 or 8 bytes to the size of the OnceCell. Is that worth the trouble?

Also getting a thread id is not ideal: std::process::id is not supported on all platforms (like CloudABI). And std::thread::Thread::id is an opaque type, so it can't (officially) be stored in an atomic, but needs a Cell and manual synchronisation.

The text was updated successfully, but these errors were encountered:

matklad · 2019-10-21T10:02:22Z

Hm, yeah, I think that's a poor wording in the RFC text. For cell case, we guarantee a panic. For sync case, we say "this is a bug, potential outcomes are a panic or a deadlock".

matklad · 2019-10-21T10:04:09Z

BTW, I've added you to my fork of RFCs repo: you seem to be genuinely interested in the topic, so feel free to edit RFC directly/send PRs :)

pitdicker · 2019-10-21T10:08:34Z

Thank you! I'll think about it a bit first, I am unreliable... 😄

matklad · 2019-10-21T10:09:32Z

As to how to actually detect a deadlock, I think there were some vague plans to add deadlock detection to parking-lot proper via a flag, and that seems like the right approach here, except for the fact that feature flags obviously don't work with std.

pitdicker · 2019-10-28T19:14:39Z

Just to write it down for now: this a way to detect reentrancy without increasing the size of OnceCell using the std method.

Currently it uses a linked list of waiting threads. When ready the managing (initializing) thread wakes up all the other threads, which may then free their Waiter node. The only thread that can safely walk the linked list is the managing thread, because it is the only one who knows when they are all still waiting.

Suppose this is changed to a scheme where not the managing thread wakes up everyone, but every thread wakes up the next one. Now it is safe for every thread to always walk the list up to the first node.

If the first node is created by the thread that started to run the initialization function, it can store some information in that node, like the thread id. A thread that wants to wait on initialization adds itself to the linked list, walks the queue to the first node, and only parks itself if its thread id doesn't match that of the first node.

But I am not sure what effect it has to make every thread responsible for waking up the next one. I guess it will perform different from one thread waking up all others, but wouldn't know if it's worse.

pitdicker · 2019-11-02T14:56:23Z

The break_it test that I added in #71 turns out to be a good stress test to time the performance of parking/unparking under contention.
Rough results:

	x86_64	aarch64
std, one thread wakes the others	475ms	215ms
std, every thread wakes the next	400ms	200ms
parking lot mutex	1.10s	125ms

The test used 32 threads. My two test systems are AMD Ryzen 1950X (16-core) and Hisilicon Kirin 970 (4-core I think?).
The results on aarch64 are from my branch in #71 that uses consume ordering.

One funny result is that my phone runs this test faster than a good computer. But I think the explanation is that with just 4 threads, there is less contention and less threads waiting, that have to be managed. The lower the number of threads I use in the test, the faster it runs.

I'm not sure why using parking_lot is so much slower in this test on x86_64, it consistently takes 2-3x longer.

The more interesting result is that changing to a method where every thread wakes up the next one seems to be faster. So we can have our cake and eat it too (reentrancy detection and faster unparking under contention).

nbdd0121 · 2020-05-16T22:13:28Z

I believe just keeping a std::thread::Thread.id would be sufficient for reentrance detection.

nbdd0121 · 2020-05-16T22:48:07Z

I'll have a look next week to see if I can add reliable reentrance detection. Hopefully it'll provide a piece of code that can be shared among both OnceCell and Once.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to reliably detect reentrancy #65

How to reliably detect reentrancy #65

pitdicker commented Oct 21, 2019

matklad commented Oct 21, 2019

matklad commented Oct 21, 2019

pitdicker commented Oct 21, 2019

matklad commented Oct 21, 2019

pitdicker commented Oct 28, 2019 •

edited

pitdicker commented Nov 2, 2019

nbdd0121 commented May 16, 2020

nbdd0121 commented May 16, 2020

How to reliably detect reentrancy #65

How to reliably detect reentrancy #65

Comments

pitdicker commented Oct 21, 2019

matklad commented Oct 21, 2019

matklad commented Oct 21, 2019

pitdicker commented Oct 21, 2019

matklad commented Oct 21, 2019

pitdicker commented Oct 28, 2019 • edited

pitdicker commented Nov 2, 2019

nbdd0121 commented May 16, 2020

nbdd0121 commented May 16, 2020

pitdicker commented Oct 28, 2019 •

edited