Add utility type WakerSet to the sync module #390

ghost · 2019-10-25T16:37:44Z

This type is shared among Mutex and channel for now, but would also be useful for implementing RwLock and Barrier - it'd make the code cleaner and more efficient at the same time!

We can integrate Registry into Mutex and channel in a follow-up PR. It might even be useful for implementing Condvar.

Comparing benchmarks from PR #370...

master branch:

test mutex_contention        ... bench:   3,535,766 ns/iter (+/- 284,716)
test mutex_mimick_contention ... bench:       1,686 ns/iter (+/- 414)
test mutex_no_contention     ... bench:     347,171 ns/iter (+/- 58,254)
test mutex_unused            ... bench:          35 ns/iter (+/- 2)

PR #370:

test mutex_contention        ... bench:   1,943,831 ns/iter (+/- 175,483)
test mutex_mimick_contention ... bench:       1,242 ns/iter (+/- 152)
test mutex_no_contention     ... bench:     261,742 ns/iter (+/- 66,615)
test mutex_unused            ... bench:           0 ns/iter (+/- 0)

This PR:

test mutex_contention        ... bench:   1,430,417 ns/iter (+/- 121,264)
test mutex_mimick_contention ... bench:       1,359 ns/iter (+/- 142)
test mutex_no_contention     ... bench:     277,573 ns/iter (+/- 46,106)
test mutex_unused            ... bench:           3 ns/iter (+/- 0)

ghost · 2019-10-25T16:41:26Z

@nbdd0121 It would be interesting to see how to consolidate efforts of this PR and PR #370. They're not necessarily in conflict - in fact, they're probably complementary.

This PR factors out the Registry utility that is useful in almost all synchronization primitives, not just Mutex. PR #370 essentially improves on this Registry before it was factored out.

nbdd0121 · 2019-10-25T21:03:11Z

> It might even be useful for implementing Condvar.

Condvar is likely just going to be implemented as a thin wrapper around Registry, with wait call also handling unlock and re-lock.

The Registry type is very similar to my WakerListLock type, but with a better API design. Most improvements in #370 still apply, such as using a linked list and the various techniques for shortening the hot path and de-bloating the generated code. I can surely rebase my PR on top of this.

Some of my concerns:

About NOTIFY_ONE and NOTIFY_ALL, is there any reason for the change? I can't think of a simple case where it is useful without spurious wake-ups. If there are concrete cases supporting it over a simple empty flag, I'd be glad to hear them.
The code has quite a lot of SeqCst, which is somewhat worrying, as its performance isn't ideal on non-TSO systems with weak memory consistency models.
The Spinlock type is removed. It might be useful to keep a copy of it in the sync module for use by the entire crate, e.g. for #286 (Use AtomicCell<u64> on targets with target_has_atomic less than 64).

I am quite surprised by the performance improvement in this PR. I think it's mostly due to changing the type to AtomicBool. I can dig further into the underlying reason.

ghost · 2019-10-25T22:09:45Z

> About NOTIFY_ONE and NOTIFY_ALL, is there any reason for the change? I can't think of a simple case where it is useful without spurious wake-ups. If there are concrete cases supporting it over a simple empty flag, I'd be glad to hear them.

The reason is that an empty flag doesn't tell us much. If the list of registered items is not empty, a notify_one() operation will lock the entry list, get the first entry, and wake it. But it's possible that the first entry was already woken up and contains a None. In that case, we've just wasted time locking the entry list.

In other words, a notify_one() operation has work to do only when the list is non-empty and all items are Some. Similarly, a notify_all() operation has work to do only if the list is non-empty and there is at least one Some entry.
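
The rule above can be sketched as a small function that derives both flags from the entry list. This is only an illustration: the name `compute_flags` and the generic entry type are assumptions, and the bit values are the ones floated in review rather than necessarily what the PR ships.

```rust
/// Flag bits; illustrative values, assumed for this sketch.
const NOTIFY_ONE: usize = 0b010;
const NOTIFY_ALL: usize = 0b100;

/// Computes which notify operations currently have work to do.
/// `Some` marks an entry that is still waiting; `None` marks one that
/// has already been woken but not yet removed.
fn compute_flags<T>(entries: &[Option<T>]) -> usize {
    let mut flag = 0;
    let waiting = entries.iter().filter(|e| e.is_some()).count();

    // notify_one(): list is non-empty and nobody has been woken yet.
    if !entries.is_empty() && waiting == entries.len() {
        flag |= NOTIFY_ONE;
    }
    // notify_all(): at least one entry is still waiting.
    if waiting > 0 {
        flag |= NOTIFY_ALL;
    }
    flag
}
```

A notifier would then check the relevant bit before taking the entry-list lock and skip the lock entirely when the bit is clear.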

> The code has quite a lot of SeqCst, which is somewhat worrying, as its performance isn't ideal on non-TSO systems with weak memory consistency models.

Unfortunately, I don't know how to do this more elegantly without SeqCst. Any ideas?

> The Spinlock type is removed. It might be useful to keep a copy of it in the sync module for use by the entire crate, e.g. for #286 (Use AtomicCell<u64> on targets with target_has_atomic less than 64).

It's not used anywhere anymore, but I have no reservations about keeping it if it is useful.

nbdd0121 · 2019-10-25T22:32:48Z

> The reason is that an empty flag doesn't tell us much. If the list of registered items is not empty, a notify_one() operation will lock the entry list, get the first entry, and wake it. But it's possible that the first entry was already woken up and contains a None. In that case, we've just wasted time locking the entry list.
>
> In other words, a notify_one() operation has work to do only when the list is non-empty and all items are Some. Similarly, a notify_all() operation has work to do only if the list is non-empty and there is at least one Some entry.

But considering that an item is only turned into None when it is woken up, and that it will then either re-register, cancel, or complete, surely emptiness is usually a good-enough indicator of whether a notify call has any work to do?

The main reason I think an empty flag is better is that once Registry is switched to use a linked list rather than a slab, emptiness checking is essentially free and requires no special bookkeeping.

nbdd0121 · 2019-10-26T01:56:01Z

> Unfortunately, I don't know how to do this more elegantly without SeqCst. Any ideas?

You're right. I was naive in originally thinking that some sort of Acq/Rel would be sufficient. I ended up spending a few hours reading through the relevant standards and research papers, and realized that this cannot be done without protecting the read with some sort of lock. The fundamental reason is that there is no easy way to turn a thread-local sequenced-before relation into an inter-thread happens-before relation other than by using sequential consistency.
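
The problem has the shape of the classic store-buffering test: each side writes its own flag and then reads the other's. Below is a minimal sketch of that shape, not code from the PR; `registered`, `unlocked` and `one_round` are hypothetical names. With SeqCst on all four operations, at least one thread is guaranteed to observe the other's store. With only Release stores and Acquire loads, both loads are allowed to return false, which would correspond to a waiter going to sleep while the unlocker concludes there is nobody to wake.

```rust
use std::sync::atomic::{AtomicBool, Ordering::SeqCst};
use std::sync::Arc;
use std::thread;

/// Runs one round of the "store my flag, then load the other flag" pattern.
/// Returns what each thread observed about the other side.
fn one_round() -> (bool, bool) {
    let registered = Arc::new(AtomicBool::new(false)); // waiter has registered
    let unlocked = Arc::new(AtomicBool::new(false)); // lock has been released

    let (r, u) = (registered.clone(), unlocked.clone());
    let waiter = thread::spawn(move || {
        r.store(true, SeqCst); // register the waker...
        u.load(SeqCst) // ...then re-check whether the lock was released
    });

    let (r, u) = (registered, unlocked);
    let unlocker = thread::spawn(move || {
        u.store(true, SeqCst); // release the lock...
        r.load(SeqCst) // ...then check whether anyone needs waking
    });

    (waiter.join().unwrap(), unlocker.join().unwrap())
}
```

Running `one_round()` repeatedly should never yield `(false, false)`, because all SeqCst operations take part in a single total order.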

I think the current Mutex's approach of having 1 bit indicating if anything has been blocked is much easier to reason about, as everything is protected by locks.

nbdd0121 · 2019-10-27T17:45:50Z

I don't think it's a Map. It's more like a set (or a list, if backed by a linked list rather than a slab).

I am not convinced by replacing cancel with remove and notify_one. If the waker is never woken up and cancelled, we don't need to wake up anyone.

In general I prefer the old names to the current ones. Or maybe we can call the methods park/unpark.

ghost · 2019-10-27T17:46:59Z

> But considering that an item is only turned into None when it is woken up, and that it will then either re-register, cancel, or complete, surely emptiness is usually a good-enough indicator of whether a notify call has any work to do?

That's true, but I did find some performance benefits under heavy contention where we avoid locking the entry list when there's nothing really to notify.

Although, I think checking if the first entry is turned into None rather than all of them should work just as well as an optimization...

ghost · 2019-10-27T17:57:40Z

> I am not convinced by replacing cancel with remove and notify_one. If the waker is never woken up and cancelled, we don't need to wake up anyone.

You're right, that was an oversight.

Okay, so we need the following methods:

insert(waker)
update(waker, key)
complete(key)
cancel(key)

Do these method names sound okay?

I renamed the type to WakerMap after a suggestion by @yoshuawuyts because it's essentially a mapping from usize to Option<Waker>. The name Registry is a bit meh because it's so generic, whereas I feel it should contain "waker" somewhere in its name to be a bit more specific... :)
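
To make the four proposed methods easier to compare, here is a rough single-threaded sketch of how they might behave. It is not the PR's implementation: `WakerSetSketch`, `notify_any` and the `CountWaker` helper are hypothetical, there is no locking or flag word, and the rule that `cancel` passes an already-consumed notification on to another waiter is an assumption drawn from the discussion above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Wake, Waker};

/// Single-threaded sketch of the four operations; no locking, no flags.
#[derive(Default)]
struct WakerSetSketch {
    // Outer `None`: vacant slot. Inner `None`: registered but already woken.
    slots: Vec<Option<Option<Waker>>>,
}

impl WakerSetSketch {
    /// Registers a waker and returns the generated key.
    fn insert(&mut self, waker: Waker) -> usize {
        if let Some(key) = self.slots.iter().position(|s| s.is_none()) {
            self.slots[key] = Some(Some(waker));
            key
        } else {
            self.slots.push(Some(Some(waker)));
            self.slots.len() - 1
        }
    }

    /// Re-registers a waker under an existing key, e.g. after a wakeup.
    fn update(&mut self, waker: Waker, key: usize) {
        self.slots[key] = Some(Some(waker));
    }

    /// Removes the entry of an operation that finished successfully.
    fn complete(&mut self, key: usize) {
        self.slots[key] = None;
    }

    /// Removes the entry of an abandoned operation. If that entry had
    /// already been woken, the notification is passed on to another waiter.
    fn cancel(&mut self, key: usize) {
        if let Some(None) = self.slots[key].take() {
            self.notify_any();
        }
    }

    /// Wakes the first entry that still holds a waker, if there is one.
    fn notify_any(&mut self) -> bool {
        for slot in self.slots.iter_mut() {
            if let Some(entry) = slot {
                if let Some(waker) = entry.take() {
                    waker.wake();
                    return true;
                }
            }
        }
        false
    }
}

/// Helper for experiments: a waker that counts how often it was woken.
struct CountWaker(AtomicUsize);

impl Wake for CountWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

Under these assumptions, cancelling an entry that was never woken removes it silently, while cancelling one that was already woken forwards the wakeup so it is not lost.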

nbdd0121 · 2019-10-27T18:07:25Z

> Do these method names sound okay?
>
> I renamed the type to WakerMap after a suggestion by @yoshuawuyts because it's essentially a mapping from usize to Option<Waker>. The name Registry is a bit meh because it's so generic, whereas I feel it should contain "waker" somewhere in its name to be a bit more specific... :)

A Vec is also kind of a mapping from usize to Option<Waker>... The usize is just a handle to locate the waker. So I don't think WakerMap is a good name. Would WakerRegistry be a sensible name?

For the method names I'd tend to stick with register and reregister, which convey more information than insert and update and make more sense when paired with methods like notify, complete or cancel.

yoshuawuyts · 2019-10-27T20:38:06Z

@nbdd0121 API-wise the difference between a "map" and a "list" is that with a list you provide the key, but with a map the key is provided for you:

// Map -- key is generated for you.
fn insert(&mut self, val) -> key;

// List -- you provide the key.
// Operations like `push` are just shorthands on top of this.
fn insert(&mut self, index, val);

In this case the key is generated, which means it's more similar to a HashMap than to a Vec.

nbdd0121 · 2019-10-27T20:54:53Z

> @nbdd0121 API-wise the difference between a "map" and a "list" is that with a list you provide the key, but with a map the key is provided for you:
>
> // Map -- key is generated for you.
> fn insert(&mut self, val) -> key;
>
> // List -- you provide the key.
> // Operations like `push` are just shorthands on top of this.
> fn insert(&mut self, index, val);
>
> In this case the key is generated, which means it's more similar to a HashMap than to a Vec.

Not really. A map can never generate a key, because the key is always supplied by the user.

// Map: User must supply the key
fn insert(&mut self, k: K, v: V)

You cannot insert anything into a map without providing a key.

On the other hand, a set (or a vector or list) can return an iterator upon insertion:

// Set: An iterator is provided to the user.
std::pair<iterator,bool> insert( value_type&& value );

The fact that we use a usize and call it a key does not turn the whole concept from a Set into a Map. In fact, we're just working around Rust's lifetime constraints, which make Rust incapable of representing C++'s iterators safely.
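
The distinction being argued can be seen side by side using the standard library's own signatures. In this small sketch, `Handles` and `demo` are hypothetical names introduced only for illustration; `HashMap::insert` and `HashSet::insert` are the real std methods.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical handle-returning container: the caller never picks the index.
struct Handles<T> {
    items: Vec<T>,
}

impl<T> Handles<T> {
    /// Stores the value and hands back a generated handle.
    fn insert(&mut self, value: T) -> usize {
        self.items.push(value);
        self.items.len() - 1
    }
}

fn demo() -> (Option<&'static str>, bool, usize) {
    // Map: the caller supplies the key; the previous value (if any) comes back.
    let mut map = HashMap::new();
    let previous = map.insert(7usize, "waker");

    // Set: no key at all; the result says whether the value was new.
    let mut set = HashSet::new();
    let fresh = set.insert("waker");

    // Handle-returning container: the "key" is produced by the container,
    // playing the role that the returned iterator plays in C++.
    let mut handles = Handles { items: Vec::new() };
    let handle = handles.insert("waker");

    (previous, fresh, handle)
}
```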

ghost · 2019-10-27T21:16:02Z

What if we just call it WakerSet?

ghost · 2019-10-30T20:42:50Z

Can we do another round of review and merge if this looks good?

@nbdd0121 I still want us to have your optimizations on top of this PR :) I'm also particularly interested in having a doubly linked list because it allows us to have stronger fairness guarantees than what slab provides (i.e. the first waker to be inserted should be woken first).

In fact, it'd be great if we just removed the slab crate as a dependency and replaced it with doubly-linked lists everywhere :) I guess what I'm looking for is our own implementation of a subset of Slab based on a doubly-linked list.
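
A toy illustration of the fairness point, under stated assumptions: the slab is modelled as a plain `Vec` of slots that reuses the first vacant index, and a `VecDeque` stands in for the doubly linked list. Both function names are hypothetical, and the real slab crate's slot-reuse policy may differ in detail.

```rust
use std::collections::VecDeque;

/// Slab-like storage: freed slots are reused, so scanning by index
/// does not reflect arrival order.
fn slab_wake_order() -> Vec<&'static str> {
    let mut slots: Vec<Option<&'static str>> = vec![Some("a"), Some("b"), Some("c")];
    slots[0] = None; // "a" leaves

    // "d" arrives and takes the vacant slot, ahead of "b" and "c".
    let free = slots.iter().position(|s| s.is_none()).unwrap();
    slots[free] = Some("d");

    slots.into_iter().flatten().collect()
}

/// Queue-like storage: new arrivals always go to the back.
fn queue_wake_order() -> Vec<&'static str> {
    let mut queue: VecDeque<&'static str> = VecDeque::from(vec!["a", "b", "c"]);
    queue.pop_front(); // "a" leaves
    queue.push_back("d"); // "d" arrived last and will be woken last
    queue.into_iter().collect()
}
```

In the slab-like version the newest waiter "d" ends up first in scan order, whereas the queue keeps "b" and "c" ahead of it.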

What do you think? Sorry for overriding your already submitted PR, especially given how much effort you've put into it :(

nbdd0121 · 2019-10-30T21:15:08Z

src/sync/waker_set.rs

+            flag |= NOTIFY_ALL;
+        }
+
+        // Use `SeqCst` ordering to synchronize with `WakerSet::lock_to_notify()`.


You probably mean WakerSet::notify_{one, all} here

nbdd0121 · 2019-10-30T22:03:56Z

MutexGuard::drop uses AcqRel now, but it probably needs to be SeqCst. Acquire can only establish an inter-thread happens-before order if we use its value to make decisions. The current code works because swap(false, SeqCst) and swap(false, AcqRel) compile to the same instruction sequence on x64.
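
For reference, the unlock path under discussion has roughly the following shape. This is a sketch with hypothetical names (`RawLock`, `flag`, `wakeups`), not the code in the PR; the `wakeups` counter merely stands in for actually waking a task.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Illustrative bit value, assumed for this sketch.
const NOTIFY_ONE: usize = 0b010;

struct RawLock {
    locked: AtomicBool,
    flag: AtomicUsize,    // stands in for the waker set's state word
    wakeups: AtomicUsize, // counts notifications, for demonstration only
}

impl RawLock {
    /// Returns true if the lock was acquired.
    fn try_lock(&self) -> bool {
        !self.locked.swap(true, Ordering::SeqCst)
    }

    /// Releases the lock, then checks whether a waiter needs to be notified.
    fn unlock(&self) {
        // The value returned by the swap is ignored, so no decision is based
        // on it. SeqCst keeps this store and the load below in one total
        // order together with the waiter's SeqCst operations.
        self.locked.swap(false, Ordering::SeqCst);

        if self.flag.load(Ordering::SeqCst) & NOTIFY_ONE != 0 {
            self.wakeups.fetch_add(1, Ordering::SeqCst);
        }
    }
}
```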

This leads to a broader issue. Currently it's very difficult to use WakerSet correctly, due to NOTIFY_ONE and NOTIFY_ALL fields. Maybe we still want the old approach, to store the BLOCKED bits inside the mutex, and let WakerSet logically be strictly identical to a Mutex<Inner>.

yoshuawuyts

API-wise this looks good to me. But my knowledge of Atomics is limited, so I can't comment on any of that. It seems @nbdd0121's feedback should probably be addressed before this can be merged.

Still approving this since I don't have any feedback left at this point.

yoshuawuyts · 2019-10-31T01:45:48Z

src/sync/waker_set.rs

+
+/// Set when the entry list is locked.
+#[allow(clippy::identity_op)]
+const LOCKED: usize = 1 << 0;


Alternatively the clippy lint could be removed by writing

const LOCKED: usize = 0b001;
const NOTIFY_ONE: usize = 0b010;
const NOTIFY_ALL: usize = 0b100;

ghost · 2019-10-31T13:55:51Z

I switched to SeqCst orderings now; that suggestion was spot on. Thank you for pointing the issue out! I also switched to SeqCst in the try_lock() method to be on the safe side. We similarly use SeqCst in try_recv() and try_send() inside channels.

@nbdd0121 I agree the previous approach with BLOCKED is easier to follow on its own, but I don't think the one in this PR is significantly worse. In fact, it looks exactly the same as in channels, and this approach unifies the same pattern across many synchronization primitives.

Here are the latest benchmarks. PR #370:

test contention    ... bench:   1,927,725 ns/iter (+/- 248,164)
test create        ... bench:           0 ns/iter (+/- 0)
test no_contention ... bench:     243,104 ns/iter (+/- 50,678)

This PR:

test contention    ... bench:   1,286,675 ns/iter (+/- 153,616)
test create        ... bench:           3 ns/iter (+/- 0)
test no_contention ... bench:     251,007 ns/iter (+/- 173,492)

The difference in the highly contended cases is noticeable, at the expense of a small slowdown in the uncontended cases, even with so many SeqCst orderings.

Add utility type Registry to the sync module

c11a81b

Stjepan Glavina added 2 commits October 25, 2019 18:42

Remove unused import

86ba88b

Split unregister into complete and cancel

7462e0b

Refactoring and renaming

232219e

ghost requested a review from yoshuawuyts October 27, 2019 18:02

Split remove() into complete() and cancel()

bbe8184

yoshuawuyts added the enhancement New feature or request label Oct 28, 2019

Stjepan Glavina added 3 commits October 30, 2019 10:52

Rename to WakerSet

5b5c103

Ignore clippy warning

4e55884

Ignore another clippy warning

a83070c

nbdd0121 reviewed Oct 30, 2019


Use stronger SeqCst ordering

1709eac

yoshuawuyts approved these changes Oct 31, 2019


yoshuawuyts changed the title ~~Add utility type Registry to the sync module~~ Add utility type WakerSet to the sync module Oct 31, 2019

yoshuawuyts reviewed Oct 31, 2019


ghost merged commit 87de4e1 into async-rs:master Nov 1, 2019

ghost deleted the refactor-sync branch November 1, 2019 01:45

This pull request was closed.
