
Implement an algorithm to make this crate no_std, take 2 #33

Closed

notgull wants to merge 5 commits into master from no_std_2

Conversation

@notgull (Member) commented Nov 1, 2022

I'm not satisfied with the direction #30 has taken, so I've written this PR. It's less destructive, less verbose, and faster than the previous one. I haven't implemented portable_atomic or loom support here; I'll do that in future PRs.

In this PR, rather than wrapping an unsync linked list in a Mutex (what's currently on master) or using ConcurrentQueue (as in #30), I use an atomic singly linked list to store listeners. As in #30, listeners in this list can be marked as "orphaned" to indicate that they should not be notified (e.g. when the EventListener is dropped). Unlike #30, it exhibits only a 20% slowdown relative to master on my PC. Given the increased number of atomic operations involved, I think that's about as low as we can go without using a locking mechanism of some kind.
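Not the PR's actual code, but a minimal sketch of the shape of that structure: a lock-free push onto a singly linked list, with a per-node orphan flag (all names here are placeholders):

    use std::sync::atomic::{AtomicBool, AtomicPtr, Ordering};

    // Placeholder listener node; the PR's real `Node` carries a `Listener`.
    struct Node {
        next: AtomicPtr<Node>,
        // Set when the `EventListener` is dropped, so notifiers skip it.
        orphaned: AtomicBool,
    }

    struct List {
        head: AtomicPtr<Node>,
    }

    impl List {
        // Lock-free push of a new listener onto the front of the list.
        fn push(&self, node: Box<Node>) -> *mut Node {
            let node = Box::into_raw(node);
            let mut head = self.head.load(Ordering::Acquire);
            loop {
                // Point the new node at the current head...
                unsafe { (*node).next.store(head, Ordering::Relaxed) };
                // ...and try to swing `head` over to the new node.
                match self
                    .head
                    .compare_exchange_weak(head, node, Ordering::AcqRel, Ordering::Acquire)
                {
                    Ok(_) => return node,
                    Err(actual) => head = actual,
                }
            }
        }
    }

Orphaning is then just a store to the node's flag; the node stays linked, but notifiers pass over it.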

Future work:

  • Add portable_atomic and loom support.
  • It may be possible to write this crate using only safe code. I intentionally separated out RacyArc because we may be able to split it off into another crate; the interior mutability in Listener can be resolved using crossbeam_utils::AtomicCell<Wakeup> or similar; and the linked list could probably be emulated using AtomicCell and Arc as well. The main concern is that we would have to forgo the cache in ListenerQueue, and this (among other things) would likely cause significant performance issues (Arc is expensive).
  • Another way of doing this is to use a spinlock, but make sure we never actually spin on it. We keep the master branch but replace the Mutex with a spinlock. notify and listen both try to access the spinlock. If notify sees the spinlock held, it places the notification counts in an atomic variable, and whoever holds the spinlock performs the notification once they're about to drop the lock. If listen (or any of EventListener's methods) sees the spinlock held, it puts its Wakers (or Unparkers) into a short-term ConcurrentQueue, which is then used to set up listeners once the current holder drops the lock. This may be a better way of doing it, but I don't know whether it would be faster or slower; a rough sketch of the notify path follows this list.
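A rough sketch of that notify path; everything here (the try-lock flag, the pending counter) is hypothetical rather than anything in the PR:

    use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

    struct Inner {
        locked: AtomicBool,
        // Notifications posted while someone else held the lock.
        pending_notifies: AtomicUsize,
    }

    impl Inner {
        fn notify(&self, n: usize) {
            // Try to take the lock, but never spin waiting for it.
            if self
                .locked
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_err()
            {
                // Someone else holds it; leave the count for the holder
                // to process when it unlocks.
                self.pending_notifies.fetch_add(n, Ordering::Release);
                return;
            }

            // ...notify `n` listeners directly...

            self.unlock();
        }

        fn unlock(&self) {
            // Drain anything posted while we held the lock, then release.
            let _pending = self.pending_notifies.swap(0, Ordering::Acquire);
            // ...notify `_pending` listeners...
            self.locked.store(false, Ordering::Release);
        }
    }

A real version would also have to handle the window between draining the counter and releasing the flag.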

unsafe {
    // Initialize the node.
    node_ptr.write(Node::new(listener));

Member

why isn't .assume_init() used here?

Member Author

I bypass it by just casting node.get() into a *mut Node, which allows me to write the Node straight into the slot without worrying about MaybeUninit.
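Roughly like this, with placeholder Node and Listener types standing in for the PR's real ones:

    use core::cell::UnsafeCell;
    use core::mem::MaybeUninit;

    struct Listener;
    struct Node {
        listener: Listener,
    }

    struct CachedNode {
        node: UnsafeCell<MaybeUninit<Node>>,
    }

    impl CachedNode {
        // Safety: the caller must have exclusive access to the slot.
        unsafe fn init(&self, listener: Listener) -> *mut Node {
            // `UnsafeCell::get` yields a `*mut MaybeUninit<Node>`; since
            // `MaybeUninit<T>` has the same layout as `T`, casting to
            // `*mut Node` lets us write the value in directly.
            let node_ptr = self.node.get() as *mut Node;
            node_ptr.write(Node { listener });
            node_ptr
        }
    }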

///
/// This is kept synchronized by the `state` variable; therefore, it's
/// technically `Sync`.
waker: Cell<Wakeup>,
Member

wouldn't it make more sense to use UnsafeCell here?

Member Author

I originally used UnsafeCell, but given that most of the semantics involved are just "replace the waker with something else", I ended up using Cell instead.

I tried replacing it with UnsafeCell<MaybeUninit<Wakeup>> (and also getting rid of the Wakeup::None variant), but any gains are minimal at best. I figure it's better to just use Cell and avoid the unsafe code.
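A minimal sketch of that replace-based usage, assuming a Wakeup enum along these lines (the variants here are illustrative):

    use core::cell::Cell;
    use core::task::Waker;

    // Illustrative stand-in for the PR's `Wakeup` type.
    enum Wakeup {
        None,
        Task(Waker),
    }

    struct Listener {
        waker: Cell<Wakeup>,
    }

    impl Listener {
        // Store a new waker, dropping whatever was there before.
        fn register(&self, waker: Waker) {
            self.waker.set(Wakeup::Task(waker));
        }

        // Take the waker out for notification, leaving `None` behind.
        fn take(&self) -> Wakeup {
            self.waker.replace(Wakeup::None)
        }
    }

Since every access is a whole-value swap, no reference to the interior ever escapes, which is exactly what Cell guarantees without unsafe code.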

src/queue.rs Outdated
Comment on lines 156 to 203
/// Orphan a node in the queue.
///
/// The return value is `Some` if the node was notified, containing `true`
/// if the notification was an additional notification.
pub(crate) unsafe fn orphan(&self, node: NonNull<Node>) -> Option<bool> {
    let (needs_drop, notified) = unsafe { node.as_ref().listener.orphan() };

    // If we need to deallocate the node, do so.
    if needs_drop {
        unsafe { self.dealloc(node) };
    }

    notified
}

/// Allocate a new node for a listener.
fn alloc(&self, listener: Listener) -> NonNull<Node> {
    // Try to install a cached node.
    if !self.cache.occupied.swap(true, Ordering::Acquire) {
        // We can now initialize the node.
        let node_ptr = self.cache.node.get() as *mut Node;

        unsafe {
            // Initialize the node.
            node_ptr.write(Node::new(listener));

            // Return the node.
            NonNull::new_unchecked(node_ptr)
        }
    } else {
        // We failed to install a cached node, so we need to allocate a new
        // one.
        unsafe { NonNull::new_unchecked(Box::into_raw(Box::new(Node::new(listener)))) }
    }
}

/// Deallocate a node.
unsafe fn dealloc(&self, node: NonNull<Node>) {
    // Is this node the current cache node?
    if ptr::eq(self.cache.node.get() as *const Node, node.as_ptr()) {
        // We can now clear the cache.
        unsafe { (self.cache.node.get() as *mut Node).drop_in_place() };
        self.cache.occupied.store(false, Ordering::Release);
    } else {
        // We need to deallocate the node on the heap.
        unsafe { Box::from_raw(node.as_ptr()) };
    }
}
Member

these methods (alloc, dealloc, orphan) appear to only act on the CachedNode and imo should be moved into an impl block associated with that. (and if these methods are still necessary, should be replaced with forwarding methods)

Member Author

I've moved alloc and dealloc, but orphan is, semantically, an important operation on the Queue so I've kept it there.
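For illustration, the shape being suggested, as a sketch reusing the names from the excerpt above rather than the PR's final code:

    impl ListenerQueue {
        // Forwarding method: the allocation logic now lives on `CachedNode`.
        fn alloc(&self, listener: Listener) -> NonNull<Node> {
            self.cache.alloc(listener)
        }

        // Forwarding method for deallocation.
        unsafe fn dealloc(&self, node: NonNull<Node>) {
            self.cache.dealloc(node)
        }
    }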


/// A wrapper around an `Arc` that is initialized atomically in a racy manner.
#[derive(Default)]
pub(crate) struct RacyArc<T> {
Member

How big is the difference between this and OnceCell impls?

Member Author

Two main differences:

1. While once cells are synchronized, this one is racy: instead of running only one initialization closure at a time, it lets racing closures run concurrently and keeps whichever value lands first.

2. It's only one machine word in size, while once cells are usually at least three.

In addition, since our once cells are backed by this crate, we can't use once cells here.
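A minimal sketch of that one-word scheme, assuming an AtomicPtr over Arc::into_raw (not necessarily the PR's exact code):

    use std::mem;
    use std::ptr;
    use std::sync::atomic::{AtomicPtr, Ordering};
    use std::sync::Arc;

    pub(crate) struct RacyArc<T> {
        ptr: AtomicPtr<T>,
    }

    impl<T> RacyArc<T> {
        fn get_or_init(&self, init: impl FnOnce() -> Arc<T>) -> Arc<T> {
            let mut ptr = self.ptr.load(Ordering::Acquire);
            if ptr.is_null() {
                // Racy: several threads may run `init` at once; the first
                // successful compare-exchange wins, and the losers drop
                // their values.
                let new = Arc::into_raw(init()) as *mut T;
                match self.ptr.compare_exchange(
                    ptr::null_mut(),
                    new,
                    Ordering::AcqRel,
                    Ordering::Acquire,
                ) {
                    Ok(_) => ptr = new,
                    Err(winner) => {
                        // Another thread won the race; reclaim our Arc.
                        drop(unsafe { Arc::from_raw(new) });
                        ptr = winner;
                    }
                }
            }
            // Borrow the stored Arc and clone it without dropping the
            // original (works on old Rust, unlike the later
            // `Arc::increment_strong_count`).
            let stored = unsafe { Arc::from_raw(ptr) };
            let clone = stored.clone();
            mem::forget(stored);
            clone
        }
    }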

Contributor

JFYI, the once_cell crate also provides a racy "OnceCell": https://docs.rs/once_cell/latest/once_cell/race/struct.OnceNonZeroUsize.html

Member Author

We're trying to avoid the once_cell crate due to its recent MSRV bump; see smol-rs/async-io#93.

Member

I'd like to ask about the performance of this: in the past I implemented https://github.com/zseri/revenq, which I originally based on a similar pattern, and I hit a huge perf bottleneck there. So I ask: how much does this get "hammered" during its lifetime, or the lifetime of the chain?

Member Author

After initialization (a one-time process), reads from the AtomicPtr are a single atomic load, which is comparable to a OnceCell, whose fast path is also just a load. I doubt there's much overhead outside of initialization; at worst, the initialization code runs once per thread.

I could swap in a proper once cell to see what happens, but I doubt it would make a difference. Besides, this is what this code used before this change.

src/queue.rs Outdated
Comment on lines 177 to 186
// We can now initialize the node.
let node_ptr = self.node.get() as *mut Node;

unsafe {
    // Initialize the node.
    node_ptr.write(Node::new(listener));

    // Return the node.
    NonNull::new_unchecked(node_ptr)
}
Member

I don't like this part at all, especially since it appears to misuse MaybeUninit and/or UnsafeCell somewhat. How about:

unsafe {
    let node_ptr: &mut MaybeUninit<_> = &mut *self.node.get();
    // Initialize the node
    node_ptr.write(Node::new(listener));
    // Return the node
    node_ptr.assume_init_mut().into()
}

@notgull (Member Author) commented Nov 3, 2022

I've replaced the node cast with an as_mut_ptr() usage. Unfortunately I can't use MaybeUninit::write() or MaybeUninit::assume_init_mut(), since they were introduced in Rust 1.55 and would require a significant MSRV bump.
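For reference, a sketch of that MSRV-friendly variant (MaybeUninit::as_mut_ptr has been stable since Rust 1.36, unlike the 1.55 methods above):

    unsafe {
        // `self.node.get()` yields a `*mut MaybeUninit<Node>`.
        let slot: &mut MaybeUninit<Node> = &mut *self.node.get();
        // Write through the raw pointer instead of using
        // `MaybeUninit::write` and `assume_init_mut` (both Rust 1.55).
        slot.as_mut_ptr().write(Node::new(listener));
        NonNull::new_unchecked(slot.as_mut_ptr())
    }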

@notgull (Member Author) commented Nov 6, 2022

Closing in favor of #34

@notgull notgull closed this Nov 6, 2022
@notgull notgull deleted the no_std_2 branch November 10, 2022 23:50