
Shared ThreadsafeFunction in EventQueue RFC #42

Closed

Conversation

@indutny commented May 19, 2021

@kjvalencik (Member) left a comment:

@indutny Thanks for providing this. I think this is a really interesting idea.

One complexity concern I have is how dropping a referenced EventQueue would work. With a single shared queue, the queue needs to track how many times it has been referenced. When the count drops to 0, the shared queue needs to unreference itself.

However, unreferencing can only be performed on the event loop. The only solution I can think of is sending an event to unreference itself. This could end up being costly if referenced queues are frequently created and dropped.
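A minimal sketch of the drop-time bookkeeping described here, assuming the neon 0.9-era `EventQueue` API (`send`, and `unref` taking a `Context`); the `SharedQueue` and `QueueHandle` types are hypothetical:

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc, RwLock,
};

use neon::event::EventQueue;
use neon::prelude::*;

// Hypothetical shared state: one queue, reference-counted across handles.
struct SharedQueue {
    queue: RwLock<EventQueue>,
    refs: AtomicUsize,
}

// Hypothetical handle type that a shared `cx.queue()` could hand out.
struct QueueHandle(Arc<SharedQueue>);

impl Drop for QueueHandle {
    fn drop(&mut self) {
        // Last handle gone: the shared queue must unreference itself, but
        // `unref` needs a `Context`, so the only option is to schedule the
        // unref as one more event on the loop.
        if self.0.refs.fetch_sub(1, Ordering::AcqRel) == 1 {
            let shared = Arc::clone(&self.0);
            self.0.queue.read().unwrap().send(move |mut cx| {
                shared.queue.write().unwrap().unref(&mut cx);
                Ok(())
            });
        }
    }
}
```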

> The example above wouldn't print "every second" once per second. The prints
> will be delayed by the closures scheduled on `another_queue`.
>
> While the example above reproduces the issue - it is quite pathological and
@kjvalencik (Member):

I think this use case is valid. The example is a bit extreme for the sake of simplicity to demonstrate the point, but I do think it could be an issue in real apps.

For example, consider a Node app that's part of a data processing pipeline. It would be expected to saturate resources and that might include a stream of events on the event loop. It may be beneficial to also be able to cancel the pipeline at any time.

If command events are served on a separate channel from processing events, it allows the app to push large volumes of data without delaying commands. Alternatively, the app would need to manage back-pressure.
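As a rough illustration of that split, here is a hedged sketch against the neon 0.9-era API; `start_pipeline` and the chunk counts are made up:

```rust
use neon::event::EventQueue;
use neon::prelude::*;

// Hypothetical exported function demonstrating the two-queue split described
// above: a saturated data queue plus a separate queue for commands.
fn start_pipeline(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    // Queue that will be saturated with processing events.
    let data_queue = EventQueue::new(&mut cx);
    // Separate queue reserved for commands (e.g. "cancel"), so they are not
    // stuck behind the backlog of data events.
    let command_queue = EventQueue::new(&mut cx);

    std::thread::spawn(move || {
        for chunk in 0..1_000_000u32 {
            data_queue.send(move |mut cx| {
                // Stand-in for delivering one processed chunk to JS.
                let _ = cx.number(chunk);
                Ok(())
            });
        }

        // A command posted on its own queue does not have to wait for the
        // million data closures above to drain before it runs.
        command_queue.send(|mut cx| {
            let _ = cx.string("cancelled");
            Ok(())
        });
    });

    Ok(cx.undefined())
}
```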

@indutny (Author):

The data processing pipeline would likely be bound by IO, so it wouldn't be able to trigger the 1000-call loop in the n-api optimization. But the starvation can indeed be triggered by posting way too many events that take a long time to process from one producer, plus some events from another producer that should interrupt that stream.

Was the preemption ever documented for the neon APIs? I'm looking through the comments in EventQueue.rs and the only thing they say is that the function will be executed on the JS thread, without saying anything about preemption between different queues.

That being said, the example is indeed valid.

> store instances of `EventQueue` in it themselves, shifting the burden of
> performance optimization from Neon's hands to users.
>
> Another alternative would be to make `EventQueue::new()` use new
@kjvalencik (Member):

I think this is the better solution.

  • `cx.queue()`: handle to the global queue
  • `EventQueue::new`: creates a new queue
  • (Maybe) `EventQueue::handle<'a, C: Context<'a>>(&self, cx: &mut C)`: creates an independent handle to a queue. `cx.queue()` could use this internally.

I think this also extends better to bounded queues. We might have `EventQueue::bounded(1000)`, and the asymmetry between `EventQueue::new` and `EventQueue::bounded` could be confusing. It's clearer if they both create new queues.
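A hedged sketch of how that surface might look from user code; only `cx.queue()` and `EventQueue::new` exist in the neon 0.9 API, while `bounded` and `handle` are the proposals above:

```rust
use neon::event::EventQueue;
use neon::prelude::*;

// Hypothetical function showing the proposed split between the shared queue
// handle and explicitly created queues.
fn queues_example(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    // Handle to the shared, per-instance global queue.
    let global = cx.queue();

    // A brand-new queue backed by its own ThreadsafeFunction.
    let private = EventQueue::new(&mut cx);

    // Proposed additions (not part of neon 0.9; sketched per the list above):
    //   let bounded = EventQueue::bounded(&mut cx, 1000);
    //   let handle  = private.handle(&mut cx);

    global.send(|mut cx| {
        let _ = cx.string("sent through the shared queue");
        Ok(())
    });
    private.send(|mut cx| {
        let _ = cx.string("sent through the private queue");
        Ok(())
    });

    Ok(cx.undefined())
}
```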

@indutny (Author):

Do you want me to change the RFC to add this? It already matches the PR that I've opened for the change.

@indutny (Author):

I'll probably just do it. 😁

> improvements for Neon users.
>
> These improvements are perhaps most noticeable when a Neon addon is used from
> Electron, because scheduling new UV ticks in Electron is very costly.
@kjvalencik (Member):

I don't think I've come across ticks being more expensive in Electron than normal Node. Do you happen to have a reference?

@indutny (Author):

Here's a comment with an example app that uses C++ node-addon-api to trigger it: electron/electron#28957 (comment)

The high-level overview is that Electron polls the UV loop in another thread and interrupts the browser's event loop to schedule UV callbacks. Sadly, these interruptions don't happen immediately because of the interactivity that the browser process has to maintain. The delay is a fraction of a millisecond, but you can see how it easily adds up to 2-3 seconds depending on how many closures you want to invoke with EventQueue.

> Calling `queue.send()` several times in a row using different `cx.queue()` would
> be significantly faster:
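The RFC's own snippet is not reproduced in this thread; the following is a hedged sketch of the pattern it describes, with a hypothetical function name, against the neon 0.9-era API:

```rust
use neon::prelude::*;

// Hypothetical sketch: every `cx.queue()` handle shares one underlying
// ThreadsafeFunction under this RFC, so consecutive sends can be batched into
// the same UV tick instead of each paying for a fresh tick.
fn send_many(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    for i in 0..100u32 {
        // A fresh handle each iteration, but the same shared queue.
        cx.queue().send(move |mut cx| {
            let _ = cx.number(i);
            Ok(())
        });
    }
    Ok(cx.undefined())
}
```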
@kjvalencik (Member):

Do you happen to have any benchmarks showing the performance benefit? I think that would help add some weight to this RFC. I don't have any intuition for the difference: 1us, 10us, 1ms?

@indutny (Author):

Added a comment. Running app/preload.js in this repo with the unoptimized threadsafe function takes 128ms vs 10ms with the optimized one.

> With the recent [node.js changes][0] `ThreadsafeFunction` will loop over pending
> calls, invoking up to 1000 of them in the same tick instead of running each new
> call on a separate UV loop tick. This is an implementation detail that is
> intentionally not covered by `napi_create_threadsafe_function` documentation to
@kjvalencik (Member):

Nit: I think the word "intentionally" should be removed. There's no discussion on the implementation PR about whether leaving the behavior undocumented was intentional or not.

> intentionally not covered by `napi_create_threadsafe_function` documentation to
> allow for potential performance optimizations in the future.
>
> Neon doesn't currently leverage it without the proposed changes because a new
@kjvalencik (Member):

Neon does leverage the change, but it requires the user to continue to re-use the same queue. This is one of the reasons queues are Sync.

I think it's important to note that this is an API change with the intent to guide users towards the choice that is optimal in most situations.
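A hedged sketch of the re-use pattern this refers to (hypothetical `start_worker`, neon 0.9-era API): one queue is created up front and kept, so the n-api batching applies without any API changes.

```rust
use neon::prelude::*;

// Hypothetical sketch: one queue is created up front and re-used for every
// send, so up to 1000 calls can be coalesced into a single tick.
fn start_worker(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    let queue = cx.queue();

    std::thread::spawn(move || {
        for i in 0..10_000u32 {
            // Same EventQueue (same ThreadsafeFunction) every time.
            queue.send(move |mut cx| {
                let _ = cx.number(i);
                Ok(())
            });
        }
    });

    Ok(cx.undefined())
}
```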

@indutny (Author):

> @indutny Thanks for providing this. I think this is a really interesting idea.
>
> One complexity concern I have is how dropping a referenced EventQueue would work. With a single shared queue, the queue needs to track how many times it has been referenced. When the count drops to 0, the shared queue needs to unreference itself.
>
> However, unreferencing can only be performed on the event loop. The only solution I can think of is sending an event to unreference itself. This could end up being costly if referenced queues are frequently created and dropped.

This is indeed the case, but I don't believe that it will cause any performance losses because of how fast the threadsafe functions are now. I've added an extra optimization to the Neon PR to avoid this extra work when EventQueue owns the ThreadsafeFunction and will drop it.

> creates a new `EventQueue` every time as requested. So instead of doing that it
> is proposed that we use the existing `lifetime::InstanceData` internal API layer to
> hold a wrapper around `ThreadsafeFunction` tailored to execute Rust closures
> specifically, and to use an `Arc` reference to a `RwLock` over this wrapper in
@kjvalencik (Member):

Is a RwLock necessary? The only non-threadsafe operations on EventQueue are unref and reference. Both of these require a Context, which already enforces single-threadedness.

@indutny (Author):

The `.reference()`/`.unref()` methods of `ThreadsafeFunction` take `&mut self`, so `ThreadsafeTrampoline` needs to be mutable in order to call them. Is there any better way of doing it?
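For illustration, a minimal sketch of why those `&mut self` methods push the wrapper behind a lock; the types below are stand-ins named after the PR's internals, not neon's real (non-public) types, and the method bodies are placeholders:

```rust
use std::sync::{Arc, RwLock};

// Stand-in for neon's internal threadsafe-function wrapper.
struct ThreadsafeFunction;

impl ThreadsafeFunction {
    // Taking `&mut self` here is what forces the lock below.
    fn reference(&mut self) { /* placeholder */ }
    fn unref(&mut self) { /* placeholder */ }
}

// Hypothetical wrapper named after the PR's ThreadsafeTrampoline.
struct ThreadsafeTrampoline {
    tsfn: ThreadsafeFunction,
    ref_count: usize,
}

// InstanceData would hold the Arc; every queue handle clones it.
type SharedTrampoline = Arc<RwLock<ThreadsafeTrampoline>>;

fn reference(shared: &SharedTrampoline) {
    let mut t = shared.write().unwrap(); // write lock gives `&mut` access
    t.ref_count += 1;
    if t.ref_count == 1 {
        t.tsfn.reference();
    }
}

fn unreference(shared: &SharedTrampoline) {
    let mut t = shared.write().unwrap();
    t.ref_count -= 1;
    if t.ref_count == 0 {
        t.tsfn.unref();
    }
}
```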

@indutny (Author) commented May 19, 2021

> @indutny Thanks for providing this. I think this is a really interesting idea.
>
> One complexity concern I have is how dropping a referenced EventQueue would work. With a single shared queue, the queue needs to track how many times it has been referenced. When the count drops to 0, the shared queue needs to unreference itself.
>
> However, unreferencing can only be performed on the event loop. The only solution I can think of is sending an event to unreference itself. This could end up being costly if referenced queues are frequently created and dropped.

This is indeed the case, but there's little to no performance penalty for doing it this way with the shared ThreadsafeFunction, because it is stupid fast in Node now 😂 I've added an optimization to my PR to avoid this extra work for EventQueue::new(), because it owns the ThreadsafeFunction and will drop it automatically.

Thank you so much for a thorough review. I believe I've addressed your feedback. PTAL.

@kjvalencik (Member):

Closing because the contents have been merged into #32

@kjvalencik closed this Sep 3, 2021