
Use actual thread local queues instead of using a RwLock #93

Merged: 22 commits into smol-rs:master on Feb 22, 2024

Conversation

@james7132 (Contributor) commented Feb 14, 2024

Currently, runner-local queues rely on a RwLock<Vec<Arc<ConcurrentQueue>>> to store the queues instead of using actual thread-local storage.

This adds thread_local as a dependency, but it should allow the executor to work-steal without needing to hold a lock, as well as allow tasks to schedule onto the local queue directly, where possible, instead of always relying on the global injector queue.

Fixes #62.
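
To make the shape of the change concrete, here is a minimal sketch of the idea, not the PR's actual code: each runner registers its queue in a thread_local::ThreadLocal, scheduling prefers the current thread's queue and falls back to the global injector, and stealing walks the other queues without taking a lock. The State type, the u32 payload standing in for a Runnable, and the 512-slot bound are all illustrative.

// Sketch only; assumes the concurrent-queue and thread_local crates.
use std::sync::Arc;

use concurrent_queue::ConcurrentQueue;
use thread_local::ThreadLocal;

struct State {
    // Global injector queue shared by all runners.
    global: ConcurrentQueue<u32>,
    // One queue per runner thread, stored in real thread-local storage
    // instead of a RwLock<Vec<...>>.
    local_queues: ThreadLocal<Arc<ConcurrentQueue<u32>>>,
}

impl State {
    fn new() -> Self {
        Self {
            global: ConcurrentQueue::unbounded(),
            local_queues: ThreadLocal::new(),
        }
    }

    /// Schedule onto the current thread's local queue where possible,
    /// falling back to the global injector if the local queue is full.
    fn schedule(&self, task: u32) {
        let local = self
            .local_queues
            .get_or(|| Arc::new(ConcurrentQueue::bounded(512)));
        if let Err(err) = local.push(task) {
            let _ = self.global.push(err.into_inner());
        }
    }

    /// Steal one task from any runner's local queue; no lock is held,
    /// only a read-only walk over the ThreadLocal entries.
    fn steal(&self) -> Option<u32> {
        self.local_queues.iter().find_map(|q| q.pop().ok())
    }
}

fn main() {
    let state = State::new();
    state.schedule(1);
    assert_eq!(state.steal(), Some(1));
}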

@james7132 (Contributor Author) commented Feb 14, 2024

Whoa, that Miri failure is a big red flag. Not sure how that's happening. Looks like a soundness bug with ThreadLocal::iter. Closing this out for now.

EDIT: This happens regardless of whether we're chaining the iterators or not; iterating over the elements always seems to trigger the Miri failure.

@james7132 closed this Feb 14, 2024
@james7132 (Contributor Author):
Filed an issue regarding the UB: Amanieu/thread_local-rs#70

@james7132 reopened this Feb 16, 2024
@james7132 (Contributor Author):

The Miri issue seems to be fixed by Amanieu/thread_local-rs#72, so this is likely going to be blocked on that being merged and released.

@notgull (Member) left a comment:


My main issue here is that thread_local's MSRV might exceed ours at some point. I've considered using it in the past, but the maintainer of thread_local has stated that its MSRV might exceed Rust v1.63 in the future.

The issue is that, if thread-local's MSRV is bumped, we would be forced to depend on an older version of thread-local. For our dependents with higher MSRVs this would cause duplicate dependencies in the tree.

In the past we encountered this issue with once_cell; see smol-rs/async-io#93. This is why I avoid the use of once_cell throughout smol, preferring async_lock::OnceCell instead.

@notgull marked this pull request as draft on February 17, 2024 20:25
@james7132 (Contributor Author):

> The issue is that, if thread-local's MSRV is bumped, we would be forced to depend on an older version of thread-local. For our dependents with higher MSRVs this would cause duplicate dependencies in the tree.

Is there a location where the MSRV policy for crates under this organization is documented? All things considered, I think the update frequency for thread_local is so low that this should be pretty low risk. That said, the once_cell dependency is in a similar situation.

@notgull (Member) commented Feb 17, 2024

> Is there a location where the MSRV policy for crates under this organization is documented? All things considered, I think the update frequency for thread_local is so low that this should be pretty low risk. That said, the once_cell dependency is in a similar situation.

The MSRV policy is here. I should really add it explicitly to all crates.

Even if the risk is low, it's not zero. It was a headache the first time, and I'd like to avoid a repeat if I can.

@notgull (Member) commented Feb 18, 2024

I'm fine with merging it for now; from my perspective it doesn't look like thread-local will be bumping its MSRV anytime soon. Not to mention that, with async traits coming out in the near future, I expect a breaking change in futures, which will translate into a breaking change in smol alongside a liberal MSRV bump.

@james7132 (Contributor Author):

I did a quick benchmark comparison after all of the aforementioned changes; I'm not sure what to make of these results:

executor::create        time:   [1.2670 µs 1.2680 µs 1.2693 µs]
                        change: [+10.796% +10.967% +11.086%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

single_thread/executor::spawn_one
                        time:   [1.6679 µs 1.6746 µs 1.6822 µs]
                        change: [+0.2716% +1.8302% +3.6198%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
single_thread/executor::spawn_many_local
                        time:   [6.0765 ms 6.1225 ms 6.1719 ms]
                        change: [+1.7673% +2.7052% +3.7431%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  7 (7.00%) high mild
single_thread/executor::spawn_recursively
                        time:   [35.880 ms 36.350 ms 36.840 ms]
                        change: [-3.4080% -1.6301% +0.2145%] (p = 0.09 > 0.05)
                        No change in performance detected.
single_thread/executor::yield_now
                        time:   [5.2446 ms 5.2552 ms 5.2670 ms]
                        change: [-10.483% -10.256% -10.034%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

multi_thread/executor::spawn_one
                        time:   [1.4244 µs 1.4298 µs 1.4361 µs]
                        change: [-1.6323% -0.6865% +0.2260%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
multi_thread/executor::spawn_many_local
                        time:   [24.883 ms 24.969 ms 25.057 ms]
                        change: [+3.4173% +3.8879% +4.4088%] (p = 0.00 < 0.05)
                        Performance has regressed.
Benchmarking multi_thread/executor::spawn_recursively: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.5s, or reduce sample count to 30.
multi_thread/executor::spawn_recursively
                        time:   [162.44 ms 162.78 ms 163.19 ms]
                        change: [-4.6241% -4.2275% -3.8289%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
Benchmarking multi_thread/executor::yield_now: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50.
multi_thread/executor::yield_now
                        time:   [1.8351 ms 1.8451 ms 1.8580 ms]
                        change: [-92.286% -92.235% -92.165%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

I think the impact on create can be chalked up to the runtime environment, since we've changed nothing about the initialization of the executor, and the multi_thread/executor::yield_now results are very suspicious.

@james7132 (Contributor Author):

Cross-checking against the benchmark results for #37, it seems like these results are to be expected.

@james7132 marked this pull request as ready for review on February 20, 2024 23:58
@notgull (Member) left a comment:


Overall looks good to me


  // Pick a random starting point in the iterator list and rotate the list.
- let n = local_queues.len();
+ let n = local_queues.iter().count();
Member:

Is this a cold operation? It seems like this would take a while.

@james7132 (Contributor Author) commented Feb 21, 2024

This is one part I'm not so sure about. Generally this shouldn't be under contention, since the cost of spinning up new threads is going to be higher than the cost of scanning over the entire container, unless you have literally thousands of threads. Otherwise it's just a scan through fairly small buckets.

We could use an atomic counter to track how many there are, but since you can't remove items from the ThreadLocal, there will be residual thread locals from currently unused threads (as thread IDs are reused), so the counter may get out of sync.
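
For context, here is a minimal sketch of what that count feeds into, based on the "pick a random starting point and rotate the list" comment in the diff above rather than the actual async-executor source; the steal_one name, the u32 payload, and the fastrand call are illustrative.

// Sketch only; assumes the concurrent-queue, thread_local, and fastrand crates.
use std::sync::Arc;

use concurrent_queue::ConcurrentQueue;
use thread_local::ThreadLocal;

// Try to steal one item, starting the search at a random queue so that
// all runners don't always raid the same victim first.
fn steal_one(local_queues: &ThreadLocal<Arc<ConcurrentQueue<u32>>>) -> Option<u32> {
    // ThreadLocal has no len(), so the size comes from walking the buckets;
    // this is the iter().count() call being discussed here.
    let n = local_queues.iter().count();
    if n == 0 {
        return None;
    }
    let start = fastrand::usize(..n);

    // Rotate the list: visit queues start..n, then 0..start.
    local_queues
        .iter()
        .skip(start)
        .chain(local_queues.iter().take(start))
        .find_map(|queue| queue.pop().ok())
}

fn main() {
    let queues: ThreadLocal<Arc<ConcurrentQueue<u32>>> = ThreadLocal::new();
    queues
        .get_or(|| Arc::new(ConcurrentQueue::unbounded()))
        .push(42)
        .unwrap();
    assert_eq!(steal_one(&queues), Some(42));
}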

@notgull merged commit 7592d41 into smol-rs:master on Feb 22, 2024
8 checks passed
@notgull mentioned this pull request Feb 22, 2024

Successfully merging this pull request may close these issues: Push task directly to the local queue