
rt: reduce no-op wakeups in the multi-threaded scheduler #4383

Merged (4 commits) on Jan 13, 2022

Conversation

@carllerche (Member) commented on Jan 5, 2022:

This PR reduces the number of times worker threads wake up without having work to do in the multi-threaded scheduler. Unnecessary wake-ups are expensive and slow down the scheduler. I have observed this change reduce no-op wakes by up to 50%.

The multi-threaded scheduler is work-stealing. When a worker has tasks to process and other workers are idle (parked), the idle workers must be unparked so that they can steal work from the busy worker. However, unparking threads is expensive, so there is an optimization that skips the unpark if a worker already exists in a "searching" state (unparked and looking for work). This works pretty well, but transitioning from 1 searching worker to 0 searching workers introduces a race condition in which a thread unpark can be lost:

  • thread 1: the last searching worker is about to exit the searching state.
  • thread 2: needs to unpark a thread, but skips it because there is a searching worker.
  • thread 1: exits the searching state without seeing thread 2's work.

Because this should be a rare condition, Tokio solves this by always unparking a new worker when the current worker:

  • is the last searching worker
  • is transitioning out of searching
  • has work to process.

When the newly unparked worker wakes, if the race condition described above did happen, it will find "thread 2"'s work. Otherwise, it will simply go back to sleep.
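
To make the rules above concrete, here is a minimal Rust sketch of the two behaviors: skip the unpark when a searcher already exists, and have the last searcher unpark a peer when it exits the searching state with work in hand. The type and method names (`SchedulerState`, `notify_parked`, `transition_from_searching`, `unpark_one`) are illustrative stand-ins, not Tokio's actual internals.

```rust
// Minimal, illustrative sketch of the protocol described above.
// The names are invented for this example, not Tokio's real API.
use std::sync::atomic::{AtomicUsize, Ordering};

struct SchedulerState {
    // Number of workers currently in the "searching" state.
    num_searching: AtomicUsize,
}

impl SchedulerState {
    /// Called when a worker has work that parked peers could steal.
    /// If some worker is already searching, skip the expensive unpark and
    /// rely on that searcher to find the work. This skip is where an unpark
    /// can be lost if the last searcher exits "searching" at the same time.
    fn notify_parked(&self) {
        if self.num_searching.load(Ordering::SeqCst) > 0 {
            return;
        }
        self.unpark_one();
    }

    /// Called when a searching worker leaves the "searching" state.
    /// If it was the last searcher and it still has work to process,
    /// conservatively unpark one more worker: that worker either finds work
    /// that would otherwise have been lost to the race, or parks again
    /// (a no-op wakeup).
    fn transition_from_searching(&self, has_work: bool) {
        let prev = self.num_searching.fetch_sub(1, Ordering::SeqCst);
        if prev == 1 && has_work {
            self.unpark_one();
        }
    }

    fn unpark_one(&self) {
        // In the real scheduler this would pop an idle worker and unpark its thread.
        println!("unparking one worker");
    }
}

fn main() {
    let state = SchedulerState { num_searching: AtomicUsize::new(1) };
    // A busy worker notifies while one searcher exists: the unpark is skipped.
    state.notify_parked();
    // The last searcher exits "searching" while holding work, so it unparks a peer.
    state.transition_from_searching(true);
}
```

The conservative unpark in `transition_from_searching` is what produces the occasional no-op wakeup by design; the bug described below made it fire more often than necessary.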

Now we come to the issue at hand. A bug incorrectly set a worker to "searching" when the I/O driver unparked its thread. In a situation where the scheduler was only partially under load and able to operate with one active worker, the I/O driver would unpark the thread when new I/O events arrived; the worker would then incorrectly transition to "searching", find the new work generated by the inbound I/O events, incorrectly transition from the last searcher to no searchers, and unpark a new thread. That new thread would wake, find no work, and go back to sleep.

Note that, when the scheduler is fully saturated, this change has no impact, as most workers are always unparked and the optimization described at the top that avoids unparking threads already applies.

Benchmarks

Hyper

This benches Hyper's "hello" server using wrk -t1 -c400 -d10s http://127.0.0.1:3000/

master

Requests/sec: 118495.92
Transfer/sec: 9.94MB

this PR

Requests/sec: 135261.99
Transfer/sec: 11.35MB

mini-redis

This benches mini-redis using redis-benchmark -c 5 -t get,set

master

====== SET ======
  100000 requests completed in 1.03 seconds
  5 parallel clients
  3 bytes payload
  keep alive: 1
  multi-thread: no

99.17% <= 0.1 milliseconds
99.82% <= 0.2 milliseconds
99.92% <= 0.3 milliseconds
99.98% <= 0.4 milliseconds
99.99% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
96711.80 requests per second

====== GET ======
  100000 requests completed in 1.03 seconds
  5 parallel clients
  3 bytes payload
  keep alive: 1
  multi-thread: no

99.68% <= 0.1 milliseconds
99.82% <= 0.2 milliseconds
99.93% <= 0.3 milliseconds
99.98% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.7 milliseconds
97370.98 requests per second

this PR

====== SET ======
  100000 requests completed in 0.83 seconds
  5 parallel clients
  3 bytes payload
  keep alive: 1
  multi-thread: no

99.11% <= 0.1 milliseconds
99.72% <= 0.2 milliseconds
99.92% <= 0.3 milliseconds
99.98% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
100.00% <= 0.6 milliseconds
120627.27 requests per second

====== GET ======
  100000 requests completed in 0.81 seconds
  5 parallel clients
  3 bytes payload
  keep alive: 1
  multi-thread: no

99.72% <= 0.1 milliseconds
99.85% <= 0.2 milliseconds
99.89% <= 0.3 milliseconds
99.97% <= 0.4 milliseconds
100.00% <= 0.5 milliseconds
123152.71 requests per second

@github-actions bot added the R-loom (Run loom tests on this PR) label on Jan 5, 2022
@Darksonn added the A-tokio (Area: The main tokio crate) and M-runtime (Module: tokio/runtime) labels on Jan 6, 2022
@carllerche force-pushed the investigate-noop-wakes branch 2 times, most recently from 9bc41b1 to 175d8b8, on January 11, 2022 at 00:16

```diff
@@ -620,8 +620,7 @@ impl Core {
         // If a task is in the lifo slot, then we must unpark regardless of
         // being notified
         if self.lifo_slot.is_some() {
-            worker.shared.idle.unpark_worker_by_id(worker.index);
-            self.is_searching = true;
+            self.is_searching = !worker.shared.idle.unpark_worker_by_id(worker.index);
```

carllerche (Member, Author):

This is the key bit. When the worker wakes, it only enters the "searching" state if it was unparked by another thread. This distinguishes an unpark from another thread, which signals that the worker should steal work, from an unpark by the I/O driver because events arrived.
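
As an aside, a hypothetical sketch of the semantics that the new line relies on might look like the following. Here `unpark_worker_by_id` returns whether the calling worker removed itself from the idle (sleeping) set, i.e. whether nobody else had already unparked it; the `Idle` and `Core` types, the `on_wake` method, and this return-value convention are assumptions made for illustration, not Tokio's exact implementation.

```rust
// Hypothetical sketch: the names and the bool convention are assumptions for
// illustration, not Tokio's exact internals.
use std::collections::HashSet;
use std::sync::Mutex;

struct Idle {
    // Indices of workers currently registered as sleeping.
    sleepers: Mutex<HashSet<usize>>,
}

impl Idle {
    /// Returns `true` if this call removed the worker from the sleeping set
    /// itself (nobody had unparked it yet), `false` if another thread already
    /// removed it, i.e. a peer unparked this worker so it would steal work.
    fn unpark_worker_by_id(&self, index: usize) -> bool {
        self.sleepers.lock().unwrap().remove(&index)
    }
}

struct Core {
    index: usize,
    is_searching: bool,
    lifo_slot: Option<&'static str>, // stands in for a task in the LIFO slot
}

impl Core {
    /// Runs when the worker's thread wakes. If a task sits in the LIFO slot,
    /// only enter "searching" if a peer unparked us; a wakeup from the I/O
    /// driver (we were still in the sleeping set) does not count.
    fn on_wake(&mut self, idle: &Idle) {
        if self.lifo_slot.is_some() {
            self.is_searching = !idle.unpark_worker_by_id(self.index);
        }
    }
}

fn main() {
    // Worker 0 is still in the sleeping set, so this wakeup did not come from
    // a peer (e.g. the I/O driver woke the thread): it must not start searching.
    let idle = Idle { sleepers: Mutex::new(HashSet::from([0])) };
    let mut core = Core { index: 0, is_searching: false, lifo_slot: Some("task") };
    core.on_wake(&idle);
    assert!(!core.is_searching);
}
```

With the old code, this wakeup would have set `is_searching` to true unconditionally, making the worker look like the last searcher and triggering the unnecessary unpark described in the PR text.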

hawkw (Member) replied:

it might be worth having a comment in the code explaining that? not a hard blocker though.

carllerche (Member, Author) replied:

done

@carllerche changed the title from "rt: investigate noop wakes" to "rt: reduce no-op wakeups in the multi-threaded scheduler" on Jan 12, 2022
@carllerche marked this pull request as ready for review on January 12, 2022 at 06:06

carllerche (Member, Author) commented:

@Darksonn @hawkw this should be ready for review, it would be nice to have it confirmed in a "real app" too.

blt (Contributor) commented on Jan 12, 2022:

> @Darksonn @hawkw this should be ready for review, it would be nice to have it confirmed in a "real app" too.

I'm planning on wiring this up into vector and running it through our integrated benchmarks. Assuming all goes well I'll link numbers here in a handful of hours.

hawkw (Member) left a review:

This looks good to me --- the actual change is pretty simple and makes sense based on the explanation in the PR description.

It might be nice to add a bit to the comments in this code explaining this behavior, but that's not a hard blocker.

I'd like to test this out in Linkerd today or tomorrow, I'll post benchmark results if I get a chance to do that!

Inline comment on tokio/src/runtime/thread_pool/worker.rs (outdated; resolved)

blt (Contributor) commented on Jan 12, 2022:

No change in observed vector throughput, per these results. That said, we don't track CPU use in our experimental setup and the observation duration in a PR is constrained to 200 seconds to make turn-around time feasible.

carllerche (Member, Author) commented:

@blt thanks for checking. It would be interesting to know the CPU load, as that would tell us whether the test falls within the scope of this change. If you are saturating all workers in your tests, there would be no visible improvement, since the change only applies when just a few workers are actually kept busy.

blt (Contributor) commented on Jan 12, 2022:

Unfortunately we don't capture that kind of saturation information yet, though we do have an issue for it: vectordotdev/vector#10456. That said, we're intending to fully saturate vector so it's reasonable to believe there's little idle time.

manaswini05 added a commit to manaswini05/tokio that referenced this pull request Jul 26, 2022
manaswini05 added a commit to Constructor-io/tokio that referenced this pull request Jul 26, 2022
manaswini05 added a commit to Constructor-io/tokio that referenced this pull request Jul 28, 2022
menshikh-iv added a commit to Constructor-io/tokio that referenced this pull request Jul 6, 2023
Labels
A-tokio (Area: The main tokio crate), M-runtime (Module: tokio/runtime), R-loom (Run loom tests on this PR)

4 participants