feat: add task counter pairs #6114

conradludgate · 2023-10-27T17:02:04Z

Motivation

Metrics like active_tasks_count or injection_queue_depth are fast-moving gauges, and even taking a snapshot every few seconds doesn't say much about what's going on inside Tokio. It would be better to use two counters: one for additions, one for removals.

We're hoping to add a Prometheus exporter for the Tokio metrics information, but a sample rate of 15 seconds will likely miss a lot of task spikes. I could implement some level of eager aggregation, but as the linked comment says, you can still miss some spikes with a sample rate of 500ms.
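To illustrate the sampling problem described above, here is a minimal sketch. `TaskStats` and `scrape_around_burst` are hypothetical names invented for this example and are not part of Tokio's API. A burst of short-lived tasks that starts and finishes between two scrapes is invisible to a gauge, but shows up as a delta on a monotonic counter.

```rust
/// Hypothetical stand-in for a runtime's task bookkeeping (not Tokio's API).
#[derive(Default)]
pub struct TaskStats {
    /// Total tasks ever spawned; only ever increases.
    pub spawned: u64,
    /// Total tasks ever completed; only ever increases.
    pub completed: u64,
}

impl TaskStats {
    pub fn spawn(&mut self, n: u64) {
        self.spawned += n;
    }

    pub fn complete(&mut self, n: u64) {
        self.completed += n;
    }

    /// Gauge view: the number of tasks alive right now.
    pub fn active(&self) -> u64 {
        self.spawned - self.completed
    }
}

/// Simulates `burst` short-lived tasks running entirely between two scrapes.
/// Returns (gauge at first scrape, gauge at second scrape, spawned delta).
pub fn scrape_around_burst(burst: u64) -> (u64, u64, u64) {
    let mut stats = TaskStats::default();

    // First scrape.
    let gauge_before = stats.active();
    let spawned_before = stats.spawned;

    // The whole burst happens before the next scrape.
    stats.spawn(burst);
    stats.complete(burst);

    // Second scrape: the gauge is back to where it started,
    // but the counter has moved by `burst`.
    let gauge_after = stats.active();
    (gauge_before, gauge_after, stats.spawned - spawned_before)
}
```

With a burst of 1000 tasks, both gauge readings are 0 while the counter delta is 1000, which is the information a rate query over the counter can recover.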

Solution

In CountedLinkedList, replace the count: usize with a pair of u64s that can only be incremented. One u64 for added items and one for removed items.

Open to bikeshedding on the terminology
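A rough sketch of the idea, assuming a simplified list: `CountedList` below is a hypothetical stand-in built on `VecDeque`, not Tokio's actual `CountedLinkedList`, and the method names are illustrative only. The single `count` field is replaced by two counters that are only ever incremented, and the length is derived from their difference.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a counted list (not Tokio's implementation).
pub struct CountedList<T> {
    items: VecDeque<T>,
    /// Number of items ever added; only ever increases.
    added: u64,
    /// Number of items ever removed; only ever increases.
    removed: u64,
}

impl<T> CountedList<T> {
    pub fn new() -> Self {
        Self {
            items: VecDeque::new(),
            added: 0,
            removed: 0,
        }
    }

    pub fn push_front(&mut self, item: T) {
        self.items.push_front(item);
        self.added += 1;
    }

    pub fn pop_back(&mut self) -> Option<T> {
        let item = self.items.pop_back();
        // Only count removals that actually returned an item.
        if item.is_some() {
            self.removed += 1;
        }
        item
    }

    pub fn added(&self) -> u64 {
        self.added
    }

    pub fn removed(&self) -> u64 {
        self.removed
    }

    /// The old gauge value, derived from the two counters.
    pub fn len(&self) -> u64 {
        self.added - self.removed
    }
}
```

Because `removed` can never exceed `added`, the subtraction in `len` cannot underflow in this single-owner sketch.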

Open questions

Should the active task API return all 3 values in 1 call, rather than require 3 separate lock calls?
What other APIs are currently gauges and should be counters?

conradludgate · 2023-10-30T09:17:48Z

Gauge-like metrics:

num_workers (a constant quantity, doesn't count)
num_blocking_threads
num_idle_blocking_threads
injection_queue_depth
worker_local_queue_depth
blocking_queue_depth

`num_blocking_threads`

Can be treated as blocking_threads_created - blocking_threads_released. Would require 2 atomics, unless it is acceptable to make this a u64 which encodes 2 u32s (how many apps will create 4 billion blocking threads?!)
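The "one u64 encoding 2 u32s" option could look roughly like the sketch below. `PackedPair` and its methods are hypothetical names for illustration, not Tokio code, and the choice of which half holds which counter is an assumption. Both counters live in one atomic, so a single load yields a consistent pair.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Two u32 counters packed into one atomic u64 (hypothetical sketch).
/// High 32 bits: threads created. Low 32 bits: threads released.
pub struct PackedPair(AtomicU64);

impl PackedPair {
    pub const fn new() -> Self {
        Self(AtomicU64::new(0))
    }

    pub fn record_created(&self) {
        // Adding 1 << 32 increments only the high half.
        self.0.fetch_add(1u64 << 32, Ordering::Relaxed);
    }

    pub fn record_released(&self) {
        // Adding 1 increments the low half. If the low half ever passed
        // u32::MAX it would carry into the high half, so this layout relies
        // on the assumption that fewer than 2^32 events occur.
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns (created, released) from a single atomic load.
    pub fn load(&self) -> (u32, u32) {
        let raw = self.0.load(Ordering::Relaxed);
        ((raw >> 32) as u32, raw as u32)
    }
}
```

The trade-off is the one raised above: one atomic instead of two, at the cost of a 32-bit range per counter.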

`num_idle_blocking_threads`

Same as above, although it will likely need 2 u64 counters: blocking_active_total - blocking_idle_total.

`injection_queue_depth`

injection_pushed - injection_popped. Requires 2 u64 atomic counters.

`worker_local_queue_depth`

Requires no additional counters, since we already have head and tail. They are u32 quantities though and will likely overflow, which makes this tricky. I appreciate that adding extra atomics to this path might introduce a noticeable latency spike, so I am fine with ignoring this one.
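To make the overflow concern concrete, here is a small sketch. It assumes `head` and `tail` are free-running u32 positions (pops advance `head`, pushes advance `tail`); that layout is an assumption for illustration and `queue_depth` is a hypothetical helper, not a Tokio function. The instantaneous depth survives wraparound, but the raw values reset every 2^32 operations, so they cannot be exported directly as long-lived monotonic u64 counters.

```rust
/// Depth of a queue tracked by free-running u32 positions.
/// `wrapping_sub` gives the right answer even after `tail` has wrapped
/// past u32::MAX while `head` has not yet wrapped.
pub fn queue_depth(head: u32, tail: u32) -> u32 {
    tail.wrapping_sub(head)
}
```

For example, with `head = u32::MAX - 1` and `tail = 3` the depth is 5, even though `tail < head` numerically.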

`blocking_queue_depth`

Same as the other blocking gauges.

hawkw · 2023-10-30T16:11:15Z

IMO using two counters rather than a gauge is definitely more correct for these metrics, so I'm 👍 on this change.

Darksonn · 2023-11-25T14:27:23Z

Any status update on this?

conradludgate · 2023-11-26T16:57:41Z

I'll try and fix up the flaky tests tomorrow.

Any opinions on the API? Since the pair will likely be accessed together rather than separately, taking 2 locks instead of 1 is a bit unfortunate. This should probably return a tuple instead of having 2 functions.

Darksonn · 2023-11-27T12:51:36Z

Returning a tuple makes sense to me. You could even define a struct with two fields to give better names than .0 and .1 to the two properties.
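A sketch of what such a struct could look like, under the assumption that both counters sit behind one mutex. `TaskCounts`, `TaskList`, and every method name here are hypothetical and only illustrate the shape of the API, not what was eventually merged.

```rust
use std::sync::Mutex;

/// Snapshot of both counters with named fields instead of `.0` and `.1`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TaskCounts {
    pub added: u64,
    pub removed: u64,
}

impl TaskCounts {
    /// Gauge value derived from a consistent snapshot.
    pub fn active(&self) -> u64 {
        self.added - self.removed
    }
}

/// Hypothetical owner of the counters.
pub struct TaskList {
    counts: Mutex<TaskCounts>,
}

impl TaskList {
    pub fn new() -> Self {
        Self {
            counts: Mutex::new(TaskCounts { added: 0, removed: 0 }),
        }
    }

    pub fn record_added(&self) {
        self.counts.lock().unwrap().added += 1;
    }

    pub fn record_removed(&self) {
        self.counts.lock().unwrap().removed += 1;
    }

    /// One lock acquisition returns both values together, so the pair
    /// is always internally consistent.
    pub fn counts(&self) -> TaskCounts {
        *self.counts.lock().unwrap()
    }
}
```

Compared with two separate getter functions, a caller here pays for one lock and never observes `removed` from a later moment than `added`.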

Darksonn · 2024-01-30T10:11:23Z

Hi, it looks like the conflicting PR has been merged now. Sorry that it took so long to get back to you after that. Are you still interested in working on this?

conradludgate · 2024-01-30T10:41:50Z

> Are you still interested in working on this?

Yes, I will rebase accordingly. Are there any other changes you think should be included?

Darksonn · 2024-01-30T11:19:09Z

Hmm, overall it looks good, but I don't love the naming of CounterPair and CounterPair::len.

conradludgate · 2024-02-12T14:56:04Z

Since the sharded list makes use of atomics, I've moved from added/removed to added/count so that is_empty() only needs 1 atomic access.
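The added/count layout might be sketched as follows. `ShardedCounts` and its methods are hypothetical names, and deriving `removed` as `added - count` is this example's assumption about how the second counter could be recovered, not something stated in the thread.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Hypothetical counters for a sharded list: a monotonic `added` total
/// plus the current `count`, instead of added/removed.
pub struct ShardedCounts {
    added: AtomicU64,
    count: AtomicUsize,
}

impl ShardedCounts {
    pub const fn new() -> Self {
        Self {
            added: AtomicU64::new(0),
            count: AtomicUsize::new(0),
        }
    }

    pub fn record_push(&self) {
        self.added.fetch_add(1, Ordering::Relaxed);
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_pop(&self) {
        self.count.fetch_sub(1, Ordering::Relaxed);
    }

    /// Only one atomic load is needed, which is the motivation given above.
    pub fn is_empty(&self) -> bool {
        self.count.load(Ordering::Relaxed) == 0
    }

    pub fn added(&self) -> u64 {
        self.added.load(Ordering::Relaxed)
    }

    /// Derived value (assumption of this sketch). The two loads are not
    /// taken atomically together, so under concurrent updates the result
    /// is approximate; `saturating_sub` guards against a transient underflow.
    pub fn removed(&self) -> u64 {
        let count = self.count.load(Ordering::Relaxed) as u64;
        self.added().saturating_sub(count)
    }
}
```

With added/removed, `is_empty` would need to load and compare both atomics; storing the current count directly avoids that.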

> Hmm, overall it looks good, but I don't love the naming of CounterPair and CounterPair::len.

I'm tempted to remove it then and we can stick with start_task_count and active_task_count functions.

conradludgate · 2024-02-12T15:04:49Z

Also renamed start_tasks to spawned_tasks, as it is likely more intuitive.

Darksonn · 2024-05-03T09:53:33Z

There's a CI failure:

FAIL [   0.386s] tokio::rt_metrics num_active_tasks

--- STDOUT:              tokio::rt_metrics num_active_tasks ---

running 1 test
test num_active_tasks ... FAILED

failures:

failures:
    num_active_tasks

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 22 filtered out; finished in 0.33s


--- STDERR:              tokio::rt_metrics num_active_tasks ---
thread 'num_active_tasks' panicked at tokio/tests/rt_metrics.rs:104:5:
assertion `left == right` failed
  left: 0
 right: 1
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
   2: core::panicking::assert_failed_inner
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:343:17
   3: core::panicking::assert_failed
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:298:5
   4: rt_metrics::num_active_tasks
             at ./tests/rt_metrics.rs:104:5
   5: rt_metrics::num_active_tasks::{{closure}}
             at ./tests/rt_metrics.rs:85:22
   6: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

github-actions bot added the R-loom-current-thread, R-loom-multi-thread, and R-loom-multi-thread-alt labels on Oct 27, 2023

conradludgate force-pushed the metrics-counter-pairs branch from 458813d to 5fec88d on October 29, 2023 15:10

Darksonn added the A-tokio and M-metrics labels on Nov 5, 2023

Darksonn reviewed Nov 5, 2023

Darksonn requested a review from hawkw November 5, 2023 14:13

ghost reviewed Nov 27, 2023

conradludgate force-pushed the metrics-counter-pairs branch 8 times, most recently from 4ad7dd4 to 5355563 on November 28, 2023 11:08

conradludgate force-pushed the metrics-counter-pairs branch from 5355563 to 176d74b on February 12, 2024 14:52

Darksonn reviewed Feb 13, 2024

Darksonn reviewed Feb 13, 2024

conradludgate added 11 commits April 25, 2024 16:24

feat: add task counter pairs

63d85dd

make tests more reliable

3d5f34e

switch to CounterPair struct, improve reliability of flaky test

0833885

remove counter pair

8568aac

rename start_task_count everywhere

9a00ce7

use loom atomicu64

d5c01f0

remove test loop

e14ae50

remove asserts

6f9cece

document monotonic behaviour

34834e7

rename active_tasks_count to num_active_tasks for consistency

28f0562

deprecate

6497d0c

conradludgate force-pushed the metrics-counter-pairs branch from 755e551 to 6497d0c on April 25, 2024 15:24

conradludgate requested a review from Darksonn April 30, 2024 14:29

Darksonn reviewed May 1, 2024

conradludgate and others added 2 commits May 2, 2024 15:36

fix test and add deprecation reason

a7713db

Merge branch 'master' into metrics-counter-pairs

77df06c
