Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Console doesn't detect self wakes from tokio::task::yield_now() after Tokio 1.23 #512

Open
hds opened this issue Jan 23, 2024 · 5 comments
Labels
S-bug Severity: bug

Comments

@hds
Copy link
Collaborator

hds commented Jan 23, 2024

(edit(2024-01-24): Updated title and description to clarfiy that it is only self wakes coming from tokio::task::yield_now() that are affected)

What is the issue?

When observing an instrumented program built with Tokio 1.23 or later, Tokio Console isn't able to detect self wakes coming from tokio::task::yield_now().

This is due to tokio-rs/tokio#5223, where the self wake was changed to be deferred so that self waking tasks wouldn't starve the resource drivers.

How can the bug be reproduced?

The following code should show a task with a single self wake:

    _ = tokio::spawn(async {
        tokio::task::yield_now().await;
    })
    .await;

If build with Tokio 1.22.0 then we see the wake counted as a self wake in Tokio Console. We can confirm this using ari-subscriber where we get the output (truncated):

2024-01-23T16:24:05.167259Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} new
2024-01-23T16:24:05.167786Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} enter
2024-01-23T16:24:05.167911Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} tokio::task::waker: op="waker.wake_by_ref", task.id=1
2024-01-23T16:24:05.168007Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} exit

Specifically there is a wake operation (op="waker.wake_by_ref") inside the runtime.spawn span.

When updating to Tokio 1.23.0, Tokio Console no longer counts the single wake as a self wake. Again, we can confirm using ari-subscriber (truncated output):

2024-01-23T16:33:26.834206Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} new
2024-01-23T16:33:26.834805Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} enter
2024-01-23T16:33:26.834966Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} tokio::task::waker: op="waker.clone", task.id=1
2024-01-23T16:33:26.835048Z TRACE runtime.spawn[1]{kind=task, task.name=, task.id=18, loc.file="src/main.rs", loc.line=15, loc.col=9} exit
2024-01-23T16:33:26.835145Z TRACE tokio::task::waker: op="waker.wake", task.id=1

Here we see that the waker is cloned (op="waker.clone") within the runtime.spanw span for the task, but the wake itself happens by value after the span has exited (op="waker.wake").

Note: The task.id on the tokio::task::waker traces corresponds to the tracing span Id, which is shown for traces in brackets after the name, e.g. runtime.spawn[1].

Logs, error output, etc

No response

Versions

Tokio 1.23

Possible solution

We may need to modify the instrumentation in Tokio itself to be able to detect a self wake. Since Tokio knows that it is performing a self wake (the deferred waking code is only used from tokio::task::yield_now()), it may be possible to explicitly include in the instrumentation when a wake is a self wake.

It's not necessarily straight forward though, this code passes through the std library Waker object.

Additional context

No response

Would you like to work on fixing this bug?

yes

@hds hds added the S-bug Severity: bug label Jan 23, 2024
@hawkw
Copy link
Member

hawkw commented Jan 23, 2024

Hmm, do we know why the wake operation is no longer inside the task's span? What changed upstream in tokio that resulted in the task span being exited before the wakeup?

Edit: oh haha i can't read, i missed the first paragraph that references tokio-rs/tokio#5223


As a side note, it would be really cool if we had some way to run automated tests against new tokio releases, in order to proactively detect issues where upstream changes break our instrumentation. I wonder if we could get some kind of webhook that could trigger a GitHub Actions workflow when a new Tokio release is published. This might be a bit of a big project, and it probably deserves its own tracking ticket, but I figured I'd mention it here.

@hawkw
Copy link
Member

hawkw commented Jan 23, 2024

Thinking about this...I honestly think it might be okay to just exclude task::yield_now() from the self-wake warning, because of the changes in tokio-rs/tokio#5223.

The primary reason we want to warn about tasks self-waking is that excessive self-wakes can result in a task starving the runtime by waking itself, getting put in the LIFO slot, and immediately executing again. Since tokio-rs/tokio#5223 changes yield_now() to wake the task to a separate queue that gets polled only after tasks woken by IO, timers, and synchronization primitives, it no longer presents as much of a risk of starving the runtime, compared to user-implemented self-wakes (e.g. a Future::poll implementation that immediately wakes its own Waker). Honestly, it might be fine for yield_now() to not trigger the self-wake lint after Tokio 1.23.

@hds what do you think?

@hds
Copy link
Collaborator Author

hds commented Jan 23, 2024

@hawkw that's a good point regarding yield_now() not being problematic after the defer changes.

In that case it's more a question of changing the tests and documenting the limitation in self wake counting. I'll do that and then we can probably close this issue.

Thanks!

@hawkw
Copy link
Member

hawkw commented Jan 23, 2024

Sounds good to me, thanks for flagging this!

@hds
Copy link
Collaborator Author

hds commented Jan 24, 2024

As a side note, it would be really cool if we had some way to run automated tests against new tokio releases, in order to proactively detect issues where upstream changes break our instrumentation. I wonder if we could get some kind of webhook that could trigger a GitHub Actions workflow when a new Tokio release is published. This might be a bit of a big project, and it probably deserves its own tracking ticket, but I figured I'd mention it here.

I missed this first comment yesterday.

This is a good idea, although I think we might want to test against tokio master to catch these issues before the release. I was thinking a weekly action that runs against tokio master as well as always running against the latest version (I suppose this should be in addition to running against some "minimum stable tokio version") in our PRs and merges to main.

In a certain way this might be easier than the webhook - although having that webhook to catch things on the actual release would also be really nice.

@hds hds changed the title Console doesn't detect self wakes after Tokio 1.23 Console doesn't detect self wakes from tokio::task::yield_now() after Tokio 1.23 Jan 24, 2024
hds added a commit that referenced this issue Feb 16, 2024
Part of the testing performed in the `console-subscriber` integration
tests is detecting self wakes. This relied upon the `yield_now()` from
Tokio.

However, the behavior of this function was changed in
tokio-rs/tokio#5223 and since Tokio 1.23 the wake doesn't occur in the
task that `yield_now()` is called from. This breaks the test when using
a newer version of Tokio.

This change replaces the use of `yield_now()` with a custom
`self_wake()` function that returns a future which does perform a self
wake (wakes the task from within itself before returning
`Poll::Pending`).

The same custom `self_wake()` is also included in the `app` example so
that it shows self wakes correctly.

Tokio has been updated to 1.28.2 in the lock file (the last with
compatible MSRV) so that this fix is tested.

Ref #512
hds added a commit that referenced this issue Mar 8, 2024
Part of the testing performed in the `console-subscriber` integration
tests is detecting self wakes. This relied upon the `yield_now()` from
Tokio.

However, the behavior of this function was changed in
tokio-rs/tokio#5223 and since Tokio 1.23 the wake doesn't occur in the
task that `yield_now()` is called from. This breaks the test when using
a newer version of Tokio.

This change replaces the use of `yield_now()` with a custom
`self_wake()` function that returns a future which does perform a self
wake (wakes the task from within itself before returning
`Poll::Pending`).

The same custom `self_wake()` is also included in the `app` example so
that it shows self wakes correctly.

Tokio has been updated to 1.28.2 in the lock file (the last with
compatible MSRV) so that this fix is tested.

Ref #512
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-bug Severity: bug
Projects
None yet
Development

No branches or pull requests

2 participants