Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assertion failed: next.is_notified() during task dump with tokio v1.35.0 #6343

Open
scottlamb opened this issue Feb 13, 2024 · 1 comment
Open
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-taskdump --cfg tokio_taskdump

Comments

@scottlamb
Copy link

Version
List the versions of all tokio crates you are using. The easiest way to get
this information is using cargo tree subcommand:

cargo tree | grep tokio

tokio v1.35.0
│   │   │       ├── tokio v1.35.0
│   │   │       │   ├── tokio-macros v2.2.0 (proc-macro)
│   │   │       └── tokio-stream v0.1.14
│   │   │           └── tokio v1.35.0 (*)
│   │   │   │   ├── tokio v1.35.0 (*)
│   │   │   │   ├── tokio-util v0.7.10
│   │   │   │   │   ├── tokio v1.35.0 (*)
│   │   │   │   ├── tokio v1.35.0 (*)
│   │   │   │   ├── tokio v1.35.0 (*)
│   │   │   │   └── tokio-native-tls v0.3.0
│   │   │   │       └── tokio v1.35.0 (*)
│   │   │   ├── tokio v1.35.0 (*)
│   │   │   ├── tokio-native-tls v0.3.0 (*)
│   │   │   ├── tokio-util v0.7.10 (*)
│   │   │   │   ├── tokio v1.35.0 (*)
│   │   │   │   └── tokio v1.35.0 (*)
│   │   │   ├── tokio v1.35.0 (*)
│   │   ├── tokio v1.35.0 (*)
│   │   ├── tokio-metrics v0.3.1
│   │   │   ├── tokio v1.35.0 (*)
│   │   │   └── tokio-stream v0.1.14 (*)
│   │   │   ├── tokio v1.35.0 (*)
│   │   │   ├── tokio-stream v0.1.14 (*)
│   │   │   ├── tokio-tungstenite v0.20.1
│   │   │   │   ├── tokio v1.35.0 (*)
│   │   │   ├── tokio-util v0.7.10 (*)
│   │   │   └── tokio v1.35.0 (*)
│   │   ├── tokio v1.35.0 (*)
│   │   └── tokio v1.35.0 (*)
│   │   └── tokio v1.35.0 (*)
│   │   │   │   │   └── tokio v1.35.0 (*)
│   │   │   │   ├── tokio v1.35.0 (*)
│   │   │   │   ├── tokio-util v0.7.10 (*)
│   │   └── tokio v1.35.0 (*)
│   ├── tokio v1.35.0 (*)
│   ├── tokio-metrics v0.3.1 (*)
│   ├── tokio-stream v0.1.14 (*)
│   └── tokio v1.35.0 (*)
│   └── tokio v1.35.0 (*)
│   └── tokio v1.35.0 (*)
│   ├── tokio v1.35.0 (*)
├── tokio v1.35.0 (*)
├── tokio-metrics v0.3.1 (*)
├── tokio-stream v0.1.14 (*)
├── tokio-util v0.7.10 (*)
├── tokio v1.35.0 (*)
│   ├── tokio v1.35.0 (*)
│   ├── tokio-stream v0.1.14 (*)
│   │   ├── tokio v1.35.0 (*)
│   ├── tokio v1.35.0 (*)
├── tokio v1.35.0 (*)
├── tokio-util v0.7.10 (*)
├── tokio v1.35.0 (*)
├── tokio-metrics v0.3.1 (*)
├── tokio v1.35.0 (*)
├── tokio-stream v0.1.14 (*)

Platform
The output of uname -a (UNIX), or version and 32 or 64-bit (Windows)

Linux ip-10-0-173-157 4.14.336-178.554.amzn1.x86_64 #1 SMP Fri Jan 12 10:40:15 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Description

I'm getting panics with the experimental task dump stuff. I'm running this code:

#[cfg(target_os = "linux")]
async fn taskdump(logger: &slog::Logger) {
    use std::sync::atomic::AtomicUsize;
    static COUNTER: AtomicUsize = AtomicUsize::new(0);
    let handle = tokio::runtime::Handle::current();
    let dump = handle.dump().await;
    let tasks: Vec<_> = dump.tasks().iter().collect();
    let dump_num = COUNTER.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
    info!(logger, "dump {} completed with {} active tasks", dump_num, tasks.len());
    for (i, task) in tasks.iter().enumerate() {
        info!(
            logger, "active task";
            "dump_num" => dump_num,
            "task_num" => i,
            "trace" => %task.trace(),
        );
    }
}

and while it is running, I see panics such as:

1707506081228,"thread 'tokio-runtime-worker' panicked at 'assertion failed: next.is_notified()', /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.0/src/runtime/task/state.rs:118:13"

These are very similar to those reported in #6051. We saw these more frequently prior to the fix mentioned there, but apparently they still happen sometime.

I've attached a log snippet; notes:

  • there are a couple of these panics, with stack traces
  • they are interleaved with other messages, including dump 0 completed with 41 active tasks, active task messages, and some other periodically logged stuff. So the dump does complete; it's workers in the dumped runtime that crash as it's running. The code above should be running on the main thread anyway, so if it were the panicking one, it wouldn't say thread 'tokio-runtime-worker' panicked.
  • I've redacted the actual task dumps both for brevity and because this is a proprietary app in production, but if there's something about them that would be interesting to know, I do have them, and please ask.
  • If you're wondering why the format is weird: it's a download from AWS CloudWatch in its CSV format with a timestamp as the first column, and our logging code must drop/reacquire the stderr lock mid-line as some other stuff was actually in the middle of a couple of the panic lines.

As mentioned in this comment, this is a rare production thing that I unfortunately don't know how to reproduce.

@Darksonn asked:

And you are sure that this happened in relation to a taskdump? It's not some unrelated thing going on?

Let's call it 99% sure. There are active tasks as mentioned above, and some of them were doing stuff as it happened, but the panics were just as the task dump was running, we've never seen them any other time, I don't think we do anything else strange right when we start the task dump, and the message is identical to the one from the previous task dump bug, so I'm going with the likely thing here.

pruned.csv

@scottlamb scottlamb added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Feb 13, 2024
@Darksonn Darksonn added the M-taskdump --cfg tokio_taskdump label Feb 14, 2024
@Darksonn
Copy link
Contributor

cc @jswrenn

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-taskdump --cfg tokio_taskdump
Projects
None yet
Development

No branches or pull requests

2 participants