tracing: instrument time::Sleep #4072

zaharidichev · 2021-08-26T10:49:09Z

This branch instruments the Sleep resource to allow the tokio-console
to consume data about resources usage. The corresponding console branch
is here: tokio-rs/console#77

Signed-off-by: Zahari Dichev zaharidichev@gmail.com

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

Darksonn · 2021-08-26T11:08:27Z

tokio/src/time/driver/sleep.rs

+    cfg_not_trace! {
+        fn poll(mut self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> {
+            match ready!(self.as_mut().poll_elapsed(cx)) {
+                Ok(()) => Poll::Ready(()),
+                Err(e) => panic!("timer error: {}", e),
+            }
+        }
+    }
+
+    cfg_trace! {
+        fn poll(mut self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> {
+            let _span = self.inner.async_op_span.clone().entered();
+            match ready!(self.as_mut().poll_elapsed(cx)) {
+                Ok(()) => Poll::Ready(()),
+                Err(e) => panic!("timer error: {}", e),
+            }
        }
    }


This seems like it could be done by putting the cfg inside the function. If the macro doesn't support this, try changing it to this:

macro_rules! cfg_trace { ($($stmt:stmt;)*) => { $( #[cfg(all(tokio_unstable, feature = "tracing"))] $stmt; )* }; ($($item:item)*) => { $( #[cfg(all(tokio_unstable, feature = "tracing"))] #[cfg_attr(docsrs, doc(cfg(feature = "tracing")))] $item )* }; }

zaharidichev · 2021-08-26T15:31:22Z

Seems to be failing due to some attempt to subtract with overflow panics. Will address later today.

hawkw

Thanks for working on this! I had some thoughts on how we might want to restructure the instrumentation so that the async_op spans are more useful --- let me know what you think?

I also commented on some minor style nits and ways we can potentially make the code a little clearer, but those are lower-priority.

hawkw · 2021-08-26T16:13:36Z

tokio/src/time/driver/sleep.rs

+        pub(crate) fn new_timeout(deadline: Instant) -> Sleep {
+            let handle = Handle::current();
+            let entry = TimerEntry::new(&handle, deadline);
+            Sleep { inner: Inner {deadline}, entry }


nit: it looks like rustfmt is not formatting these because they're inside the cfg_not_trace! macro? i would expect this to be formatted like

Suggested change

Sleep { inner: Inner {deadline}, entry }

Sleep {

inner: Inner { deadline },

entry,

}

hawkw · 2021-08-26T16:14:02Z

tokio/src/time/driver/sleep.rs

+            let resource_span = tracing::trace_span!(
+                "runtime.resource",
+                concrete_type = "Sleep",
+                kind = "timer");


nit: looks like rustfmt didn't format this:

Suggested change

kind = "timer");

kind = "timer",

);

hawkw · 2021-08-26T16:14:15Z

tokio/src/time/driver/sleep.rs

+                source = "Sleep::new_timeout"
+                );


Suggested change

source = "Sleep::new_timeout"

);

source = "Sleep::new_timeout",

);

hawkw · 2021-08-26T16:15:12Z

tokio/src/time/driver/sleep.rs


-        Sleep { deadline, entry }
+            Sleep { inner: Inner {deadline, resource_span, async_op_span, time_source }, entry }


Suggested change

Sleep { inner: Inner {deadline, resource_span, async_op_span, time_source }, entry }

Sleep {

inner: Inner {

deadline,

resource_span,

async_op_span,

time_source,

},

entry,

}

hawkw · 2021-08-26T16:15:53Z

tokio/src/time/driver/sleep.rs

+            let span_guard = resource_span.enter();
+            tracing::trace!(
+                target: "tokio::resource::state_update",
+                duration = duration,
+                duration.unit = "ms",
+                duration.op = "override",
+            );
+            drop(span_guard);


nit: rather than entering and then immediately exiting the span, we can also write

Suggested change

let span_guard = resource_span.enter();

tracing::trace!(

target: "tokio::resource::state_update",

duration = duration,

duration.unit = "ms",

duration.op = "override",

);

drop(span_guard);

tracing::trace!(

parent: resource_span.id(),

target: "tokio::resource::state_update",

duration = duration,

duration.unit = "ms",

duration.op = "override",

);

In the console we are using ctx.current_span().id() to get the resource span, expecting that this span is always the current when such an event is emitted. Does setting the parent explicitly just set the parent or it also enters the span. And more importantly, which approach do we want to use, looking at the current span or looking at the explicit parent of the event?

Does setting the parent explicitly just set the parent or it also enters the span.

Setting the parent only explicitly sets the parent, so we will need to change the console code to use Event::parent or (preferably) Context::event_span instead.

And more importantly, which approach do we want to use, looking at the current span or looking at the explicit parent of the event?

It's more correct to look at the explicit parent of the event, and fall back to the current span if there is no explicit parent.

tokio/src/time/driver/sleep.rs

hawkw · 2021-08-26T16:46:19Z

tokio/src/time/driver/sleep.rs

+            let async_op_span = tracing::trace_span!(
+                "runtime.resource.async_op",
+                source = "Sleep::new_timeout"


hmm, i'm not totally sure how useful this is going to be. the Sleep resource can be created by the user-facing sleep and sleep_until functions, as well as internally by the Interval type created by interval and interval_at. I would kind of expect the async-op span to be distinguish between those different places where a Sleep might be created, rather than always having the source of "Sleep::new_timeout"...which isn't even a public API function. Since users aren't calling Sleep::new_timeout directly, it's not going to be particularly useful for them to see it in their diagnostics...

also, creating the async_op span here means that the async op will "start" as soon as the Sleep is created...but I'm not really sure if that's what we want. I think the async op's duration should probably start the first time the sleep's poll_elapsed is called.

Actually, what I think we really want is to have one async-op span that corresponds to the Future impl for Sleep, and is created the first time the future is polled, and another one that corresponds to Interval::tick/``Interval::poll_tick. We could do something like this by using [Span::none`](https://docs.rs/tracing/0.1.26/tracing/span/struct.Span.html#method.none):

When constructing the Sleep, we would initialize the async_op_span field like this

let inner = Inner { deadline, resource_span, async_op_span: tracing::Span::none(), time_source, };

and then in the Future impl for sleep, we would create the span if it is None:

impl Future for Sleep { type Output = (); fn poll(mut self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> { // eliding some `Pin`-related noise that i'm sure would be necessary here... if self.inner.async_op_span.is_none() { self.inner.async_op_span = tracing::trace_span!( "runtime.resource.async_op", source = "Sleep", ); } let _span = self.inner.async_op_span.clone().entered(); // ... } }

Interval could manage its own async op span (which I think we'd probably want to re-create every time a new tick starts?).

What do you think?

@hawkw I think that makes sense. My initial thought was that the async op span represents the life of the future that represents the async operation. So really from the moment of instantiating the future to the time it is dropped. This is also why there is this little comment in the proto code: https://github.com/tokio-rs/console/blob/main/proto/async_ops.proto#L52. Namely, until it is polled, we do not really know which task or which resource this op is associated with. I think what you are describing makes sense, especially the suggestion around the "source" field. One reason I went with creating the span as part of the future instantiation is that I would assume that ideally we would like to be able to observe a situation where a bunch of futures are created by are never polled? WDYT?

Hmm. I think the issue is that a Sleep is not necessarily a single Future. It might also be constructed internally as part of an Interval, and I think it's important that every time the interval is ticked is considered a separate async op.

I could go either way on when the async op span should begin, but I think distinguishing between awaiting a Sleep as a future, and ticking an Interval is important.

You could have it reset the span every time Sleep::reset is called?

You could have it reset the span every time Sleep::reset is called?

Yeah, that plus maybe passing in a string constant for the "source" field to distinguish between spans created via time::sleep and time::interval would probably be sufficient!

@hawkw All of this makes me wonder whether the source field is useful at all. Passing string constants around seems a bit brittle. I wonder if there is a more elegant way to represent the fact that the sleep resource is being used by the interval? This is not the only instance of the pattern. For example RwLock internally uses a Semaphore. Should we just get rid of the "source" field an introduce a parent-child relationship between resources where a resource span can be a child of another resource span. Or event rely on the stack of entered spans to determine that and instead of relying on a field rely on the backtrace of entered spans? This way we will be able to associate the fact that this async op (and all others) that are happening on the Sleep future are being triggered by the parent Interval resource. Does that make sense? Does it sound better? The fact that we need to reset the async op span when resetting the timer is still true.

Hmm, I think that in the eventual future, establishing the relationships between higher and lower-level async operations using a span hierarchy definitely does make sense. However, it's also additional work we'd have to do in the subscriber. I think that having the source field is probably fine though, especially since adding code to the console subscriber to infer these relationships based on inspecting the span context would require additional changes.

In general, I think it's fine to add as many fields to things as we like, and not worry too much about changing them around later...as long as they're just displayed in the "fields" section, and don't have other special meanings to the console subscriber. If we do add special meanings to fields beyond just displaying them as arbitrary k/v pairs, then we need to be a little more cautious about removing/changing them, but I think it's fine to keep the source field and continue iterating on it in further branches, especially if it lets us move forward with this and come back to it later...

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

Based on some [earlier feedback][1], this PR changes the proto definition so readiness of a poll op is represented via a bool. The relevant changes have been made to: tokio-rs/tokio#4072 [1]: #77 (comment) Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

hawkw

couple minor, non-blocking suggestions. let me know what you think?

hawkw · 2021-08-31T16:56:58Z

tokio/src/time/driver/sleep.rs

    pub struct Sleep {
-        deadline: Instant,
+        inner: Inner,

        // The link between the `Sleep` instance and the timer that drives it.
        #[pin]
        entry: TimerEntry,
    }
 }

+cfg_trace! {
+    #[derive(Debug)]
+    struct Inner {
+        deadline: Instant,
+        resource_span: tracing::Span,
+        async_op_span: tracing::Span,
+        time_source: ClockTime,
+    }
+}
+
+cfg_not_trace! {
+    #[derive(Debug)]
+    struct Inner {
+        deadline: Instant,
+    }
+}


thinking about this a bit more: it might be a bit more concise to move the deadline field back to the Sleep struct, and just stick the tracing-related stuff in a separate struct that's only present if the trace cfg is enabled. something like this:

pub struct Sleep { deadline: Instant, // The link between the `Sleep` instance and the timer that drives it. #[pin] entry: TimerEntry, #[cfg(all(tokio_unstable, feature = "tracing"))] trace: Trace, } // ... #[cfg(all(tokio_unstable, feature = "tracing"))] #[derive(Debug)] struct Trace { resource_span: tracing::Span, async_op_span: tracing::Span, time_source: ClockTime, }

that way, we don't need to add a separate Inner type with only one field in the case where tracing is disabled.

Of course, this only works if we use a #[cfg(...)] attribute on the field, rather than using the cfg_trace! macro, since the macro can only be used on item definitions, not fields...

Take it or leave it, of course!

this is what I was going for originally but pin_project does not play well with cfg attrs on fields.. so therefore I took this approach.

this is what I was going for originally but pin_project does not play well with cfg attrs on fields.. so therefore I took this approach.

ah, got it...this is fine, then.

hawkw · 2021-08-31T17:01:30Z

tokio/src/time/driver/sleep.rs

+        deadline: Instant,
+        resource_span: tracing::Span,
+        async_op_span: tracing::Span,
+        time_source: ClockTime,


nit/tioli: it occurs to me that we don't really need to clone the time source into the Inner, as the TimerEntry struct already owns a Handle and exposes access to it via this accessor:

tokio/tokio/src/time/driver/entry.rs

Lines 401 to 406 in f3ed064

/// Returns a TimerHandle for this timer.

pub(super) fn handle(&self) -> TimerHandle {

TimerHandle {

inner: NonNull::from(self),

}

}

so, rather than cloning the ClockTime into the struct, we could just call self.entry.handle().time_source() whenever we need to get the time. this makes the Inner struct one Instant smaller (since a ClockTime is a zero-sized Clock struct plus an Instant). OTTOH, this requires fewer pointer dereferences than accessing the time source through the handle every time. So, the current approach may actually be better. Anyway, just a thought.

But that accessor is on TimerShared not TimerEntry, which is what we have in the struct

hawkw · 2021-08-31T17:03:33Z

tokio/src/time/driver/sleep.rs

+            let now = me.inner.time_source.now();
+            let duration = me.inner.time_source.deadline_to_tick(deadline) - now;
+            let _span_guard = me.inner.resource_span.enter();
+            tracing::trace!(
+                target: "runtime::resource::state_update",
+                duration = duration,
+                duration.unit = "ms",
+                duration.op = "override",


nit/TIOLI: it would be nice to avoid the Instant::now call if the tracing event is disabled by tracing filtering. The resulting code is admittedly a bit uglier, but if we stuck the duration calculation inside the tracing macro, it would only be evaluated if the event is enabled:

Suggested change

let now = me.inner.time_source.now();

let duration = me.inner.time_source.deadline_to_tick(deadline) - now;

let _span_guard = me.inner.resource_span.enter();

tracing::trace!(

target: "runtime::resource::state_update",

duration = duration,

duration.unit = "ms",

duration.op = "override",

let _span_guard = me.inner.resource_span.enter();

tracing::trace!(

target: "runtime::resource::state_update",

duration = {

let now = me.inner.time_source.now();

me.inner.time_source.deadline_to_tick(deadline) - now

},

duration.unit = "ms",

duration.op = "override",

hawkw · 2021-08-31T17:04:57Z

tokio/src/macros/mod.rs

+#[macro_use]
+mod trace;


nit: this should probably be behind a cfg_trace!?

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

This PR changes the way poll and state update ops are associated with resource by incorporating the feedback in tokio-rs/tokio#4072 (comment) Instead of only looking at entered spans, we now use `Context::event_span` Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

hawkw

this looks good to me overall! i had a few minor style nits.

hawkw · 2021-09-14T17:58:39Z

tokio/src/time/driver/sleep.rs

+    #[cfg(tokio_track_caller)]
+    let location = Some(std::panic::Location::caller());
+    #[cfg(not(tokio_track_caller))]
+    let location = None;


nit, take it or leave it: i wonder if it's worthwhile to just stick this behind a function, so we don't have to have so many cfg(...) attributes in function bodies? maybe in util/trace.rs, we could add a

pub(crate) fn caller_location() -> Option<std::panic::Location> { #[cfg(all(tokio_track_caller, tokio_unstable, feature = "tracing"))] Some(std::panic::Location::caller()) #[cfg(not(all(tokio_track_caller, tokio_unstable, feature = "tracing")))] None }

and use this here and elsewhere. then, this could just be

Suggested change

#[cfg(tokio_track_caller)]

let location = Some(std::panic::Location::caller());

#[cfg(not(tokio_track_caller))]

let location = None;

let location = crate::util::trace::caller_location();

hawkw · 2021-09-14T17:59:31Z

tokio/src/time/driver/sleep.rs

    pub struct Sleep {
-        deadline: Instant,
+        inner: Inner,

        // The link between the `Sleep` instance and the timer that drives it.
        #[pin]
        entry: TimerEntry,
    }
 }

+cfg_trace! {
+    #[derive(Debug)]
+    struct Inner {
+        deadline: Instant,
+        resource_span: tracing::Span,
+        async_op_span: tracing::Span,
+        time_source: ClockTime,
+    }
+}
+
+cfg_not_trace! {
+    #[derive(Debug)]
+    struct Inner {
+        deadline: Instant,
+    }
+}


this is what I was going for originally but pin_project does not play well with cfg attrs on fields.. so therefore I took this approach.

ah, got it...this is fine, then.

hawkw · 2021-09-14T18:00:55Z

tokio/src/time/driver/sleep.rs

 impl Sleep {
-    pub(crate) fn new_timeout(deadline: Instant) -> Sleep {
+    #[allow(unused_variables)]


nit: this should maybe be

Suggested change

#[allow(unused_variables)]

#[cfg_attr(not(all(tokio_unstable, feature = "tracing")), allow(unused_variables)]

so that unused variables are only allowed when we're actually not using the location field...

hawkw · 2021-09-14T18:01:45Z

tokio/src/time/timeout.rs

+    #[cfg(all(tokio_track_caller, tokio_unstable, feature = "tracing"))]
+    let location = Some(Location::caller());
+    #[cfg(not(all(tokio_track_caller, tokio_unstable, feature = "tracing")))]
+    let location = None;


if we made a wrapper function for getting the callsite location, as i suggested above, this could also just be

Suggested change

#[cfg(all(tokio_track_caller, tokio_unstable, feature = "tracing"))]

let location = Some(Location::caller());

#[cfg(not(all(tokio_track_caller, tokio_unstable, feature = "tracing")))]

let location = None;

let location = crate::util::trace::caller_location();

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

seanmonstar

💯

This branch instruments the `Sleep` resource to allow the tokio-console to consume data about resources usage. The corresponding console branch is here: tokio-rs/console#77 Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

instrument sleep

b1f2dde

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

zaharidichev requested review from hawkw and seanmonstar August 26, 2021 10:49

zaharidichev mentioned this pull request Aug 26, 2021

tracing: add resource instrumentation using declarative macros #3933

Closed

fmt

5a51dd2

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

Darksonn reviewed Aug 26, 2021

View reviewed changes

hawkw reviewed Aug 26, 2021

View reviewed changes

change target to runtime::...

68de623

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

zaharidichev mentioned this pull request Aug 31, 2021

proto: represent readiness as bool tokio-rs/console#103

Merged

zaharidichev added 2 commits August 31, 2021 12:32

address some feedback

3a226df

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

add cfg

bb3d8dc

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

hawkw reviewed Aug 31, 2021

View reviewed changes

zaharidichev added 2 commits September 3, 2021 12:29

Merge branch 'master' into zd/instrument-sleep

50162e4

fix overflow

a7180b1

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

zaharidichev requested a review from hawkw September 3, 2021 16:02

zaharidichev mentioned this pull request Sep 6, 2021

subscriber: look at event parents to determine resource id tokio-rs/console#114

Merged

reset async_op span on reset

b1c8f3e

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

Darksonn added A-tokio Area: The main tokio crate M-time Module: tokio/time M-tracing Tracing support in Tokio labels Sep 7, 2021

track location

148a989

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

hawkw approved these changes Sep 14, 2021

View reviewed changes

move track caller into func

ad39bb6

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

zaharidichev force-pushed the zd/instrument-sleep branch from ef58731 to ad39bb6 Compare September 14, 2021 20:27

zaharidichev added 2 commits September 15, 2021 18:06

location -> loc

02a4107

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

change source

e404e9b

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

seanmonstar approved these changes Sep 21, 2021

View reviewed changes

Merge branch 'master' into zd/instrument-sleep

4e94841

zaharidichev changed the title ~~Instrument time::Sleep~~ tracing: instrument time::Sleep Sep 22, 2021

zaharidichev merged commit b9834f6 into tokio-rs:master Sep 22, 2021

Darksonn mentioned this pull request Oct 26, 2021

chore: prepare Tokio v1.13.0 #4196

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tracing: instrument time::Sleep #4072

tracing: instrument time::Sleep #4072

zaharidichev commented Aug 26, 2021

Darksonn Aug 26, 2021

zaharidichev commented Aug 26, 2021

hawkw left a comment

hawkw Aug 26, 2021

hawkw Aug 26, 2021

hawkw Aug 26, 2021

hawkw Aug 26, 2021

hawkw Aug 26, 2021

zaharidichev Aug 30, 2021

hawkw Aug 31, 2021

hawkw Aug 26, 2021

hawkw Aug 26, 2021

zaharidichev Aug 30, 2021

hawkw Aug 31, 2021

Darksonn Aug 31, 2021

hawkw Aug 31, 2021

zaharidichev Sep 3, 2021 •

edited

hawkw Sep 15, 2021

hawkw left a comment

hawkw Aug 31, 2021 •

edited

zaharidichev Sep 3, 2021

hawkw Sep 14, 2021

hawkw Aug 31, 2021

zaharidichev Sep 3, 2021

hawkw Aug 31, 2021

hawkw Aug 31, 2021

hawkw left a comment

hawkw Sep 14, 2021

hawkw Sep 14, 2021

hawkw Sep 14, 2021

hawkw Sep 14, 2021

seanmonstar left a comment


		Sleep { deadline, entry }
		Sleep { inner: Inner {deadline, resource_span, async_op_span, time_source }, entry }

	/// Returns a TimerHandle for this timer.
	pub(super) fn handle(&self) -> TimerHandle {
	TimerHandle {
	inner: NonNull::from(self),
	}
	}

	#[allow(unused_variables)]
	#[cfg_attr(not(all(tokio_unstable, feature = "tracing")), allow(unused_variables)]

		#[macro_use]
		mod trace;

tracing: instrument time::Sleep #4072

tracing: instrument time::Sleep #4072

Conversation

zaharidichev commented Aug 26, 2021

Choose a reason for hiding this comment

zaharidichev commented Aug 26, 2021

hawkw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zaharidichev Sep 3, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hawkw left a comment

Choose a reason for hiding this comment

hawkw Aug 31, 2021 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hawkw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

seanmonstar left a comment

Choose a reason for hiding this comment

zaharidichev Sep 3, 2021 •

edited

hawkw Aug 31, 2021 •

edited