- Sponsor
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracing: instrument time::Sleep #4072
tracing: instrument time::Sleep #4072
Conversation
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
tokio/src/time/driver/sleep.rs
Outdated
cfg_not_trace! { | ||
fn poll(mut self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> { | ||
match ready!(self.as_mut().poll_elapsed(cx)) { | ||
Ok(()) => Poll::Ready(()), | ||
Err(e) => panic!("timer error: {}", e), | ||
} | ||
} | ||
} | ||
|
||
cfg_trace! { | ||
fn poll(mut self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> { | ||
let _span = self.inner.async_op_span.clone().entered(); | ||
match ready!(self.as_mut().poll_elapsed(cx)) { | ||
Ok(()) => Poll::Ready(()), | ||
Err(e) => panic!("timer error: {}", e), | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it could be done by putting the cfg inside the function. If the macro doesn't support this, try changing it to this:
macro_rules! cfg_trace {
($($stmt:stmt;)*) => {
$(
#[cfg(all(tokio_unstable, feature = "tracing"))]
$stmt;
)*
};
($($item:item)*) => {
$(
#[cfg(all(tokio_unstable, feature = "tracing"))]
#[cfg_attr(docsrs, doc(cfg(feature = "tracing")))]
$item
)*
};
}
Seems to be failing due to some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I had some thoughts on how we might want to restructure the instrumentation so that the async_op
spans are more useful --- let me know what you think?
I also commented on some minor style nits and ways we can potentially make the code a little clearer, but those are lower-priority.
tokio/src/time/driver/sleep.rs
Outdated
pub(crate) fn new_timeout(deadline: Instant) -> Sleep { | ||
let handle = Handle::current(); | ||
let entry = TimerEntry::new(&handle, deadline); | ||
Sleep { inner: Inner {deadline}, entry } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it looks like rustfmt
is not formatting these because they're inside the cfg_not_trace!
macro? i would expect this to be formatted like
Sleep { inner: Inner {deadline}, entry } | |
Sleep { | |
inner: Inner { deadline }, | |
entry, | |
} |
tokio/src/time/driver/sleep.rs
Outdated
let resource_span = tracing::trace_span!( | ||
"runtime.resource", | ||
concrete_type = "Sleep", | ||
kind = "timer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: looks like rustfmt didn't format this:
kind = "timer"); | |
kind = "timer", | |
); |
tokio/src/time/driver/sleep.rs
Outdated
source = "Sleep::new_timeout" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source = "Sleep::new_timeout" | |
); | |
source = "Sleep::new_timeout", | |
); |
tokio/src/time/driver/sleep.rs
Outdated
|
||
Sleep { deadline, entry } | ||
Sleep { inner: Inner {deadline, resource_span, async_op_span, time_source }, entry } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sleep { inner: Inner {deadline, resource_span, async_op_span, time_source }, entry } | |
Sleep { | |
inner: Inner { | |
deadline, | |
resource_span, | |
async_op_span, | |
time_source, | |
}, | |
entry, | |
} |
tokio/src/time/driver/sleep.rs
Outdated
let span_guard = resource_span.enter(); | ||
tracing::trace!( | ||
target: "tokio::resource::state_update", | ||
duration = duration, | ||
duration.unit = "ms", | ||
duration.op = "override", | ||
); | ||
drop(span_guard); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rather than entering and then immediately exiting the span, we can also write
let span_guard = resource_span.enter(); | |
tracing::trace!( | |
target: "tokio::resource::state_update", | |
duration = duration, | |
duration.unit = "ms", | |
duration.op = "override", | |
); | |
drop(span_guard); | |
tracing::trace!( | |
parent: resource_span.id(), | |
target: "tokio::resource::state_update", | |
duration = duration, | |
duration.unit = "ms", | |
duration.op = "override", | |
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the console we are using ctx.current_span().id()
to get the resource span, expecting that this span is always the current when such an event is emitted. Does setting the parent explicitly just set the parent or it also enters the span. And more importantly, which approach do we want to use, looking at the current span or looking at the explicit parent of the event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does setting the parent explicitly just set the parent or it also enters the span.
Setting the parent only explicitly sets the parent, so we will need to change the console code to use Event::parent
or (preferably) Context::event_span
instead.
And more importantly, which approach do we want to use, looking at the current span or looking at the explicit parent of the event?
It's more correct to look at the explicit parent of the event, and fall back to the current span if there is no explicit parent.
tokio/src/time/driver/sleep.rs
Outdated
let async_op_span = tracing::trace_span!( | ||
"runtime.resource.async_op", | ||
source = "Sleep::new_timeout" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, i'm not totally sure how useful this is going to be. the Sleep
resource can be created by the user-facing sleep
and sleep_until
functions, as well as internally by the Interval
type created by interval
and interval_at
. I would kind of expect the async-op span to be distinguish between those different places where a Sleep
might be created, rather than always having the source of "Sleep::new_timeout"...which isn't even a public API function. Since users aren't calling Sleep::new_timeout
directly, it's not going to be particularly useful for them to see it in their diagnostics...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, creating the async_op
span here means that the async op will "start" as soon as the Sleep
is created...but I'm not really sure if that's what we want. I think the async op's duration should probably start the first time the sleep's poll_elapsed
is called.
Actually, what I think we really want is to have one async-op span that corresponds to the Future
impl for Sleep
, and is created the first time the future is polled, and another one that corresponds to Interval::tick
/``Interval::poll_tick. We could do something like this by using [
Span::none`](https://docs.rs/tracing/0.1.26/tracing/span/struct.Span.html#method.none):
When constructing the Sleep
, we would initialize the async_op_span
field like this
let inner = Inner {
deadline,
resource_span,
async_op_span: tracing::Span::none(),
time_source,
};
and then in the Future
impl for sleep, we would create the span if it is None
:
impl Future for Sleep {
type Output = ();
fn poll(mut self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> {
// eliding some `Pin`-related noise that i'm sure would be necessary here...
if self.inner.async_op_span.is_none() {
self.inner.async_op_span = tracing::trace_span!(
"runtime.resource.async_op",
source = "Sleep",
);
}
let _span = self.inner.async_op_span.clone().entered();
// ...
}
}
Interval
could manage its own async op span (which I think we'd probably want to re-create every time a new tick starts?).
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hawkw I think that makes sense. My initial thought was that the async op span represents the life of the future that represents the async operation. So really from the moment of instantiating the future to the time it is dropped. This is also why there is this little comment in the proto code: https://github.com/tokio-rs/console/blob/main/proto/async_ops.proto#L52. Namely, until it is polled, we do not really know which task or which resource this op is associated with. I think what you are describing makes sense, especially the suggestion around the "source" field. One reason I went with creating the span as part of the future instantiation is that I would assume that ideally we would like to be able to observe a situation where a bunch of futures are created by are never polled? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I think the issue is that a Sleep
is not necessarily a single Future
. It might also be constructed internally as part of an Interval
, and I think it's important that every time the interval is tick
ed is considered a separate async op.
I could go either way on when the async op span should begin, but I think distinguishing between awaiting a Sleep
as a future, and tick
ing an Interval
is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could have it reset the span every time Sleep::reset
is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could have it reset the span every time
Sleep::reset
is called?
Yeah, that plus maybe passing in a string constant for the "source" field to distinguish between spans created via time::sleep
and time::interval
would probably be sufficient!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hawkw All of this makes me wonder whether the source field is useful at all. Passing string constants around seems a bit brittle. I wonder if there is a more elegant way to represent the fact that the sleep resource is being used by the interval? This is not the only instance of the pattern. For example RwLock
internally uses a Semaphore
. Should we just get rid of the "source" field an introduce a parent-child relationship between resources where a resource span can be a child of another resource span. Or event rely on the stack of entered spans to determine that and instead of relying on a field rely on the backtrace of entered spans? This way we will be able to associate the fact that this async op (and all others) that are happening on the Sleep future are being triggered by the parent Interval
resource. Does that make sense? Does it sound better? The fact that we need to reset the async op span when resetting the timer is still true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think that in the eventual future, establishing the relationships between higher and lower-level async operations using a span hierarchy definitely does make sense. However, it's also additional work we'd have to do in the subscriber. I think that having the source
field is probably fine though, especially since adding code to the console subscriber to infer these relationships based on inspecting the span context would require additional changes.
In general, I think it's fine to add as many fields to things as we like, and not worry too much about changing them around later...as long as they're just displayed in the "fields" section, and don't have other special meanings to the console subscriber. If we do add special meanings to fields beyond just displaying them as arbitrary k/v pairs, then we need to be a little more cautious about removing/changing them, but I think it's fine to keep the source
field and continue iterating on it in further branches, especially if it lets us move forward with this and come back to it later...
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Based on some [earlier feedback][1], this PR changes the proto definition so readiness of a poll op is represented via a bool. The relevant changes have been made to: tokio-rs/tokio#4072 [1]: #77 (comment) Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple minor, non-blocking suggestions. let me know what you think?
pub struct Sleep { | ||
deadline: Instant, | ||
inner: Inner, | ||
|
||
// The link between the `Sleep` instance and the timer that drives it. | ||
#[pin] | ||
entry: TimerEntry, | ||
} | ||
} | ||
|
||
cfg_trace! { | ||
#[derive(Debug)] | ||
struct Inner { | ||
deadline: Instant, | ||
resource_span: tracing::Span, | ||
async_op_span: tracing::Span, | ||
time_source: ClockTime, | ||
} | ||
} | ||
|
||
cfg_not_trace! { | ||
#[derive(Debug)] | ||
struct Inner { | ||
deadline: Instant, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about this a bit more: it might be a bit more concise to move the deadline
field back to the Sleep
struct, and just stick the tracing
-related stuff in a separate struct that's only present if the trace
cfg is enabled. something like this:
pub struct Sleep {
deadline: Instant,
// The link between the `Sleep` instance and the timer that drives it.
#[pin]
entry: TimerEntry,
#[cfg(all(tokio_unstable, feature = "tracing"))]
trace: Trace,
}
// ...
#[cfg(all(tokio_unstable, feature = "tracing"))]
#[derive(Debug)]
struct Trace {
resource_span: tracing::Span,
async_op_span: tracing::Span,
time_source: ClockTime,
}
that way, we don't need to add a separate Inner
type with only one field in the case where tracing is disabled.
Of course, this only works if we use a #[cfg(...)]
attribute on the field, rather than using the cfg_trace!
macro, since the macro can only be used on item definitions, not fields...
Take it or leave it, of course!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what I was going for originally but pin_project
does not play well with cfg
attrs on fields.. so therefore I took this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what I was going for originally but
pin_project
does not play well withcfg
attrs on fields.. so therefore I took this approach.
ah, got it...this is fine, then.
deadline: Instant, | ||
resource_span: tracing::Span, | ||
async_op_span: tracing::Span, | ||
time_source: ClockTime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/tioli: it occurs to me that we don't really need to clone the time source into the Inner
, as the TimerEntry
struct already owns a Handle
and exposes access to it via this accessor:
tokio/tokio/src/time/driver/entry.rs
Lines 401 to 406 in f3ed064
/// Returns a TimerHandle for this timer. | |
pub(super) fn handle(&self) -> TimerHandle { | |
TimerHandle { | |
inner: NonNull::from(self), | |
} | |
} |
so, rather than cloning the ClockTime
into the struct, we could just call self.entry.handle().time_source()
whenever we need to get the time. this makes the Inner
struct one Instant
smaller (since a ClockTime
is a zero-sized Clock
struct plus an Instant
). OTTOH, this requires fewer pointer dereferences than accessing the time source through the handle every time. So, the current approach may actually be better. Anyway, just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that accessor is on TimerShared
not TimerEntry
, which is what we have in the struct
tokio/src/time/driver/sleep.rs
Outdated
let now = me.inner.time_source.now(); | ||
let duration = me.inner.time_source.deadline_to_tick(deadline) - now; | ||
let _span_guard = me.inner.resource_span.enter(); | ||
tracing::trace!( | ||
target: "runtime::resource::state_update", | ||
duration = duration, | ||
duration.unit = "ms", | ||
duration.op = "override", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit/TIOLI: it would be nice to avoid the Instant::now
call if the tracing
event is disabled by tracing filtering. The resulting code is admittedly a bit uglier, but if we stuck the duration calculation inside the tracing
macro, it would only be evaluated if the event is enabled:
let now = me.inner.time_source.now(); | |
let duration = me.inner.time_source.deadline_to_tick(deadline) - now; | |
let _span_guard = me.inner.resource_span.enter(); | |
tracing::trace!( | |
target: "runtime::resource::state_update", | |
duration = duration, | |
duration.unit = "ms", | |
duration.op = "override", | |
let _span_guard = me.inner.resource_span.enter(); | |
tracing::trace!( | |
target: "runtime::resource::state_update", | |
duration = { | |
let now = me.inner.time_source.now(); | |
me.inner.time_source.deadline_to_tick(deadline) - now | |
}, | |
duration.unit = "ms", | |
duration.op = "override", |
tokio/src/macros/mod.rs
Outdated
#[macro_use] | ||
mod trace; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this should probably be behind a cfg_trace!
?
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This PR changes the way poll and state update ops are associated with resource by incorporating the feedback in tokio-rs/tokio#4072 (comment) Instead of only looking at entered spans, we now use `Context::event_span` Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks good to me overall! i had a few minor style nits.
tokio/src/time/driver/sleep.rs
Outdated
#[cfg(tokio_track_caller)] | ||
let location = Some(std::panic::Location::caller()); | ||
#[cfg(not(tokio_track_caller))] | ||
let location = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, take it or leave it: i wonder if it's worthwhile to just stick this behind a function, so we don't have to have so many cfg(...)
attributes in function bodies? maybe in util/trace.rs
, we could add a
pub(crate) fn caller_location() -> Option<std::panic::Location> {
#[cfg(all(tokio_track_caller, tokio_unstable, feature = "tracing"))]
Some(std::panic::Location::caller())
#[cfg(not(all(tokio_track_caller, tokio_unstable, feature = "tracing")))]
None
}
and use this here and elsewhere. then, this could just be
#[cfg(tokio_track_caller)] | |
let location = Some(std::panic::Location::caller()); | |
#[cfg(not(tokio_track_caller))] | |
let location = None; | |
let location = crate::util::trace::caller_location(); |
pub struct Sleep { | ||
deadline: Instant, | ||
inner: Inner, | ||
|
||
// The link between the `Sleep` instance and the timer that drives it. | ||
#[pin] | ||
entry: TimerEntry, | ||
} | ||
} | ||
|
||
cfg_trace! { | ||
#[derive(Debug)] | ||
struct Inner { | ||
deadline: Instant, | ||
resource_span: tracing::Span, | ||
async_op_span: tracing::Span, | ||
time_source: ClockTime, | ||
} | ||
} | ||
|
||
cfg_not_trace! { | ||
#[derive(Debug)] | ||
struct Inner { | ||
deadline: Instant, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what I was going for originally but
pin_project
does not play well withcfg
attrs on fields.. so therefore I took this approach.
ah, got it...this is fine, then.
tokio/src/time/driver/sleep.rs
Outdated
impl Sleep { | ||
pub(crate) fn new_timeout(deadline: Instant) -> Sleep { | ||
#[allow(unused_variables)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this should maybe be
#[allow(unused_variables)] | |
#[cfg_attr(not(all(tokio_unstable, feature = "tracing")), allow(unused_variables)] |
so that unused variables are only allowed when we're actually not using the location
field...
tokio/src/time/timeout.rs
Outdated
#[cfg(all(tokio_track_caller, tokio_unstable, feature = "tracing"))] | ||
let location = Some(Location::caller()); | ||
#[cfg(not(all(tokio_track_caller, tokio_unstable, feature = "tracing")))] | ||
let location = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we made a wrapper function for getting the callsite location, as i suggested above, this could also just be
#[cfg(all(tokio_track_caller, tokio_unstable, feature = "tracing"))] | |
let location = Some(Location::caller()); | |
#[cfg(not(all(tokio_track_caller, tokio_unstable, feature = "tracing")))] | |
let location = None; | |
let location = crate::util::trace::caller_location(); |
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
ef58731
to
ad39bb6
Compare
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
This branch instruments the `Sleep` resource to allow the tokio-console to consume data about resources usage. The corresponding console branch is here: tokio-rs/console#77 Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
This branch instruments the
Sleep
resource to allow the tokio-consoleto consume data about resources usage. The corresponding console branch
is here: tokio-rs/console#77
Signed-off-by: Zahari Dichev zaharidichev@gmail.com