Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracing: instrument time::Sleep #4072

Merged
merged 13 commits into from Sep 22, 2021

Conversation

zaharidichev
Copy link
Contributor

This branch instruments the Sleep resource to allow the tokio-console
to consume data about resources usage. The corresponding console branch
is here: tokio-rs/console#77

Signed-off-by: Zahari Dichev zaharidichev@gmail.com

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Comment on lines 398 to 415
cfg_not_trace! {
fn poll(mut self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> {
match ready!(self.as_mut().poll_elapsed(cx)) {
Ok(()) => Poll::Ready(()),
Err(e) => panic!("timer error: {}", e),
}
}
}

cfg_trace! {
fn poll(mut self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> {
let _span = self.inner.async_op_span.clone().entered();
match ready!(self.as_mut().poll_elapsed(cx)) {
Ok(()) => Poll::Ready(()),
Err(e) => panic!("timer error: {}", e),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it could be done by putting the cfg inside the function. If the macro doesn't support this, try changing it to this:

macro_rules! cfg_trace {
    ($($stmt:stmt;)*) => {
        $(
        #[cfg(all(tokio_unstable, feature = "tracing"))]
        $stmt;
        )*
    };
    ($($item:item)*) => {
        $(
            #[cfg(all(tokio_unstable, feature = "tracing"))]
            #[cfg_attr(docsrs, doc(cfg(feature = "tracing")))]
            $item
        )*
    };
}

@zaharidichev
Copy link
Contributor Author

Seems to be failing due to some attempt to subtract with overflow panics. Will address later today.

Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I had some thoughts on how we might want to restructure the instrumentation so that the async_op spans are more useful --- let me know what you think?

I also commented on some minor style nits and ways we can potentially make the code a little clearer, but those are lower-priority.

pub(crate) fn new_timeout(deadline: Instant) -> Sleep {
let handle = Handle::current();
let entry = TimerEntry::new(&handle, deadline);
Sleep { inner: Inner {deadline}, entry }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it looks like rustfmt is not formatting these because they're inside the cfg_not_trace! macro? i would expect this to be formatted like

Suggested change
Sleep { inner: Inner {deadline}, entry }
Sleep {
inner: Inner { deadline },
entry,
}

let resource_span = tracing::trace_span!(
"runtime.resource",
concrete_type = "Sleep",
kind = "timer");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: looks like rustfmt didn't format this:

Suggested change
kind = "timer");
kind = "timer",
);

Comment on lines 237 to 238
source = "Sleep::new_timeout"
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
source = "Sleep::new_timeout"
);
source = "Sleep::new_timeout",
);


Sleep { deadline, entry }
Sleep { inner: Inner {deadline, resource_span, async_op_span, time_source }, entry }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Sleep { inner: Inner {deadline, resource_span, async_op_span, time_source }, entry }
Sleep {
inner: Inner {
deadline,
resource_span,
async_op_span,
time_source,
},
entry,
}

Comment on lines 241 to 248
let span_guard = resource_span.enter();
tracing::trace!(
target: "tokio::resource::state_update",
duration = duration,
duration.unit = "ms",
duration.op = "override",
);
drop(span_guard);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rather than entering and then immediately exiting the span, we can also write

Suggested change
let span_guard = resource_span.enter();
tracing::trace!(
target: "tokio::resource::state_update",
duration = duration,
duration.unit = "ms",
duration.op = "override",
);
drop(span_guard);
tracing::trace!(
parent: resource_span.id(),
target: "tokio::resource::state_update",
duration = duration,
duration.unit = "ms",
duration.op = "override",
);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the console we are using ctx.current_span().id() to get the resource span, expecting that this span is always the current when such an event is emitted. Does setting the parent explicitly just set the parent or it also enters the span. And more importantly, which approach do we want to use, looking at the current span or looking at the explicit parent of the event?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does setting the parent explicitly just set the parent or it also enters the span.

Setting the parent only explicitly sets the parent, so we will need to change the console code to use Event::parent or (preferably) Context::event_span instead.

And more importantly, which approach do we want to use, looking at the current span or looking at the explicit parent of the event?

It's more correct to look at the explicit parent of the event, and fall back to the current span if there is no explicit parent.

tokio/src/time/driver/sleep.rs Show resolved Hide resolved
tokio/src/time/driver/sleep.rs Outdated Show resolved Hide resolved
tokio/src/time/driver/sleep.rs Outdated Show resolved Hide resolved
tokio/src/time/driver/sleep.rs Outdated Show resolved Hide resolved
Comment on lines 235 to 237
let async_op_span = tracing::trace_span!(
"runtime.resource.async_op",
source = "Sleep::new_timeout"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, i'm not totally sure how useful this is going to be. the Sleep resource can be created by the user-facing sleep and sleep_until functions, as well as internally by the Interval type created by interval and interval_at. I would kind of expect the async-op span to be distinguish between those different places where a Sleep might be created, rather than always having the source of "Sleep::new_timeout"...which isn't even a public API function. Since users aren't calling Sleep::new_timeout directly, it's not going to be particularly useful for them to see it in their diagnostics...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, creating the async_op span here means that the async op will "start" as soon as the Sleep is created...but I'm not really sure if that's what we want. I think the async op's duration should probably start the first time the sleep's poll_elapsed is called.

Actually, what I think we really want is to have one async-op span that corresponds to the Future impl for Sleep, and is created the first time the future is polled, and another one that corresponds to Interval::tick/``Interval::poll_tick. We could do something like this by using [Span::none`](https://docs.rs/tracing/0.1.26/tracing/span/struct.Span.html#method.none):

When constructing the Sleep, we would initialize the async_op_span field like this

let inner = Inner {
    deadline,
    resource_span,
    async_op_span: tracing::Span::none(),
    time_source,
};

and then in the Future impl for sleep, we would create the span if it is None:

impl Future for Sleep {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut task::Context<'_>) -> Poll<Self::Output> {
        // eliding some `Pin`-related noise that i'm sure would be necessary here...
        if self.inner.async_op_span.is_none() {
            self.inner.async_op_span = tracing::trace_span!(
                "runtime.resource.async_op",
                source = "Sleep",
            );
        }
        let _span = self.inner.async_op_span.clone().entered();
        // ...
    }
}

Interval could manage its own async op span (which I think we'd probably want to re-create every time a new tick starts?).

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hawkw I think that makes sense. My initial thought was that the async op span represents the life of the future that represents the async operation. So really from the moment of instantiating the future to the time it is dropped. This is also why there is this little comment in the proto code: https://github.com/tokio-rs/console/blob/main/proto/async_ops.proto#L52. Namely, until it is polled, we do not really know which task or which resource this op is associated with. I think what you are describing makes sense, especially the suggestion around the "source" field. One reason I went with creating the span as part of the future instantiation is that I would assume that ideally we would like to be able to observe a situation where a bunch of futures are created by are never polled? WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I think the issue is that a Sleep is not necessarily a single Future. It might also be constructed internally as part of an Interval, and I think it's important that every time the interval is ticked is considered a separate async op.

I could go either way on when the async op span should begin, but I think distinguishing between awaiting a Sleep as a future, and ticking an Interval is important.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have it reset the span every time Sleep::reset is called?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could have it reset the span every time Sleep::reset is called?

Yeah, that plus maybe passing in a string constant for the "source" field to distinguish between spans created via time::sleep and time::interval would probably be sufficient!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hawkw All of this makes me wonder whether the source field is useful at all. Passing string constants around seems a bit brittle. I wonder if there is a more elegant way to represent the fact that the sleep resource is being used by the interval? This is not the only instance of the pattern. For example RwLock internally uses a Semaphore. Should we just get rid of the "source" field an introduce a parent-child relationship between resources where a resource span can be a child of another resource span. Or event rely on the stack of entered spans to determine that and instead of relying on a field rely on the backtrace of entered spans? This way we will be able to associate the fact that this async op (and all others) that are happening on the Sleep future are being triggered by the parent Interval resource. Does that make sense? Does it sound better? The fact that we need to reset the async op span when resetting the timer is still true.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think that in the eventual future, establishing the relationships between higher and lower-level async operations using a span hierarchy definitely does make sense. However, it's also additional work we'd have to do in the subscriber. I think that having the source field is probably fine though, especially since adding code to the console subscriber to infer these relationships based on inspecting the span context would require additional changes.

In general, I think it's fine to add as many fields to things as we like, and not worry too much about changing them around later...as long as they're just displayed in the "fields" section, and don't have other special meanings to the console subscriber. If we do add special meanings to fields beyond just displaying them as arbitrary k/v pairs, then we need to be a little more cautious about removing/changing them, but I think it's fine to keep the source field and continue iterating on it in further branches, especially if it lets us move forward with this and come back to it later...

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
hawkw pushed a commit to tokio-rs/console that referenced this pull request Aug 31, 2021
Based on some [earlier feedback][1], this PR changes the proto
definition so readiness of a poll op is represented via a bool. The
relevant changes have been made to:
tokio-rs/tokio#4072

[1]: #77 (comment)

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple minor, non-blocking suggestions. let me know what you think?

Comment on lines 188 to +212
pub struct Sleep {
deadline: Instant,
inner: Inner,

// The link between the `Sleep` instance and the timer that drives it.
#[pin]
entry: TimerEntry,
}
}

cfg_trace! {
#[derive(Debug)]
struct Inner {
deadline: Instant,
resource_span: tracing::Span,
async_op_span: tracing::Span,
time_source: ClockTime,
}
}

cfg_not_trace! {
#[derive(Debug)]
struct Inner {
deadline: Instant,
}
}
Copy link
Member

@hawkw hawkw Aug 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking about this a bit more: it might be a bit more concise to move the deadline field back to the Sleep struct, and just stick the tracing-related stuff in a separate struct that's only present if the trace cfg is enabled. something like this:

    pub struct Sleep {
        deadline: Instant,

        // The link between the `Sleep` instance and the timer that drives it.
        #[pin]
        entry: TimerEntry,

         #[cfg(all(tokio_unstable, feature = "tracing"))]
        trace: Trace,
    }
    
// ...

 #[cfg(all(tokio_unstable, feature = "tracing"))]
#[derive(Debug)]
struct Trace {
    resource_span: tracing::Span,
    async_op_span: tracing::Span,
    time_source: ClockTime,
}

that way, we don't need to add a separate Inner type with only one field in the case where tracing is disabled.

Of course, this only works if we use a #[cfg(...)] attribute on the field, rather than using the cfg_trace! macro, since the macro can only be used on item definitions, not fields...

Take it or leave it, of course!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I was going for originally but pin_project does not play well with cfg attrs on fields.. so therefore I took this approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I was going for originally but pin_project does not play well with cfg attrs on fields.. so therefore I took this approach.

ah, got it...this is fine, then.

deadline: Instant,
resource_span: tracing::Span,
async_op_span: tracing::Span,
time_source: ClockTime,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/tioli: it occurs to me that we don't really need to clone the time source into the Inner, as the TimerEntry struct already owns a Handle and exposes access to it via this accessor:

/// Returns a TimerHandle for this timer.
pub(super) fn handle(&self) -> TimerHandle {
TimerHandle {
inner: NonNull::from(self),
}
}

so, rather than cloning the ClockTime into the struct, we could just call self.entry.handle().time_source() whenever we need to get the time. this makes the Inner struct one Instant smaller (since a ClockTime is a zero-sized Clock struct plus an Instant). OTTOH, this requires fewer pointer dereferences than accessing the time source through the handle every time. So, the current approach may actually be better. Anyway, just a thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that accessor is on TimerShared not TimerEntry, which is what we have in the struct

Comment on lines 308 to 315
let now = me.inner.time_source.now();
let duration = me.inner.time_source.deadline_to_tick(deadline) - now;
let _span_guard = me.inner.resource_span.enter();
tracing::trace!(
target: "runtime::resource::state_update",
duration = duration,
duration.unit = "ms",
duration.op = "override",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/TIOLI: it would be nice to avoid the Instant::now call if the tracing event is disabled by tracing filtering. The resulting code is admittedly a bit uglier, but if we stuck the duration calculation inside the tracing macro, it would only be evaluated if the event is enabled:

Suggested change
let now = me.inner.time_source.now();
let duration = me.inner.time_source.deadline_to_tick(deadline) - now;
let _span_guard = me.inner.resource_span.enter();
tracing::trace!(
target: "runtime::resource::state_update",
duration = duration,
duration.unit = "ms",
duration.op = "override",
let _span_guard = me.inner.resource_span.enter();
tracing::trace!(
target: "runtime::resource::state_update",
duration = {
let now = me.inner.time_source.now();
me.inner.time_source.deadline_to_tick(deadline) - now
},
duration.unit = "ms",
duration.op = "override",

Comment on lines 18 to 19
#[macro_use]
mod trace;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should probably be behind a cfg_trace!?

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
hawkw pushed a commit to tokio-rs/console that referenced this pull request Sep 6, 2021
This PR changes the way poll and state update ops are associated
with resource by incorporating the feedback in tokio-rs/tokio#4072 (comment)
Instead of only looking at entered spans, we now use `Context::event_span`

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@Darksonn Darksonn added A-tokio Area: The main tokio crate M-time Module: tokio/time M-tracing Tracing support in Tokio labels Sep 7, 2021
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me overall! i had a few minor style nits.

Comment on lines 99 to 102
#[cfg(tokio_track_caller)]
let location = Some(std::panic::Location::caller());
#[cfg(not(tokio_track_caller))]
let location = None;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, take it or leave it: i wonder if it's worthwhile to just stick this behind a function, so we don't have to have so many cfg(...) attributes in function bodies? maybe in util/trace.rs, we could add a

pub(crate) fn caller_location() -> Option<std::panic::Location> {
    #[cfg(all(tokio_track_caller, tokio_unstable, feature = "tracing"))]
    Some(std::panic::Location::caller())
    #[cfg(not(all(tokio_track_caller, tokio_unstable, feature = "tracing")))]
    None
}

and use this here and elsewhere. then, this could just be

Suggested change
#[cfg(tokio_track_caller)]
let location = Some(std::panic::Location::caller());
#[cfg(not(tokio_track_caller))]
let location = None;
let location = crate::util::trace::caller_location();

Comment on lines 188 to +212
pub struct Sleep {
deadline: Instant,
inner: Inner,

// The link between the `Sleep` instance and the timer that drives it.
#[pin]
entry: TimerEntry,
}
}

cfg_trace! {
#[derive(Debug)]
struct Inner {
deadline: Instant,
resource_span: tracing::Span,
async_op_span: tracing::Span,
time_source: ClockTime,
}
}

cfg_not_trace! {
#[derive(Debug)]
struct Inner {
deadline: Instant,
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I was going for originally but pin_project does not play well with cfg attrs on fields.. so therefore I took this approach.

ah, got it...this is fine, then.

impl Sleep {
pub(crate) fn new_timeout(deadline: Instant) -> Sleep {
#[allow(unused_variables)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should maybe be

Suggested change
#[allow(unused_variables)]
#[cfg_attr(not(all(tokio_unstable, feature = "tracing")), allow(unused_variables)]

so that unused variables are only allowed when we're actually not using the location field...

Comment on lines 56 to 59
#[cfg(all(tokio_track_caller, tokio_unstable, feature = "tracing"))]
let location = Some(Location::caller());
#[cfg(not(all(tokio_track_caller, tokio_unstable, feature = "tracing")))]
let location = None;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we made a wrapper function for getting the callsite location, as i suggested above, this could also just be

Suggested change
#[cfg(all(tokio_track_caller, tokio_unstable, feature = "tracing"))]
let location = Some(Location::caller());
#[cfg(not(all(tokio_track_caller, tokio_unstable, feature = "tracing")))]
let location = None;
let location = crate::util::trace::caller_location();

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Copy link
Member

@seanmonstar seanmonstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@zaharidichev zaharidichev changed the title Instrument time::Sleep tracing: instrument time::Sleep Sep 22, 2021
@zaharidichev zaharidichev merged commit b9834f6 into tokio-rs:master Sep 22, 2021
suikammd pushed a commit to suikammd/tokio that referenced this pull request Oct 7, 2021
This branch instruments the `Sleep` resource to allow the tokio-console
to consume data about resources usage. The corresponding console branch
is here: tokio-rs/console#77

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-time Module: tokio/time M-tracing Tracing support in Tokio
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants