Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct way to handle trace context propagation when a context might not be present? #45

Closed
shaunbennett opened this issue Aug 8, 2023 · 3 comments

Comments

@shaunbennett
Copy link

shaunbennett commented Aug 8, 2023

edit: perhaps this should have been opened as a Discussion rather than an issue, apologies!

Greetings, I was hoping for some assistance with handling trace context propagation in cases where a trace context may or may not be present. I've tried digging through the docs and trying a few different options, but ran into issues with each case.

First of all, I should clarify what I expected the result to be:

  1. In the case that a trace context exists from Message, we utilize that trace context and continue producing spans for the distributed trace
  2. In the case where a trace context does not exist from Message, we end up generating a new trace with the parent span being handle_message

Attempt 1: Only setting the parent when a trace context is present in Message

#[instrument]
pub fn handle_message(message: Message) {
    if let Some(context) = message.trace_context {
        let context = get_text_map_propagator(|propagator| propagator.extract(context);
        Span::current().set_parent(context);
    }

    // handle the message, generate child spans
}

I haven't been able to understand why, but this case ran into a problem where calls between handle_message would all bleed into a single trace id. On another thought - I do have a higher level span that's created (at TRACE level - with TRACE not enabled), could that perhaps be why they end all up being linked together into the same TRACE when I don't propagate a context?

Attempt 2: Always attempt to extract a context, even when one might not be present

#[instrument]
pub fn handle_message(message: Message) {
    let context = get_text_map_propagator(|propagator| propagator.extract(message.tracing_context); // may be a value with no trace context fields set
    Span::current().set_parent(context);

    // handle the message, generate child spans
    generate_child_span();
}

This resolved the problem of handle_message calls without a trace context set all ending up in the same trace, but now I started running into a new issue where the handle_message span and generate_child_span end up in separate trace ids, but linked to each other. For example, it would look something like this:

| ----------------------------------| // handle_message, trace_id: 1, span_id: 1, parent_span_id: None
       |--------------------------|   // generate_child_span, trace_id: 2, span_id: 2, parent_span_id: 1

Thus, generate_child_span correctly links to parent_span_id, but ends up in its own trace_id so it doesn't display correctly.

What's the correct way of utilizing OpenTelemetrySpanExt::set_parent for these cases?

@jtescher
Copy link
Collaborator

jtescher commented Aug 8, 2023

So generally you should use strategy 2, at the boundary of your system extract the context from whichever system the request came from which will allow you to propagate other data like baggage even if the span is not recording or sampled, and will keep them all in the same trace.

It sounds like there may be a bug somewhere if you are not seeing the same trace id on the child span after setting the parent. Do you have a repo or other full example that shows number two with different trace ids?

@notheotherben
Copy link

I'm observing identical behaviour in several of my projects, here's a short repro that demonstrates the behaviour: https://github.com/notheotherben/tracing-opentelemetry-traceid-repro

Something to note here is that this only appears in scenarios where Span.set_parent is used with an empty propagation context and appears to have been introduced between the bump to OpenTelemetry 0.18 and 0.20 (sorry, this is the closest I can offer in terms of scoping as I was running a custom branch of tracing prior to 0.20).

At a glance, it seems like this may have something to do with the way that the tracing_opentelemetry::tracer::current_trace_state method leans on Context::has_active_span and then how this later gets wrapped up under Context::with_remote_span_context - I see that there were a number of changes in tokio-rs/tracing/pull/2303 which stripped out some of the handling around local vs remote parents (old:tracing-opentelemetry/src/tracer.rs#L97-102) that piques my interest, but I haven't had the chance to dig in deeply.

jtescher pushed a commit that referenced this issue Aug 23, 2023
## Motivation

Fix following issues
* #54
* #45

## Solution

* Revert https://github.com/tokio-rs/tracing-opentelemetry/pull/26/files
* When an invalid(empty) Context is extracted from the propagator and
passed to OpenTelemetrySpanExt::set_parent(), if none is set for
trace_id, root and child trace id will not match
* I can confirm that this change works in
[reproduction](https://github.com/notheotherben/tracing-opentelemetry-traceid-repro)
that @notheotherben created
* add regression test case which propagate empty context
@jtescher
Copy link
Collaborator

Resolved in #55

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants