Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exporters should not be sent "is_sampled=false" spans #870

Closed
fasterthanlime opened this issue Aug 30, 2022 · 6 comments · Fixed by #871
Closed

Exporters should not be sent "is_sampled=false" spans #870

fasterthanlime opened this issue Aug 30, 2022 · 6 comments · Fixed by #871
Labels
A-trace Area: issues related to tracing bug Something isn't working

Comments

@fasterthanlime
Copy link
Contributor

I have a custom ShouldSample implemetation. I ran into #800 and am glad to see #839 is almost there.

But there's a remaining problem afaict. As per the opentelemetry spec:

Sampled flag in TraceFlags on SpanContext. This flag is propagated via the SpanContext to child Spans. For more details see the W3C Trace Context specification. This flag indicates that the Span has been sampled and will be exported.

Span Exporters MUST receive those spans which have Sampled flag set to true and they SHOULD NOT receive the ones that do not.

The problem is that, in the opentelemetry crate, span exporters are simply span processors. Span::ensure_ended_and_exported sends export data to all processors unconditionally:

match provider.span_processors().as_slice() {
[] => {}
[processor] => {
processor.on_end(build_export_data(
data,
self.span_context.clone(),
provider.config().resource.clone(),
&self.tracer,
));
}
processors => {
let config = provider.config();
for processor in processors {
processor.on_end(build_export_data(
data.clone(),
self.span_context.clone(),
config.resource.clone(),
&self.tracer,
));
}
}
}
}

(..unless the span was already exported, or the provider was shut down). It doesn't check for self.span_context().is_sampled(), and can't tell the difference between exporters and other span processors anyway.

Here's what happens whe installing a simple exporter:

/// The `SpanExporter` that this provider should use.
pub fn with_simple_exporter<T: SpanExporter + 'static>(self, exporter: T) -> Self {
let mut processors = self.processors;
processors.push(Box::new(SimpleSpanProcessor::new(Box::new(exporter))));
Builder { processors, ..self }
}

The SimpleSpanProcessor unconditionally sends to its internal channel:

fn on_end(&self, span: SpanData) {
if let Err(err) = self.sender.send(Some(span)) {
global::handle_error(TraceError::from(format!("error processing span {:?}", err)));
}
}

Which unconditionally calls exporter.export:

if let Err(err) = futures_executor::block_on(exporter.export(vec![span])) {
global::handle_error(err);
}

So, this seems like another departure from the spec (which was very surprising to me).


A bad fix for this is to teach both the "simple" and "batched" exporters to ignore spans where is_sampled=false. This is bad for a bunch of reasons:

  • Other exporter crates implement their own processors, which might not know they have to filter
  • It's really not the responsibility of exporters to do filtering
  • build_export_data is not free, calling it for a non-sampled span is wasteful

Another fix is to not send the span to any processor if it's not sampled - ie, to return early from ensure_ended_and_exported if is_sampled=false. This also seems bad because what if there's processors that aren't exporters?

Again, the spec says:

  • span processors must receive spans with is_recording=true, regardless of is_sampled
  • span exporters must only receive spans with is_recording=true && is_sampled=true

It seems like the actual fix is to either:

  • keep separate lists for "processors" and "exporters"
  • add an "fn is_exporter(&self) -> bool" method to SpanProcessor - this seems expensive for no good reason

Either way this seems like a breaking change.

@TommyCpp TommyCpp added A-trace Area: issues related to tracing bug Something isn't working labels Sep 2, 2022
@TommyCpp
Copy link
Contributor

TommyCpp commented Sep 2, 2022

Yeah, I think this is a regression that happened when we switch to async exporters. Good catch!
I am not sure why

in the opentelemetry crate, span exporters are simply span processors.

in opentelemetry crate, SpanExrpoter is the span exporters. The internal queue inside the span processors are to batch the spans

So I think we can simply check the spans before it sends to the internal queues in span processors.

@TommyCpp TommyCpp self-assigned this Sep 2, 2022
@fasterthanlime
Copy link
Contributor Author

in opentelemetry crate, SpanExporter is the span exporters.

That's a trait, though. Third-party exporters like https://github.com/ramosbugs/opentelemetry-honeycomb-rs implement both SpanExporter and SpanProcessor on their own types, and would require a separate fix if that's how you want to approach it.

Also, simply filtering in the two built-in {Simple,Batch}SpanProcessor types would still create "exported span data" for every span, even if it's not sampled, so it's not my preferred solution (but I'll take any improvements in correctness I can get!)

@TommyCpp
Copy link
Contributor

TommyCpp commented Sep 2, 2022

That's a trait, though.

Yeah the idea is people can bring their own exporter and those exporter will be able to work with BatchSpanProcessor or SimpleSpanProcessor in SDK.

Third-party exporters like https://github.com/ramosbugs/opentelemetry-honeycomb-rs implement both SpanExporter and SpanProcessor on their own types

Haven't look into that repo much but I think the reason they build their own Processor is to have a custom function before span starts.

Also, simply filtering in the two built-in {Simple,Batch}SpanProcessor types would still create "exported span data" for every span, even if it's not sampled, so it's not my preferred solution

In the spec, if a span is recording but not sampled then the SpanProcessor should receive them but it shouldn't pass it down to the SpanExporter

If a span is not recording then we will drop it in Tracers.

I think that the reason this bug stays undetected is even if we export it to the backend most of them just gonna drop the span without a sample flag

@jtescher
Copy link
Member

jtescher commented Sep 2, 2022

Yeah looks like this was a regression at some point. Thanks for the detailed report @fasterthanlime!

@jtescher
Copy link
Member

jtescher commented Sep 2, 2022

A bad fix for this is to teach both the "simple" and "batched" exporters to ignore spans where is_sampled=false. This is bad for a bunch of reasons:

  • Other exporter crates implement their own processors, which might not know they have to filter
  • It's really not the responsibility of exporters to do filtering
  • build_export_data is not free, calling it for a non-sampled span is wasteful

I think this actually is the commonly accepted solution (see the go impl, java impl, etc), so that's the patch I submitted in #871 to get this working properly again.

@jtescher
Copy link
Member

jtescher commented Sep 2, 2022

Happy to keep investigating less footguy-y solutions as well, but would be good to get this fixed for the upcoming 0.18.x release so at least the sdk provided processors have the correct behavior.

@TommyCpp TommyCpp removed their assignment Sep 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-trace Area: issues related to tracing bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants