
Attribute to indicate whether a Span is used as an Exemplar #2922

Closed
fstab opened this issue Nov 6, 2022 · 17 comments · Fixed by #3820
Labels: area:semantic-conventions, spec:metrics, spec:trace

Comments

@fstab (Member) commented Nov 6, 2022

Exemplars are used to navigate from metrics to example traces. For example, the screenshot shows latencies from the http.server.duration metric as visualized by Grafana. Each little dot represents an Exemplar. If you click on an Exemplar, you can navigate to the corresponding trace.

[Screenshot: Grafana latency heatmap for the http.server.duration metric, with Exemplars shown as dots]

However, this does not work well if traces are sampled in the collector: In many cases the metric library will choose a Span as an exemplar that will be discarded by the collector during sampling.

It would be good if metric libraries could add a standard Span attribute to indicate that a Span has been selected as an Exemplar. This attribute could be considered by the collector's sampling algorithm to ensure that Exemplars aren't discarded.

A simple way to implement this would be to add a boolean attribute, like exemplar="true".

A better way to implement this would be an attribute containing the metric name, like exemplar.metric="http.server.duration". If the attribute is present, it means that the Span is used as an Exemplar. As an additional benefit, the metric name could be used by vendors to allow navigation from a trace to a dashboard showing the corresponding metric.
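
For illustration, with the OpenTelemetry Java API the two variants might look like this (the attribute names are just the proposal above, not an established convention):

```java
import io.opentelemetry.api.trace.Span;

class ExemplarMarking {
    static void markCurrentSpanAsExemplar() {
        // Variant 1: a plain boolean flag on the span chosen as an Exemplar.
        Span.current().setAttribute("exemplar", true);

        // Variant 2: an attribute carrying the metric name, which additionally
        // enables navigation from the trace back to the metric.
        Span.current().setAttribute("exemplar.metric", "http.server.duration");
    }
}
```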

What do you think?

P.S.: If a GitHub issue is the wrong place for this, let me know where else to put it.

@fstab fstab added the spec:trace Related to the specification/trace directory label Nov 6, 2022
@arminru arminru added spec:metrics Related to the specification/metrics directory area:semantic-conventions Related to semantic conventions labels Nov 8, 2022
@cyrille-leclerc (Member)

I like the simplicity of this solution for not breaking exemplars when sampling traces.
I prefer the more sophisticated option of tracking which metric has used the trace as an exemplar, and I would make it a multi-valued string attribute:
exemplar.metric=["http.server.duration", "mycompany.order.value"].

@cijothomas (Member)

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#exemplar-defaults
From the current spec, it looks infeasible to implement such a thing. The Exemplar reservoir could hold an exemplar for a moment, but it could be overwritten later, so we cannot reliably add this attribute to the span itself.

@fstab (Member, Author) commented Nov 9, 2022

Thanks Cijo, that's interesting. Tbh I don't understand why the SDKs implement it that way. You can only have one Exemplar per metric per collection cycle, so I don't see why Exemplars should be updated more frequently.

My own background is with Prometheus metrics; I'm a maintainer of the Prometheus Java client library. Updating the Exemplar only once per collection cycle should be straightforward to implement in Prometheus libraries, and I guess also in other metrics libraries out there.

I'm wondering whether this could still be considered in the interest of making OpenTelemetry tracing (and sampling) work well with Prometheus metric instrumentation?

It might also be a good idea to reconsider the current implementation in the OpenTelemetry metrics SDKs. If the metric SDK's sampling algorithm for Exemplars is not aligned with the trace sampling in the OpenTelemetry collector, it is unlikely that the Exemplars will actually be available in the monitoring backend. This makes Exemplars useless whenever traces are sampled in the collector.

@cijothomas (Member)

Sorry, I haven't looked at how OTel SDKs have implemented it yet (only Java has implemented it so far, I think).

But from the spec wording, and the algorithm used in the built-in SimpleFixedSizeExemplarReservoir, the reservoir can be overwritten with subsequent measurements within the same collection cycle (unless I'm reading the spec incorrectly). A stripped-down sketch of that behavior follows.
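
The following is illustration only, not the actual SDK code; note how offer() can displace an exemplar that was stored earlier in the same cycle:

```java
import java.util.Arrays;
import java.util.Random;

// Simplified sketch of the spec'd fixed-size reservoir sampling behavior;
// not the real SDK class.
class SimpleFixedSizeReservoir {
    private final double[] cells;
    private final Random random = new Random();
    private int measurementsSeen = 0;

    SimpleFixedSizeReservoir(int size) {
        this.cells = new double[size];
    }

    void offer(double measurement) {
        int n = ++measurementsSeen;
        if (n <= cells.length) {
            cells[n - 1] = measurement;      // fill empty cells first
        } else {
            int bucket = random.nextInt(n);  // uniform over all measurements seen
            if (bucket < cells.length) {
                cells[bucket] = measurement; // overwrites a previously stored exemplar
            }
        }
    }

    double[] collectAndReset() {
        int filled = Math.min(measurementsSeen, cells.length);
        measurementsSeen = 0;
        return Arrays.copyOf(cells, filled); // contents only final at collection time
    }
}
```

Because the final contents are only known at collectAndReset(), a span cannot be reliably tagged at offer time.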

@fstab (Member, Author) commented Nov 10, 2022

For reference: the Prometheus Java metrics library has a retention time for Exemplars, which defaults to 7s; an Exemplar is kept for at least 7s before the next Exemplar is sampled. This should work fine with marking Spans as "used as an Exemplar", as such spans would only be produced once every 7s.
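
In sketch form, the retention pattern looks roughly like this (a hypothetical class illustrating the idea, not the actual client_java code); the key point is that the selection decision is final at offer time, so the span can be marked immediately:

```java
// Hypothetical retention-based exemplar holder, mirroring the pattern
// described above; not the actual Prometheus client_java implementation.
class RetentionExemplarHolder {
    private static final long RETENTION_MILLIS = 7_000; // default retention: 7s
    private double exemplarValue;
    private String exemplarTraceId;
    private long exemplarTimestamp = 0;

    /** Returns true if this measurement was selected as the new Exemplar. */
    synchronized boolean offer(double value, String traceId) {
        long now = System.currentTimeMillis();
        if (now - exemplarTimestamp < RETENTION_MILLIS) {
            return false; // the current Exemplar is still within its retention window
        }
        exemplarValue = value;
        exemplarTraceId = traceId;
        exemplarTimestamp = now;
        return true; // the caller can now mark the span: it won't be displaced
    }

    synchronized String currentExemplarTraceId() {
        return exemplarTraceId;
    }
}
```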

Is there any chance of changing the spec to allow something like this in the OpenTelemetry SDK? And if that's not possible, is there any chance of introducing a "used as exemplar" attribute anyway for third-party metric libraries like Prometheus?

@jsuereth (Contributor) commented Nov 14, 2022

EDIT: added a section on head sampling.

In practice, it's really hard to know, at the time a span is exported, whether or not the exemplar will be retained and reported by metrics.

As already stated here: even for Prometheus, the issue is that only the "latest" exemplar is kept in the reservoir, so unless you buffer spans and delay writing them until metrics are also written (to guarantee consistency), you're basically unable to do this.

Exemplars (today) work best with head-based sampling: they can take into account whether a trace is expected to be sampled before being included. If you use only head-based sampling, you are more or less guaranteed that at least the local span has been sampled when an exemplar shows up.

If you're doing tail-based sampling, AND you're able to buffer the full minute or two of data you'd need to delay sampling decisions until metrics show up, this could be a good feature for a tail sampler. However, right now, by design of exemplars (both in Prometheus client libraries and OTel), it's practically impossible to achieve. This likely needs to be fixed by the tail-based sampling features. At best we can annotate a span to say it may have an exemplar, but without waiting for metric reporting, we cannot guarantee it is reported.

@oertl commented Nov 17, 2022

If spans are sampled consistently (see https://opentelemetry.io/docs/reference/specification/trace/tracestate-probability-sampling/#consistent-probability-sampling), exemplars could simply be selected based on the r-value. Choosing the spans with the greatest r-values as exemplars maximizes the probability that they also get sampled in the end.
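
A minimal sketch of that selection rule (the r-value extraction from the tracestate header is assumed to happen elsewhere; the field and method names here are illustrative):

```java
// Keep the measurement whose span carries the greatest r-value: under
// consistent probability sampling, such spans survive downstream sampling
// with the highest probability. Illustrative sketch only.
class MaxRValueExemplarCell {
    private int bestR = -1;      // r-values range from 0 to 62 per the spec draft
    private double value;
    private String traceId;

    synchronized void offer(double measurement, String spanTraceId, int rValue) {
        if (rValue > bestR) {
            bestR = rValue;
            value = measurement;
            traceId = spanTraceId;
        }
    }

    synchronized String collectTraceIdAndReset() {
        String result = traceId;
        bestR = -1;
        traceId = null;
        return result;
    }
}
```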

@jmacd (Contributor) commented Nov 17, 2022

Trying to offer another technical position: suppose you were not interested in distributed tracing per se, just in recording the spans that happen to be associated with interesting metric events, without propagating trace context.

In this parallel discussion about Span links and Sampling, #2918 (comment), I've proposed two new Sampler decision codes that could be useful in developing standalone spans as metric exemplars. The two states are:

  1. Conditionally Recorded -- for an otherwise unsampled span, it means keep the span in memory while awaiting interesting events.
  2. Exported Unsampled -- for a span that is being exported because it completes some other context or signal, despite not being sampled by an independent sampling decision.

These primitive Sampler decisions would support configuring a trace SDK to export spans either because they link to other sampled traces or because they were included as metrics exemplars, for example.

@jsuereth (Contributor)

@oertl The Exemplar specification defines a pluggable reservoir component, so if a new type of reservoir was designed to interact with r-value, that's definitely something that could happen. It's unclear to me if that would actually fix the issue or just minimize it, and whether or not that would be a default option or just a configuration option for users who want tail sampling.

@jmacd For both of those, how do you propose we handle the interaction between Span and Exemplar? Today Exemplars leverage the Context provided by spans, but metrics do not expose their internals or decision-making.

@fstab (Member, Author) commented Nov 22, 2022

> the Exemplar specification defines a pluggable reservoir component

Is it possible to change this and allow other implementations? I understand the issue with the reservoir is that at the point in time when you put a span in the reservoir you don't know yet whether you will actually use it as an Exemplar.

I think there could be other implementations that decide directly whether or not to use a span as an Exemplar, without using a reservoir. For example, you could just select a new Exemplar every n seconds, as sketched below.
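
Sketched against a hypothetical reservoir-style interface (the method names are placeholders, not the actual SDK API), such a time-based selector could look like this:

```java
// Hypothetical time-based selector: accepts at most one Exemplar every
// n seconds, so the decision is final the moment a span is offered.
class TimeBasedExemplarSelector {
    private final long intervalNanos;
    private boolean hasSelected = false;
    private long lastSelectedNanos;
    private Double current; // the held Exemplar value, null if none

    TimeBasedExemplarSelector(long intervalSeconds) {
        this.intervalNanos = intervalSeconds * 1_000_000_000L;
    }

    /** Returns true if the measurement was accepted as the new Exemplar. */
    synchronized boolean offer(double value, long nowNanos) {
        if (hasSelected && nowNanos - lastSelectedNanos < intervalNanos) {
            return false; // still within the current interval
        }
        hasSelected = true;
        lastSelectedNanos = nowNanos;
        current = value;
        return true;
    }

    synchronized Double collect() {
        Double result = current;
        current = null;
        return result;
    }
}
```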

@jsuereth (Contributor)

@fstab We defined the interface to allow other implementations, we just haven't exposed the configuration options to do so. That is a thing you could do; it does mean you're more likely to sample exemplars early. There are implications for the kind of data you'd be showing in exemplars.

Would love a better holistic analysis/proposal for tail-sampling.

@fstab (Member, Author) commented Nov 24, 2022

> Would love a better holistic analysis/proposal for tail-sampling.

I can spend a couple of hours next week writing a design doc; then we'll have something to start with. It would be great to find a way to make exemplars work with tail-sampling.

@fstab (Member, Author) commented Nov 29, 2022

@HaloFour

We ran into this situation ourselves as we started moving to tail sampling. The solution we employed was a custom SpanContextSupplier implementation for the Prometheus Java client: when it samples a span to be used as an exemplar, it also adds an attribute to that span. We then configure the tail sampler to look for that attribute, and if it is present, we ensure that the entire trace is sampled.
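
A sketch of that approach, assuming the pre-1.0 Prometheus Java simpleclient's SpanContextSupplier interface (getTraceId()/getSpanId()) and the OpenTelemetry Java API; the attribute key is whatever the tail sampler is configured to match:

```java
import io.opentelemetry.api.trace.Span;
import io.prometheus.client.exemplars.tracer.common.SpanContextSupplier;

// When the Prometheus client picks the current span for an exemplar, it asks
// this supplier for the trace/span IDs; we mark the span at the same time so
// a tail sampler can later recognize and keep the whole trace.
class MarkingSpanContextSupplier implements SpanContextSupplier {
    @Override
    public String getTraceId() {
        Span span = Span.current();
        span.setAttribute("exemplar", "true"); // hint for the tail sampler
        return span.getSpanContext().getTraceId();
    }

    @Override
    public String getSpanId() {
        return Span.current().getSpanContext().getSpanId();
    }
}
```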

@samwright

I've just had a quick go at implementing something along the lines of @fstab's solution in opentelemetry-java.

The only difference is that instead of adding a new way of rate-limiting the exemplar sampling process, I'm selecting the first span seen by each reservoir cell as an exemplar and keeping it until the metric is collected and the reservoir is reset (which by default happens every 60 seconds). Does that sound feasible, or am I missing something?
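
In sketch form (a hypothetical cell class, not the actual POC code):

```java
// First-wins cell: the first offered span is kept as the Exemplar until the
// reservoir is reset at collection, so it can safely be marked at offer time.
class FirstWinsExemplarCell {
    private Double value;  // null means the cell is empty
    private String traceId;

    /** Returns true if this measurement became the cell's Exemplar. */
    synchronized boolean offer(double measurement, String spanTraceId) {
        if (value != null) {
            return false; // this cycle's Exemplar is already chosen
        }
        value = measurement;
        traceId = spanTraceId;
        return true;
    }

    synchronized Double collectAndReset() {
        Double result = value;
        value = null;   // reset with each collection (every 60 seconds by default)
        traceId = null;
        return result;
    }
}
```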

@jsuereth (Contributor) commented Nov 3, 2023

@samwright Sorry for the delay here. Here is what I think we should do instead: open-telemetry/opentelemetry-java#5960

Once we've exposed the hook for you to create a custom ExemplarRegistry, you can move all the code from your POC into it.

@fstab (Member, Author) commented Nov 3, 2023

Hey, awesome to see progress on this. Just FYI: the new Prometheus client_java library 1.0.0 also implements this; the Span attribute there is exemplar="true". https://prometheus.github.io/client_java/otel/tracing/
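
For anyone wiring this up end to end: matching on that attribute with the OpenTelemetry Collector's tail_sampling processor could look roughly like the following (a sketch based on the collector-contrib string_attribute policy; check the processor's README for the exact schema):

```yaml
processors:
  tail_sampling:
    policies:
      - name: keep-exemplar-spans
        type: string_attribute
        string_attribute:
          key: exemplar
          values: ["true"]
```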

carlosalberto pushed a commit that referenced this issue Jan 31, 2024
Fixes #2922
Fixes #3812 
Related to #3756

## Changes

- Cleans up language around exposing `ExemplarReservoir` (e.g., removes the TODO).
- Removes the `ExemplarFilter` interface but keeps configuration options (see #3812).
- Clarifies / simplifies the Spec Compliance matrix for the existing state of the Exemplar Specification.


Prototype in Java: open-telemetry/opentelemetry-java#5960