Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BeforeEnd to have a callback where the span is still writeable #1089

Open
gugahoa opened this issue Oct 13, 2020 · 70 comments · May be fixed by #4024
Open

Add BeforeEnd to have a callback where the span is still writeable #1089

gugahoa opened this issue Oct 13, 2020 · 70 comments · May be fixed by #4024
Assignees
Labels
area:sdk Related to the SDK enhancement New feature or request spec:trace Related to the specification/trace directory triage:accepted:ready-with-sponsor

Comments

@gugahoa
Copy link

gugahoa commented Oct 13, 2020

What are you trying to achieve?
Set an attribute to all spans in a trace

What did you expect to see?
Writeable spans on SpanProcessor's OnEnd(Span)

Additional context.
What I tried to implement is a feature called trace field in Honeycomb (reference: AddFieldToTrace)

For that I first tried to use SpanProcessor, but soon found out that OnEnd(Span) is not writeable.

The solution I arrived was to write a custom exporter which was able to read from a thread-local storage (at the time of the implementation, Baggage had yet to be spec'ed out) to list all attributes related to a Trace and insert into the span before exporting.

From what I understood from the SpanProcessor specification, that's the place where I should have placed this logic, as it's a transformation that should happen before the span is sent to any exporter

@gugahoa gugahoa added the spec:trace Related to the specification/trace directory label Oct 13, 2020
@carlosalberto carlosalberto added the area:sdk Related to the SDK label Oct 13, 2020
@tsloughter
Copy link
Member

/cc @arminru

@garthk
Copy link

garthk commented Oct 13, 2020

Required for dynamic resource-like attributes desired in every span but computed at run-time because they're not constant, eg. instance uptime and recent CPU utilisation. Injecting them in the exporter gives results that become less accurate during the spikes we're most interested in.

@arminru arminru added area:api Cross language API specification issue enhancement New feature or request and removed area:api Cross language API specification issue labels Oct 14, 2020
@arminru
Copy link
Member

arminru commented Oct 14, 2020

IIRC it was discussed to make spans writeable in OnEnd but this was left out because users will not always have full control over the order in which span processors are registered and in turn invoked, which could result in unreliable behavior.
From your use case it sounds like you would not need any attributes that are added/overwritten in OnEnd visible to any other span processor, right? If you only need to have those changes reflected once passed to exporters, this shouldn't be an issue then.

@carlosalberto you seemed to remember some of that discussion as well in the meeting yesterday. Do you know the rationale for that decision?

@iNikem
Copy link
Contributor

iNikem commented Oct 14, 2020

Why these attributes cannot be set in onStart, which already has writable span?

@arminru
Copy link
Member

arminru commented Oct 14, 2020

Reading the spec again, I noticed that it explicitly says that spans are already marked as ended at the time they are passed to SpanProcessor.OnEnd:

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#onendspan

OnEnd is called after a span is ended (i.e., the end timestamp is already set).

https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/api.md#end

End: Implementations SHOULD ignore all subsequent calls to End and any other Span methods, i.e. the Span becomes non-recording by being ended (there might be exceptions when Tracer is streaming events and has no mutable state associated with the Span).

We would also have to change either of those requirements if we wanted to make spans writeable in OnEnd and we would need to understand the consequences well.

@Oberon00
Copy link
Member

Oberon00 commented Oct 14, 2020

Exactly. The difficult to get right (but specified!) order (exporting may happen before the other OnEnd processor) is another problem, but the main problem is that spans are already ended, as @arminru said. This was already discussed in #669 (comment).

There are two ways we could go forward here:

We definitely can't pass a writable span to OnEnd.

@tsloughter
Copy link
Member

Why not define it as your Span Processor must assume it gets the Span in the form it is ended with and not expect to have any changes from previous processors visible?

@tsloughter
Copy link
Member

@iNikem my understanding is that it is because OnStart can't be used to implement the Honeycomb feature https://godoc.org/github.com/honeycombio/beeline-go#AddFieldToTrace -- it would only be able to add the attribute to spans started after this call, but it is supposed to be able to add them to all local spans in the trace, even if already started.

I'd be fine with a BeforeEnd span processor, but don't really like adding more hooks when as far as I can tell the only issue is to define that OnEnd processors can't depend on each other and that if you have multiple processors writing the same attribute you can't know which will actually be added last -- so a warning to use unique attribute keys.

@Oberon00
Copy link
Member

No, the main issue is that in OnEnd the span is already ended, and for an ended span, SetAttribute should be a no-op. Changing that seems liable to opening a can of worms.

@tsloughter
Copy link
Member

Ah, ok. Yea, I wouldn't want to change the fact that ended means read-only but instead move the "ending" to after the processors. But this isn't possible since one of the processors is what takes the "ended" span and passes it to the exporter... I didn't like exporting being part of processors because of this.

And probably too late to argue that sending to the exporter should be separate and unrelated to processors, leaving OnEnd processors to act like "BeforeEnd" :(

@Oberon00
Copy link
Member

Maybe we would even need a special state where the span is semi-ended, i.e. it has IsEnded==true and has an end timestamp, but is still recording.

@iNikem
Copy link
Contributor

iNikem commented Oct 14, 2020

@iNikem my understanding is that it is because OnStart can't be used to implement the Honeycomb feature https://godoc.org/github.com/honeycombio/beeline-go#AddFieldToTrace -- it would only be able to add the attribute to spans started after this call, but it is supposed to be able to add them to all local spans in the trace, even if already started.

I don't understand. If you have a SpanProcessor registered, it will receive all spans created in the given process. And so it will be able to add attributes to all spans. You don't "do a call" manually. All spans are handed to you.

@tsloughter
Copy link
Member

@iNikem it only receives a write-able span for OnStart, meaning if spans are already started in the trace you will not be handed those spans at a point that you have the attributes and can write to the span.

@iNikem
Copy link
Contributor

iNikem commented Oct 14, 2020

It seems I still don't understand your use-case: what are these attributes that are not available right away but become available only when some spans were already started?

And even more: how does onEnd differ from onStart in this regard? I fail to see how you hoped to achieve your goal with onEnd that you cannot do in onStart?

@tsloughter
Copy link
Member

It seems I still don't understand your use-case: what are these attributes that are not available right away but become available only when some spans were already started?

An HTTP user id parsed out of headers is the best example I can think of, but actual honeycomb users (@garthk @gugahoa ?) probably have more. The initial request span is started right away, before user code parses out headers they care about, a child span is started when the user's logic is called at which point they parse out the user-id and want to put it on all spans in that trace but the span created when the HTTP handler got the request has already run OnStart so the only time it could be added is OnEnd -- without manually going through the stack and adding the attribute.

@seh
Copy link

seh commented Oct 14, 2020

That's a good characterization, @tsloughter. This has been a "normal" way of using Honeycomb's Beeline library for a long time. It buffers the trace span tree in memory, and can distribute attributes (called "fields" there) up and down the span tree before sending the spans to the server. It stores a set of fields on the trace itself, in a field on the trace.Trace type called "traceLevelFields." You can see the transcription from that field down to each span just before sending each here.

@bogdandrutu
Copy link
Member

I am curios why do you need to still add events/attributes to the Span when you can do that directly on the SpanData that we will export? What is the advantage to change the Span vs modifying the data that will be exported directly?

@Oberon00
Copy link
Member

Agree with @bogdandrutu. I'm not sure which implemenation language you are using but in Java there is https://github.com/open-telemetry/opentelemetry-java/blob/v0.9.1/sdk_extensions/tracing_incubator/src/main/java/io/opentelemetry/sdk/extensions/incubator/trace/data/SpanDataBuilder.java

Unfortunately, #158 is not resolved so this is all a bit vague in the spec.

@tsloughter
Copy link
Member

"that we will export"

Where is there a pluggable place to modify SpanData before export?

If you are saying OnEnd Processor can modify SpanData then that would be fine.

@iNikem
Copy link
Contributor

iNikem commented Oct 14, 2020

If you are saying OnEnd Processor can modify SpanData then that would be fine.

Processor can certainly modify SpanData

@Oberon00
Copy link
Member

Oberon00 commented Oct 14, 2020

@tsloughter

Where is there a pluggable place to modify SpanData before export?

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

@Oberon00
Copy link
Member

@iNikem

Processor can certainly modify SpanData

True, but in order for the "modification" (SpanData is immutable at least in Java, so it actually creates an updated copy) to be picked up by an exporter, that exporter must be invoked in the same OnEnd.

@gugahoa
Copy link
Author

gugahoa commented Oct 14, 2020

@bogdandrutu

I am curios why do you need to still add events/attributes to the Span when you can do that directly on the SpanData that we will export? What is the advantage to change the Span vs modifying the data that will be exported directly?

That would mean writing my own processor which will export to the configured exporters, but I wouldn't want to write my own processor for that and instead it would be preferable to rely on tried-and-true processors such as the BatchProcessor

@Oberon00

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

I'm not sure this translates well to Erlang, but I need to try it out first to see how it goes

@tsloughter
Copy link
Member

The exporter interface. You can make your own exporter that wraps an existing one and does the modification.

That is what we hoped not to have to do and I believe is already the way people have been doing it.

But I guess if its a wrapper that can take any exporter it and be a new exporter that acts on the attributes and then calls the wrapped exporter with the modified batch it wouldn't be too bad.

It would certainly be more efficient to update "on end", but it sounds like a wrapper exporter is the way to go -- assuming there aren't issues with the lifetime of the shared attributes on a trace, will have to talk to those using it.

@tsloughter
Copy link
Member

I'm not sure this translates well to Erlang, but I need to try it out first to see how it goes

It does. We have an exporter behaviour, otel_exporter. You can create a new exporter that simply goes through all the spans and instead of exporting them it updates their attributes in the table and then calls the export/2 function of the wrapped exporter.

And has to call the init/1 of the wrapped exporter when its init/1 is called at startup.

@vetinari
Copy link

We need this also. Is there anything we can do to get progress on the BeforeEnd()?

@tsloughter
Copy link
Member

Discussion has led to the desire for a more general change to processors for pipelines of "something" (I don't know what to call them since "processor" is already taken) within in each processor. This would allow for the modifications on spans that BeforeEnd.

But no progress aside from that being discussed in Spec SIG meetings has happened.

It has been in my backlog to work on for a long time but one stumbling block is simply naming :). In the collector there are pipelines with processors that feed each other, but since Span Processors already exist in the SDK that terminology can't be re-used. And no matter what is chosen it'll be confusing when compared against the collector.

@Oberon00
Copy link
Member

TelemetryHandler? If the new "something" is meant to replace the processor, it might be the least confusing to name it "Processor2". Yes it's ugly, but also very clear.

@tsloughter
Copy link
Member

I wasn't thinking it would replace them. Instead Processors would remain as they are, an interface of OnStart and OnEnd with an attached exporter but they would have "steps" they run as well. So Processors would be like a Collector's pipeline and "steps" the pipeline's processors.

@dyladan dyladan added triage:accepted:needs-sponsor and removed release:after-ga Not required before GA release, and not going to work on before GA labels Mar 29, 2024
@jackshirazi
Copy link

This issue was discussed at the March 28, 2024 Java SIG in the context of adding stack traces to spans. This particular feature needs the equivalent of beforeEnd (taking a stack trace is expensive, so should only be done for long duration spans which you only know when the span is ended). The discussion suggested this was a compelling use-case for beforeEnd. Following @jack-berg suggestion, we (Elastic) will provide a Java implementation proof-of-concept for this use case, which hopefully will lead to a review of the benefit

@JonasKunz
Copy link

Followup to @jackshirazi 's comment:

Here is a PoC on how to implement a beforeEnd callback in backwards-compatible manner in the Java SDK.

This PR implements a SpanProcessor which adds stacktraces to long-running spans without beforeEnd support. The PR shows how much boilerplate is needed:

  • It needs to wrap the ReadableSpan in order to be able to modify it after it has been ended
  • The SpanProcessor needs to be installed in a special way: It needs to decorate the SpanProcessor invoking the exporters. This is the only way the "wrapped" and therefore modified span can be passed down to the exporters

@jack-berg
Copy link
Member

I support this. Let's discuss at the 4/9/2024 Spec SIG meeting.

We're trying out a new spec issue management (described in #3821) flow where issues like this need a sponsor. I would be happy to sponsor @jackshirazi / @JonasKunz if one of you wants to take this on (i.e. own the spec PR, prototype).

@jack-berg
Copy link
Member

Reading through the comments, the arguments not to do this seem to be:

  • Its already possible to do this in a span exporter
  • Doing this in a span processor requires paying attention to span processor ordering

But asking users to do this in span exporter is inconvenient and unintuitive - this is clearly a "processing" type of task so why would a user expect to have to do it in an exporter?

And yes, by doing this in a processor you have to pay attention to processor registration order, but that's expected and matches common sense expectations for processors. Notice that nobody expects to ignore processor ordering configuration in the collector. File configuration makes the registration order of processors easy to configure.

This is the type of thing that is easy to do and fills an obvious conceptual flaw in the SDK design, where we've defined processors which can only process in a very constrained manner. Let's just fix it with the obvious / simple solution, which is to add a new optional beforeEnd method to SpanProcessor.

@tsloughter
Copy link
Member

@jack-berg the end result I think was the replace the decorators feature` that was never defined.

@jack-berg
Copy link
Member

Why introduce a new concept / SDK extension point instead of enhancing an existing? Limiting SDK extension points reduces cognitive load for users.

@tsloughter
Copy link
Member

@jack-berg to make it possible to do similar to what this wants to do in OnStart as well.

@tsloughter
Copy link
Member

Or no, that wasn't the reason... I swear there was a good reason though! hah

@pellared
Copy link
Member

pellared commented Apr 9, 2024

Loose naming idea:
In .NET there is a pattern to name callbacks like OnEnd (called just after End) and OnEnding (called just before End). We could name it OnEnding (instead of BeforeEnd).

Making a writeable spans on SpanProcessor's OnEnd(Span) for Go will be breaking change as OnEnd accepts a ReadOnlySpan interface. Reference: https://pkg.go.dev/go.opentelemetry.io/otel/sdk/trace#SpanProcessor

@jack-berg
Copy link
Member

Discussed in the 4/9/2024 Spec SIG meeting. There was agreement that we should look into solutions to accommodate a variety of use cases that require modifying spans near the end, including:

  • Capturing expensive stack traces only if some condition is true
  • Removing unwanted attributes that were added by instrumentation
  • Schema transformation

Two broad families of solutions were suggested:

  • Modify the span processor, e.g.
    • Add new beforeEnd operation where the span is still mutable
    • Evolve the onEnd operation to allow span to be mutated
  • Codify the notion of a decorator, where the SDK provides tooling or an additional extension point to allow span processors to decorate / modify the ReadableSpan given the SpanProcessor#onEnd

@jackshirazi / @JonasKunz are either (or anyone else) interested / able to own this?

@JonasKunz
Copy link

I'm happy to own this.

@jack-berg
Copy link
Member

Hi @JonasKunz - you've been assigned to the issue. Please reach out to me if needed. Thanks!

@JonasKunz
Copy link

JonasKunz commented Apr 17, 2024

I'd like to share my thoughts on the proposed families of solutions.

  • Modify the span processor, e.g.
    • Add new beforeEnd operation where the span is still mutable
    • Evolve the onEnd operation to allow span to be mutated

I think evolving the existing onEnd callback isn't the best solution here, because it technically is a breaking change:
Currently, span processors can assume that the span they are operating on in onEnd does not change during this call.
If the span becomes mutable, this is not the case anymore.

For the beforeEnd (or as suggested onEnding) callback this shouldn't be a problem in terms of backwards compatibility.
However implementors of beforeEnd would need to be aware of the fact that the span they are operating on might be changing from another thread, because in contrast to onStart the span has already escaped into the wild.

Here is a draft implementation showing how this new callback could be implemented in Java.

Codify the notion of a decorator, where the SDK provides tooling or an additional extension point to allow span processors to decorate / modify the ReadableSpan given the SpanProcessor#onEnd

IIRC the main concern here from @jack-berg was that this introduction of a new component type might introduce too much complexity for SDK users and I share that concern.

However, having a pipeline-like structure where one stage can pass a wrapped, immutable span to the next stage is a lot more flexible than just the beforeEnd callback:

  • It allows to filter spans
    • Example use-case: We only want to export database spans which have a latency above a given threshold
    • Can be implemented by not invoking the next pipeline stage
    • Currently can be achieved by wrapping exporters, but this invalidates the batching performed previously in the BatchingSpanProcessor
  • Buffer spans and export them after some delayed notification:
    • First example use-case: Adding cloud provider resource attributes (e.g. AWS). These resource attributes require costly operations (e.g. network calls) to be discovered. This currently requires a tradeoff: Either not add those attributes or delay the application startup until the costly operation has finished. The capability to buffer spans would open up a third option: Buffer the spans in a span processor, apply the new resource attributes when they are discovered and only after that pass the spans to the next stage
    • Second example use-case: Correlating spans with external profiling data. In our distro we add the ids of stack-traces recorded by the eBPF profiler to spans. This data arrives with a slight delay, so the spans need to be delayed and not immediately sent too.

I think we can get this flexibility without introducing an entirely new component and therefore with a lot less complexity.
Here is a backwards-compatible implementation showing how this could be done at least in Java.

The idea is to add a new parameter to onEnd: This parameter is a function which needs to be called with a span as argument. This function will invoke the next span processor in the "pipeline". This is a well known pattern, e.g. from Java Servlet Filters.

At least in Java, this functionality is very easy and cleanly addable in a backwards compatible way as shown by my implementation. However, I don't know if this is also the case for the other languages.

An alternative way to achieve the same capability would be to add a new boolean initOnEndChaining(Function nextStage) function to SpanProcessor. This function would be invoked by the SDK at initialization time in order to detect whether the SpanProcessor operates in chaining mode (it will call the next processor in line itself) or not (the SDK will be responsible for invoking the span processor after this processor).

This alternative solution is not quiet as elegant as my provided Java solution with the onEnd, but I think it should work in all languages.

It would be great to get your feedback here! As you might have noticed, I would prefer the decoration/chaining approach over beforeEnd, but by enhancing SpanProcessors to do it instead of a new component.

@jack-berg
Copy link
Member

I think evolving the existing onEnd callback isn't the best solution here, because it technically is a breaking change:
Currently, span processors can assume that the span they are operating on in onEnd does not change during this call.
If the span becomes mutable, this is not the case anymore.

I agree but, but would also add that whether or not the onEnd method can be evolved to accept a new mutable span type as an argument will vary based what is allowed in a language and how it is implemented. I think in practice several languages will have to do something quirky to accommodate a change like this, like introduce a new onEnd method with a slightly different name, or a new SpanProcessor interface with a different name, or wait for a major version bump of the SDK.

However, having a pipeline-like structure where one stage can pass a wrapped, immutable span to the next stage is a lot more flexible than just the beforeEnd callback:

The idea that a processor isn't able to generically mutate AND filter data makes it feel misnamed. With that said, the concept of a sampler at least partially overlaps with the filter use case. I wonder if its better to keep things separate, and instead of introducing a filter-chain concept in SpanProcessor, extend Sampler so there is a way to filter spans out on span end. Obviously any filtering that is done after the start of a span introduces the potential for broken spans, since downstream spans make sampling decisions based on the sampling status of their parent, which no longer reflects whether the parent span actually made it to the backend. Still, this use case comes up enough that its worth considering supporting and warning users to make sure they know the consequences.

Can anyone involved in the original trace SDK recall / comment on the reasoning why SpanProcessor wasn't initially designed with chaining in mind? The fact that it wasn't and therefore prevents the SpanProcessor filter use case seems deliberate rather than an oversight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK enhancement New feature or request spec:trace Related to the specification/trace directory triage:accepted:ready-with-sponsor
Projects
Status: Spec - In Progress
Development

Successfully merging a pull request may close this issue.