Allow adding links after span creation #3186

pyohannes · 2023-02-07T23:33:13Z

Fixes #454

The limitation caused by preventing adding links after span creation limits OpenTelemetry users in the ability to efficiently and intuitively create spans and traces in certain scenarios (see examples in the linked issues and PRs below).

Adding links after span creation allows to create more meaningful traces in those scenarios. Best-effort sampling based on links is still possible, but only based on links added at creation time (similar to attributes). This was discussed with members of the sampling SIG, and from their side there are no concerns regarding removing this restriction (also see this PR comment).

Some use cases from the related issue:
#454 (comment)
#454 (comment)
#454 (comment)
#454 (comment)
#2278 (comment)

Users who are forced to come up with workarounds:

Changes

This moves the section about adding span links from "Span creation" to "Span operations", removing mentions of the limitation that links can only be added at creation time. The name AddLink is proposed, wording is kept consistent with the corresponding section for adding attributes.

Discussions about the need for indicating causality in link-relationships (e. g. via attributes) will be held in the messaging workgroup. If necessary, this will result in a follow-up PR.

specification/compatibility/opencensus.md

MrAlias · 2023-02-08T01:07:19Z

Adding this method to a span is going to be a breaking change to the Go API for SDK developers. It is not something I support.

I have looked through the use cases, and they all seem to be the same situation: a user wants to "link" message IDs to an already existing span. I don't see this as a good enough reason to break the API, especially since a user can already wrap this batch receive operation with a child span and add the appropriate links there. Additionally, there were many other proposals to solve this use case in #454. Those alternate approach would not break the Go API. Can we consider those approaches instead?

cc @open-telemetry/go-maintainers

reyang · 2023-02-08T01:15:57Z

Adding this method to a span is going to be a breaking change to the Go API for SDK developers.

Is this due to a restriction in the Go programming language itself (adding any new API will become a breaking change instead of an additive change), or it is due to some design/implementation decision that OpenTelemetry Go has made (and it is hard to revert that decision)?

yurishkuro · 2023-02-08T01:17:53Z

Adding this method to a span is going to be a breaking change to the Go API for SDK developers.

I know what you mean, but that's not a good argument as it means there can be no evolution, period. There are ways to add methods to Go API without breaking the existing interfaces, e.g. by defining new ones. It's not pretty, but what can you do - the new functionality is relatively niche, so I think it's an acceptable trade-off.

yurishkuro · 2023-02-08T01:20:43Z

Is this due to a restriction in the Go programming language itself

it exists in any language that supports pure interfaces. E.g. in Java <=1.6, adding a method to an interface meant existing classes were incompatible, but I think Java deferred that to runtime, whereas Go fails at compile time. After Java introduced default implementations, this became non-issue (well, less of).

reyang · 2023-02-08T01:30:41Z

Is this due to a restriction in the Go programming language itself

it exists in any language that supports pure interfaces. E.g. in Java <=1.6, adding a method to an interface meant existing classes were incompatible, but I think Java deferred that to runtime, whereas Go fails at compile time. After Java introduced default implementations, this became non-issue (well, less of).

In C++, this is what folks normally do 😆

https://learn.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/icorprofilerinfo-interface

lalitb · 2023-02-08T03:00:16Z

Adding new method in API will break the ABI compatibility for OpenTelemetry C++ too. But we can increment API version through inline namespaces in case of ABI breaking change, so should be good. And will vote for this feature as there have been multiple asks for it from C++ community too.

MrAlias · 2023-02-08T15:48:42Z

I know what you mean, but that's not a good argument as it means there can be no evolution, period.

I didn't say the API cannot be evolved, those are your words. I said this change is not motivated enough to warrant the breakage it will cause.

There are alternatives that were proposed and there is an existing solution using a new span. Why do we need to add this change to accomplish what can already be achieved or can be achieved in an alternate, less destructive, manner?

pyohannes · 2023-02-08T18:10:18Z

I said this change is not motivated enough to warrant the breakage it will cause.

I'm not sure what exactly qualifies a change as "motivated enough". The proposed change is probably not crucial to a majority of OpenTelemetry users, but users stumble over the limitation again and again. The description above mentions comments and PRs of at least seven different users, and those are only the ones who somehow made it to #454 (via comments or links).

Creating a separate (semantically meaningless) span for adding links is not an acceptable workaround in my opinion. What operation is such a span supposed to cover? What's the duration of such a span? What's the semantic meaning of this duration?

Regarding "breakage": I'm not a Go expert, but as far as I understand this doesn't refer to users (no existing Go OTel users will be broken), but it means that supporting this in a way that doesn't break existing users results in challenges and headaches for OTel Go maintainers?

MrAlias · 2023-02-08T18:30:49Z

Creating a separate (semantically meaningless) span for adding links is not an acceptable workaround in my opinion. What operation is such a span supposed to cover? What's the duration of such a span? What's the semantic meaning of this duration?

Creating a span to track the partial read result of a consumer request tracks the operation of reading a partial result. The duration of the span would represent the time it took to read part of the returned request. Calling it meaningless is subjective and not universal.

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

The description above mentions comments and PRs of at least seven different users, and those are only the ones who somehow made it to #454 (via comments or links).

Would these users get what they want if documentation was updated to show how to achieve what they want by delaying a span creation? Or by creating child spans?

How many users will complain about the breaking changes to Go and C++? Will they be even more frustrated when they realize there were design choice that could have been implemented that wouldn't have disrupted things?

As one of those linked use cases mentions, the idea of adding a link via an event has still not been addressed.

Regarding "breakage": I'm not a Go expert, but as far as I understand this doesn't refer to users (no existing Go OTel users will be broken), but it means that supporting this in a way that doesn't break existing users results in challenges and headaches for OTel Go maintainers?

No, currently how these changes are written the Span interface will need to be extended. This is a breaking change to any SDK author that implements the current interface definition.

Adding an additional Go interface that an SDK span could implement is the proposed work around for Go (see https://github.com/open-telemetry/opentelemetry-go/blob/main/CONTRIBUTING.md#interfaces). The current changes in this PR would not be satisfied if that approach was taken in Go, our implementation would not be compliant with the Span interface as this PR defines it. It would also indeed create a jumbled mess. A mess that would affect all API users, not just the users that want to add a link to a span after it was created.

pyohannes · 2023-02-08T19:23:48Z

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

This was considered, but it breaks context propagation.

MrAlias · 2023-02-08T19:29:41Z

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

This was considered, but it breaks context propagation.

Gotcha 👍

jsuereth · 2023-02-09T13:26:10Z

specification/compatibility/opencensus.md

@@ -177,14 +177,10 @@ using the OpenCensus <-> OpenTelemetry bridge.
   that [parent spans must be specified during span creation](../trace/api.md#span-creation).
   This leads to some issues with OpenCensus APIs that allowed flexible
   specification of parent spans post-initialization.
-2. Links added to spans after the spans are created. This is [not supported in


I don't think we have OC compliance matrix lines, but I think this would deserve one.

cc @dashpole - For tracking that this work will need to be done in the OC shims.

Aneurysm9 · 2023-02-09T16:14:35Z

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

This was considered, but it breaks context propagation.

How? I might be missing something here. Can you spell this out for me?

MrAlias · 2023-02-09T16:33:01Z

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

This was considered, but it breaks context propagation.

How? I might be missing something here. Can you spell this out for me?

I think the idea is the request to a messaging system is mean to be made with a context. The context should contain an active trace/span so the receiver can properly link to the trace. If you start the trace after the request returns there is no way to include it in the transmitted context.

MrAlias · 2023-02-09T16:33:40Z

Talking with @pyohannes in slack it was brought up that another option that could work here, in addition to adding a link to an event, is to add a span end option that accept links.

carlosalberto · 2023-02-13T15:59:51Z

specification/trace/api.md

+`Span`s can be from the same or a different trace (see
+[Links between spans](../overview.md#links-between-spans)).
+
+A `Link` is structurally defined by the following properties:


We should massage these two as operation parameters, as done for other operations (as it stopped being tied to Span creation).

(Also realized this original section repeats below the same information regarding SpanContext and attributes :/ )

We should massage these two as operation parameters, as done for other operations (as it stopped being tied to Span creation).

Do you think the point below (which is already part of the PR) suffices:

An API to record a single Link where the Link properties are passed as
arguments.

specification/trace/api.md

pyohannes · 2023-02-21T17:21:10Z

There's an alternative PR up, exploring adding links as events: #3240

yurishkuro · 2023-02-21T17:28:46Z

I don't understand why sampling consideration are trampling the basic data modeling considerations?

pyohannes · 2023-02-21T17:57:35Z

I don't understand why sampling consideration are trampling the basic data modeling considerations?

Sampling isn't the issue here, but avoiding churn for SDK developers:

Alternate, less intrusive, approaches (events and span end options) can implemented instead of this change. Those alternates should be explored.

The aim is to evaluate alternative approaches that were brought up, and then to decide which way to go. Or, as it came up as an argument too, to decide that the proposed capability isn't worth the cost of any of the possible approaches.

yurishkuro · 2023-02-21T18:17:17Z

but avoiding churn for SDK developers
The aim is to evaluate alternative approaches that were brought up

Alternative approaches involve just as much "churn", since the ask is to add new functionality.

lmolkova · 2023-02-23T18:34:30Z

Regarding non-extensible Go interfaces:
our Azure SDK for Go team found a solution that allows designing APIs in an extensible way. It won't help to solve this problem since Span is already an interface, but could be used in the future to prevent such problems on APIs that are not stable yet.

Effectively we use a struct of functions instead of an interface. It's a bit more work on the implementation side but should not affect user experience if we do it right.
Tagging @jhendrixMSFT who has much more context on this solution.

Here's an example:
(we have tracing abstractions to avoid direct dependencies and support legacy tracing approaches)

type SpanImpl struct {
	End func()
	SetAttributes func(...Attribute)
	AddEvent func(string, ...Attribute)
	AddError func(err error)
	SetStatus func(SpanStatus, string)
}

type Span struct {
	impl SpanImpl
}

full example: https://github.com/Azure/azure-sdk-for-go/blob/96d7f70e55a1f334ff6c95b5acf458ce355f45a1/sdk/azcore/tracing/tracing.go#L87-L118

// cc @carlosalberto

MrAlias · 2023-02-23T20:07:14Z

Regarding non-extensible Go interfaces: our Azure SDK for Go team found a solution that allows designing APIs in an extensible way. It won't help to solve this problem since Span is already an interface, but could be used in the future to prevent such problems on APIs that are not stable yet.

Effectively we use a struct of functions instead of an interface. It's a bit more work on the implementation side but should not affect user experience if we do it right. Tagging @jhendrixMSFT who has much more context on this solution.

Here's an example: (we have tracing abstractions to avoid direct dependencies and support legacy tracing approaches)
type SpanImpl struct {
	End func()
	SetAttributes func(...Attribute)
	AddEvent func(string, ...Attribute)
	AddError func(err error)
	SetStatus func(SpanStatus, string)
}

type Span struct {
	impl SpanImpl
}
full example: https://github.com/Azure/azure-sdk-for-go/blob/96d7f70e55a1f334ff6c95b5acf458ce355f45a1/sdk/azcore/tracing/tracing.go#L87-L118

// cc @carlosalberto

Thanks for the suggestion. It was one considered but not implemented.

It is a moot point at this time as the interface in question has already been release as a stable v1.

lmolkova · 2023-02-23T23:09:00Z

@MrAlias my point there was to

prevent such problems on APIs that are not stable yet.

I hope we can consider designing future Go APIs in an extendable way.

Aneurysm9 · 2023-02-24T00:35:21Z

@MrAlias my point there was to

prevent such problems on APIs that are not stable yet.

I hope we can consider designing future Go APIs in an extendable way.

We did consider designing the existing Go APIs in an extensible way and will continue to do so. I resent the implication that we did not.

As @MrAlias stated, we considered embedding the an interface in a struct and rejected it as it is dangerous. We settled on an approach that we felt provided the most flexibility and safety. That included keeping open the option to add methods to interfaces expected to be implemented by an SDK, such as the Span interface. Doing so has lead to being questioned, rightly, by our community as to why we are even considering breaking changes in this manner.

trask · 2023-02-24T01:21:13Z

from @Aneurysm9's "being questioned" link above:

One thing that I'm not yet understanding is why the Go OTel API can't bump the version from v1 to v2 when they need to extend the API/interface.

I'm wondering if this is the best solution for everyone?

It allows the OpenTelemetry specification to evolve, and keeps the Go community happy(?)

Aneurysm9 · 2023-02-24T01:34:29Z

We are implementing the OpenTelemetry API v1. What happens when there is an OpenTelemetry API v2 but we have already made half a dozen major version increments because the v1 API has been expanded? Are we to expect users will know that go.opentelemetry.io/otel/trace/v8 is the first implementation of OpenTelemetry API v2?

Beyond that, we're required to support all major API versions for three years from the release of the next version. If we have to bump the major version every time a method is added to the API then we're quickly going to wind up supporting many distinct API versions. And SDK versions to match. And all of the modules in -contrib.

A v2 API is not a small undertaking and it should be the last resort. We should not break compatibility when there are other options available. We've spent a lot of time here asking why Go doesn't do something different and not a lot explaining why other, less invasive, options aren't viable.

lmolkova · 2023-02-24T02:04:58Z

I think there are two different discussions here:

do we need to record links after start? The answer seems to be yes, but not clear how to do it in Go given the limitations
should we limit OTel spec evolution only to what can be done in non-breaking manner in Go? I hope that answer is no and we should be ready to extend OTel Go accordingly.

Then the question is if this change deserves a major version update for OTel Go.

We have a couple of alternatives:

record links at the end of span
- Cons: sampling can't happen when link is discovered, and would have to be delayed.
record them as events
- Cons: (there is a discussion in Add RecordLink for Span. #3240)
  - we'd have two representations of links in OTLP (and probably need to retire Links one)
  - laconic "poor design and user experience" from @yurishkuro

regardless of events discussion and over-the-wire representation, we should be able to add RecordLink public API in languages that can do it and pick a different option for Go:

don't implement it until the more essential need for v2 comes and bundle breaking changes together
- if in the meantime we'll end up with events, this method will anyway be an optional sugar
implement it as an argument for EndSpan

Aneurysm9 · 2023-02-24T02:27:20Z

I think that allowing links to be recorded at the end of the span is the best path forward. It does not require any compatibility breaking changes and is no worse with regard to sampling than adding a span.RecordLink() method. We already have precedence for allowing changes to span data that cannot be utilized by head samplers, such as span.SetName() and span.SetAttributes().

lmolkova · 2023-02-24T02:37:03Z

I think that allowing links to be recorded at the end of the span is the best path forward. It does not require any compatibility breaking changes and is no worse with regard to sampling than adding a span.RecordLink() method. We already have precedence for allowing changes to span data that cannot be utilized by head samplers, such as span.SetName() and span.SetAttributes().

From the API standpoint, we don't pass attributes or events to end methods. I.e. it won't be consistent.

Also, I think the time at which link was captured could be significant and (assuming we use events underneath or stream data as it arrives) we can send links over the wire without keeping them in memory as they arrive.

Blocking all these possibilities and requiring callers to collect all their links and add them at the end in all languages would not be great.

carlosalberto · 2023-02-24T15:45:14Z

Blocking all these possibilities and requiring callers to collect all their links and add them at the end in all languages would not be great.

So this could be left to the SIGs so they add this operation either as a) a method of Span or b) Through Span.End(). FWIW, I think this may be the option that could make all of us happy. However, I remember some concerns @jmacd had (would you mind providing details?). There's no perfect solution anyway, etc.

jmacd · 2023-03-03T17:55:55Z

@carlosalberto

However, I remember some concerns @jmacd had (would you mind providing details?).

@Aneurysm9

no worse with regard to sampling than adding a span.RecordLink() method

I think see placing links in the end of a span as a problem, but my reasoning is based on my beliefs about the under-specified trace data model. I believe we should treat span links as bi-directional information, which means that we ought to be able to configure a sampler that records links on both sides of the link. If we force users to record span links at the end, it looks like a one-directional piece of information--the information could've been used to update the linked-to span in real time, but it was deferred so the tracer doesn't know about it.

The reason I prefer to see links presented to the tracer in the moment is that there is better a chance to record the To link when it happens. If an unsampled context links to a live, sampled span, it would be best to record the link to the live sampled span because it is going to export. Likewise, a "z-pages" style of reader will not be able to see links for in-progress spans.

I appreciate the problem that OTel-Go presents, and I see a workaround. I still prefer the option where we model late-arriving links as span events. We can formalize how to encode span links as event attributes--for demonstration purposes let's assume that #2888 is accepted and we can encode nested dictionaries and lists of values. A span link event would be encoded using semantic conventions that include all its details, using span_id and trace_id attributes along with the other link properties as nested structure.

Now, users need not call AddEvent() directly to record span links. In the v1.x API package this can be done as a stand-alone method that constructs the appropriate span link event using semantic conventions, e.g., a package-level function like trace.SpanLinkEvent(span1, span2, attributes...). When a v2 trace API is eventually required for some other reason, the package-level method can be deprecated and a new span method can be introduced. Future Tracers might be more efficient if they get the span arguments directly, but I don't think we should increment a major version for this.

lmolkova · 2023-03-03T18:55:15Z

Now, users need not call AddEvent() directly to record span links. In the v1.x API package this can be done as a stand-alone method that constructs the appropriate span link event using semantic conventions, e.g., a package-level function like trace.SpanLinkEvent(span1, span2, attributes...). When a v2 trace API is eventually required for some other reason, the package-level method can be deprecated and a new span method can be introduced. Future Tracers might be more efficient if they get the span arguments directly, but I don't think we should increment a major version for this.

I assume the same approach can be used to record links (that appear as links on OTLP and on SpanData). Using two different data structures for links (links and events) is very controversial as outlined in this discussion: #3240 (comment)

I wonder if we really have to decide on links/events discussion now. It seems we need a way to record links whatever they are and data structure and OTLP representation can change later independently if proven needed.

carlosalberto · 2023-03-03T18:59:40Z

Now, users need not call AddEvent() directly to record span links. In the v1.x API package this can be done as a stand-alone method that constructs the appropriate span link event using semantic conventions

I had been thinking about this recently, and even though feels odd, I have to say this could also satisfy the requirements overall (remember that, even if we send Links as Events, a stand-alone helper method tracing.RecordLink(Span, ..) would be handy, as opposed to specifying the raw even data).

Let's discuss this on the next Spec call.

lmolkova · 2023-03-03T19:05:54Z

tracing.RecordLink(Span, ..)

I'd even propose tracing.RecordLink(SpanContext ..) as it would also allow recording links fully independently of span lifetime

github-actions · 2023-03-11T03:17:07Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2023-03-18T03:17:09Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions · 2023-03-29T03:17:02Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

pyohannes requested review from a team as code owners February 7, 2023 23:33

github-actions bot assigned SergeyKanzhelev Feb 7, 2023

yurishkuro approved these changes Feb 7, 2023

View reviewed changes

specification/compatibility/opencensus.md Outdated Show resolved Hide resolved

reyang approved these changes Feb 7, 2023

View reviewed changes

MrAlias mentioned this pull request Feb 8, 2023

trace: changing interfaces will break backwards compability open-telemetry/opentelemetry-go#3277

Closed

jsuereth approved these changes Feb 9, 2023

View reviewed changes

carlosalberto reviewed Feb 13, 2023

View reviewed changes

specification/trace/api.md Outdated Show resolved Hide resolved

carlosalberto approved these changes Feb 13, 2023

View reviewed changes

pyohannes added 5 commits February 13, 2023 11:56

Allow adding links after span creation

dad7e48

MUST -> SHOULD (to keep APIs stable)

b307c62

Fix broken links

618af27

Add documentation requirement for adding links

dd745de

PR comment

703a06b

carlosalberto self-assigned this Feb 20, 2023

carlosalberto mentioned this pull request Feb 21, 2023

Add RecordLink for Span. #3240

Closed

jmacd mentioned this pull request Mar 3, 2023

Support maps and heterogeneous arrays as attribute values #2888

Closed

aabmass mentioned this pull request Mar 6, 2023

OpenCensus trace shim open-telemetry/opentelemetry-python#3203

Open

6 tasks

github-actions bot added the Stale label Mar 11, 2023

github-actions bot closed this Mar 18, 2023

carlosalberto reopened this Mar 21, 2023

github-actions bot closed this Mar 29, 2023

pyohannes mentioned this pull request Sep 26, 2023

Add a new AddLink() operation to Span. #3678

Merged

Allow adding links after span creation #3186

Allow adding links after span creation #3186

Conversation

pyohannes commented Feb 7, 2023

Changes

MrAlias commented Feb 8, 2023

reyang commented Feb 8, 2023

yurishkuro commented Feb 8, 2023

yurishkuro commented Feb 8, 2023

reyang commented Feb 8, 2023

lalitb commented Feb 8, 2023 • edited

MrAlias commented Feb 8, 2023 • edited

pyohannes commented Feb 8, 2023

MrAlias commented Feb 8, 2023

pyohannes commented Feb 8, 2023

MrAlias commented Feb 8, 2023

jsuereth Feb 9, 2023

Choose a reason for hiding this comment

Aneurysm9 commented Feb 9, 2023

MrAlias commented Feb 9, 2023 • edited

MrAlias commented Feb 9, 2023 • edited

carlosalberto Feb 13, 2023

Choose a reason for hiding this comment

carlosalberto Feb 13, 2023

Choose a reason for hiding this comment

pyohannes Feb 13, 2023

Choose a reason for hiding this comment

pyohannes commented Feb 21, 2023

yurishkuro commented Feb 21, 2023

pyohannes commented Feb 21, 2023

yurishkuro commented Feb 21, 2023

lmolkova commented Feb 23, 2023 • edited

MrAlias commented Feb 23, 2023

lmolkova commented Feb 23, 2023

Aneurysm9 commented Feb 24, 2023

trask commented Feb 24, 2023

Aneurysm9 commented Feb 24, 2023

lmolkova commented Feb 24, 2023 • edited

Aneurysm9 commented Feb 24, 2023

lmolkova commented Feb 24, 2023 • edited

carlosalberto commented Feb 24, 2023

jmacd commented Mar 3, 2023

lmolkova commented Mar 3, 2023 • edited

carlosalberto commented Mar 3, 2023

lmolkova commented Mar 3, 2023

github-actions bot commented Mar 11, 2023

github-actions bot commented Mar 18, 2023

github-actions bot commented Mar 29, 2023

lalitb commented Feb 8, 2023 •

edited

MrAlias commented Feb 8, 2023 •

edited

MrAlias commented Feb 9, 2023 •

edited

MrAlias commented Feb 9, 2023 •

edited

lmolkova commented Feb 23, 2023 •

edited

lmolkova commented Feb 24, 2023 •

edited

lmolkova commented Feb 24, 2023 •

edited

lmolkova commented Mar 3, 2023 •

edited