Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow adding links after span creation #3186

Conversation

pyohannes
Copy link
Contributor

Fixes #454

The limitation caused by preventing adding links after span creation limits OpenTelemetry users in the ability to efficiently and intuitively create spans and traces in certain scenarios (see examples in the linked issues and PRs below).

Adding links after span creation allows to create more meaningful traces in those scenarios. Best-effort sampling based on links is still possible, but only based on links added at creation time (similar to attributes). This was discussed with members of the sampling SIG, and from their side there are no concerns regarding removing this restriction (also see this PR comment).

Some use cases from the related issue:
#454 (comment)
#454 (comment)
#454 (comment)
#454 (comment)
#2278 (comment)

Users who are forced to come up with workarounds:

Changes

This moves the section about adding span links from "Span creation" to "Span operations", removing mentions of the limitation that links can only be added at creation time. The name AddLink is proposed, wording is kept consistent with the corresponding section for adding attributes.

Discussions about the need for indicating causality in link-relationships (e. g. via attributes) will be held in the messaging workgroup. If necessary, this will result in a follow-up PR.

specification/compatibility/opencensus.md Outdated Show resolved Hide resolved
@MrAlias
Copy link
Contributor

MrAlias commented Feb 8, 2023

Adding this method to a span is going to be a breaking change to the Go API for SDK developers. It is not something I support.

I have looked through the use cases, and they all seem to be the same situation: a user wants to "link" message IDs to an already existing span. I don't see this as a good enough reason to break the API, especially since a user can already wrap this batch receive operation with a child span and add the appropriate links there. Additionally, there were many other proposals to solve this use case in #454. Those alternate approach would not break the Go API. Can we consider those approaches instead?

cc @open-telemetry/go-maintainers

@reyang
Copy link
Member

reyang commented Feb 8, 2023

Adding this method to a span is going to be a breaking change to the Go API for SDK developers.

Is this due to a restriction in the Go programming language itself (adding any new API will become a breaking change instead of an additive change), or it is due to some design/implementation decision that OpenTelemetry Go has made (and it is hard to revert that decision)?

@yurishkuro
Copy link
Member

Adding this method to a span is going to be a breaking change to the Go API for SDK developers.

I know what you mean, but that's not a good argument as it means there can be no evolution, period. There are ways to add methods to Go API without breaking the existing interfaces, e.g. by defining new ones. It's not pretty, but what can you do - the new functionality is relatively niche, so I think it's an acceptable trade-off.

@yurishkuro
Copy link
Member

Is this due to a restriction in the Go programming language itself

it exists in any language that supports pure interfaces. E.g. in Java <=1.6, adding a method to an interface meant existing classes were incompatible, but I think Java deferred that to runtime, whereas Go fails at compile time. After Java introduced default implementations, this became non-issue (well, less of).

@reyang
Copy link
Member

reyang commented Feb 8, 2023

Is this due to a restriction in the Go programming language itself

it exists in any language that supports pure interfaces. E.g. in Java <=1.6, adding a method to an interface meant existing classes were incompatible, but I think Java deferred that to runtime, whereas Go fails at compile time. After Java introduced default implementations, this became non-issue (well, less of).

In C++, this is what folks normally do 😆

https://learn.microsoft.com/en-us/dotnet/framework/unmanaged-api/profiling/icorprofilerinfo-interface
image

@lalitb
Copy link
Member

lalitb commented Feb 8, 2023

Adding new method in API will break the ABI compatibility for OpenTelemetry C++ too. But we can increment API version through inline namespaces in case of ABI breaking change, so should be good. And will vote for this feature as there have been multiple asks for it from C++ community too.

@MrAlias
Copy link
Contributor

MrAlias commented Feb 8, 2023

I know what you mean, but that's not a good argument as it means there can be no evolution, period.

I didn't say the API cannot be evolved, those are your words. I said this change is not motivated enough to warrant the breakage it will cause.

There are alternatives that were proposed and there is an existing solution using a new span. Why do we need to add this change to accomplish what can already be achieved or can be achieved in an alternate, less destructive, manner?

@pyohannes
Copy link
Contributor Author

I said this change is not motivated enough to warrant the breakage it will cause.

I'm not sure what exactly qualifies a change as "motivated enough". The proposed change is probably not crucial to a majority of OpenTelemetry users, but users stumble over the limitation again and again. The description above mentions comments and PRs of at least seven different users, and those are only the ones who somehow made it to #454 (via comments or links).

Creating a separate (semantically meaningless) span for adding links is not an acceptable workaround in my opinion. What operation is such a span supposed to cover? What's the duration of such a span? What's the semantic meaning of this duration?

Regarding "breakage": I'm not a Go expert, but as far as I understand this doesn't refer to users (no existing Go OTel users will be broken), but it means that supporting this in a way that doesn't break existing users results in challenges and headaches for OTel Go maintainers?

@MrAlias
Copy link
Contributor

MrAlias commented Feb 8, 2023

Creating a separate (semantically meaningless) span for adding links is not an acceptable workaround in my opinion. What operation is such a span supposed to cover? What's the duration of such a span? What's the semantic meaning of this duration?

Creating a span to track the partial read result of a consumer request tracks the operation of reading a partial result. The duration of the span would represent the time it took to read part of the returned request. Calling it meaningless is subjective and not universal.

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

The description above mentions comments and PRs of at least seven different users, and those are only the ones who somehow made it to #454 (via comments or links).

Would these users get what they want if documentation was updated to show how to achieve what they want by delaying a span creation? Or by creating child spans?

How many users will complain about the breaking changes to Go and C++? Will they be even more frustrated when they realize there were design choice that could have been implemented that wouldn't have disrupted things?

As one of those linked use cases mentions, the idea of adding a link via an event has still not been addressed.

Regarding "breakage": I'm not a Go expert, but as far as I understand this doesn't refer to users (no existing Go OTel users will be broken), but it means that supporting this in a way that doesn't break existing users results in challenges and headaches for OTel Go maintainers?

No, currently how these changes are written the Span interface will need to be extended. This is a breaking change to any SDK author that implements the current interface definition.

Adding an additional Go interface that an SDK span could implement is the proposed work around for Go (see https://github.com/open-telemetry/opentelemetry-go/blob/main/CONTRIBUTING.md#interfaces). The current changes in this PR would not be satisfied if that approach was taken in Go, our implementation would not be compliant with the Span interface as this PR defines it. It would also indeed create a jumbled mess. A mess that would affect all API users, not just the users that want to add a link to a span after it was created.

@pyohannes
Copy link
Contributor Author

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

This was considered, but it breaks context propagation.

@MrAlias
Copy link
Contributor

MrAlias commented Feb 8, 2023

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

This was considered, but it breaks context propagation.

Gotcha 👍

@@ -177,14 +177,10 @@ using the OpenCensus <-> OpenTelemetry bridge.
that [parent spans must be specified during span creation](../trace/api.md#span-creation).
This leads to some issues with OpenCensus APIs that allowed flexible
specification of parent spans post-initialization.
2. Links added to spans after the spans are created. This is [not supported in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have OC compliance matrix lines, but I think this would deserve one.

cc @dashpole - For tracking that this work will need to be done in the OC shims.

@Aneurysm9
Copy link
Member

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

This was considered, but it breaks context propagation.

How? I might be missing something here. Can you spell this out for me?

@MrAlias
Copy link
Contributor

MrAlias commented Feb 9, 2023

For those that do consider it meaningless, why not delay the real span creation until all known messages are read and set the start time to a previously recorded value?

This was considered, but it breaks context propagation.

How? I might be missing something here. Can you spell this out for me?

I think the idea is the request to a messaging system is mean to be made with a context. The context should contain an active trace/span so the receiver can properly link to the trace. If you start the trace after the request returns there is no way to include it in the transmitted context.

@MrAlias
Copy link
Contributor

MrAlias commented Feb 9, 2023

Talking with @pyohannes in slack it was brought up that another option that could work here, in addition to adding a link to an event, is to add a span end option that accept links.

`Span`s can be from the same or a different trace (see
[Links between spans](../overview.md#links-between-spans)).

A `Link` is structurally defined by the following properties:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should massage these two as operation parameters, as done for other operations (as it stopped being tied to Span creation).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also realized this original section repeats below the same information regarding SpanContext and attributes :/ )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should massage these two as operation parameters, as done for other operations (as it stopped being tied to Span creation).

Do you think the point below (which is already part of the PR) suffices:

  • An API to record a single Link where the Link properties are passed as
    arguments.

@carlosalberto carlosalberto self-assigned this Feb 20, 2023
@pyohannes
Copy link
Contributor Author

There's an alternative PR up, exploring adding links as events: #3240

@yurishkuro
Copy link
Member

I don't understand why sampling consideration are trampling the basic data modeling considerations?

@pyohannes
Copy link
Contributor Author

I don't understand why sampling consideration are trampling the basic data modeling considerations?

Sampling isn't the issue here, but avoiding churn for SDK developers:

Alternate, less intrusive, approaches (events and span end options) can implemented instead of this change. Those alternates should be explored.

The aim is to evaluate alternative approaches that were brought up, and then to decide which way to go. Or, as it came up as an argument too, to decide that the proposed capability isn't worth the cost of any of the possible approaches.

@yurishkuro
Copy link
Member

but avoiding churn for SDK developers
The aim is to evaluate alternative approaches that were brought up

Alternative approaches involve just as much "churn", since the ask is to add new functionality.

@lmolkova
Copy link
Contributor

lmolkova commented Feb 23, 2023

Regarding non-extensible Go interfaces:
our Azure SDK for Go team found a solution that allows designing APIs in an extensible way. It won't help to solve this problem since Span is already an interface, but could be used in the future to prevent such problems on APIs that are not stable yet.

Effectively we use a struct of functions instead of an interface. It's a bit more work on the implementation side but should not affect user experience if we do it right.
Tagging @jhendrixMSFT who has much more context on this solution.

Here's an example:
(we have tracing abstractions to avoid direct dependencies and support legacy tracing approaches)

type SpanImpl struct {
	End func()
	SetAttributes func(...Attribute)
	AddEvent func(string, ...Attribute)
	AddError func(err error)
	SetStatus func(SpanStatus, string)
}

type Span struct {
	impl SpanImpl
}

full example: https://github.com/Azure/azure-sdk-for-go/blob/96d7f70e55a1f334ff6c95b5acf458ce355f45a1/sdk/azcore/tracing/tracing.go#L87-L118

// cc @carlosalberto

@MrAlias
Copy link
Contributor

MrAlias commented Feb 23, 2023

Regarding non-extensible Go interfaces: our Azure SDK for Go team found a solution that allows designing APIs in an extensible way. It won't help to solve this problem since Span is already an interface, but could be used in the future to prevent such problems on APIs that are not stable yet.

Effectively we use a struct of functions instead of an interface. It's a bit more work on the implementation side but should not affect user experience if we do it right. Tagging @jhendrixMSFT who has much more context on this solution.

Here's an example: (we have tracing abstractions to avoid direct dependencies and support legacy tracing approaches)

type SpanImpl struct {
	End func()
	SetAttributes func(...Attribute)
	AddEvent func(string, ...Attribute)
	AddError func(err error)
	SetStatus func(SpanStatus, string)
}

type Span struct {
	impl SpanImpl
}

full example: https://github.com/Azure/azure-sdk-for-go/blob/96d7f70e55a1f334ff6c95b5acf458ce355f45a1/sdk/azcore/tracing/tracing.go#L87-L118

// cc @carlosalberto

Thanks for the suggestion. It was one considered but not implemented.

It is a moot point at this time as the interface in question has already been release as a stable v1.

@lmolkova
Copy link
Contributor

@MrAlias my point there was to

prevent such problems on APIs that are not stable yet.

I hope we can consider designing future Go APIs in an extendable way.

@Aneurysm9
Copy link
Member

@MrAlias my point there was to

prevent such problems on APIs that are not stable yet.

I hope we can consider designing future Go APIs in an extendable way.

We did consider designing the existing Go APIs in an extensible way and will continue to do so. I resent the implication that we did not.

As @MrAlias stated, we considered embedding the an interface in a struct and rejected it as it is dangerous. We settled on an approach that we felt provided the most flexibility and safety. That included keeping open the option to add methods to interfaces expected to be implemented by an SDK, such as the Span interface. Doing so has lead to being questioned, rightly, by our community as to why we are even considering breaking changes in this manner.

@trask
Copy link
Member

trask commented Feb 24, 2023

from @Aneurysm9's "being questioned" link above:

One thing that I'm not yet understanding is why the Go OTel API can't bump the version from v1 to v2 when they need to extend the API/interface.

I'm wondering if this is the best solution for everyone?

It allows the OpenTelemetry specification to evolve, and keeps the Go community happy(?)

@Aneurysm9
Copy link
Member

We are implementing the OpenTelemetry API v1. What happens when there is an OpenTelemetry API v2 but we have already made half a dozen major version increments because the v1 API has been expanded? Are we to expect users will know that go.opentelemetry.io/otel/trace/v8 is the first implementation of OpenTelemetry API v2?

Beyond that, we're required to support all major API versions for three years from the release of the next version. If we have to bump the major version every time a method is added to the API then we're quickly going to wind up supporting many distinct API versions. And SDK versions to match. And all of the modules in -contrib.

A v2 API is not a small undertaking and it should be the last resort. We should not break compatibility when there are other options available. We've spent a lot of time here asking why Go doesn't do something different and not a lot explaining why other, less invasive, options aren't viable.

@lmolkova
Copy link
Contributor

lmolkova commented Feb 24, 2023

I think there are two different discussions here:

  1. do we need to record links after start? The answer seems to be yes, but not clear how to do it in Go given the limitations
  2. should we limit OTel spec evolution only to what can be done in non-breaking manner in Go? I hope that answer is no and we should be ready to extend OTel Go accordingly.

Then the question is if this change deserves a major version update for OTel Go.

We have a couple of alternatives:

  • record links at the end of span
    • Cons: sampling can't happen when link is discovered, and would have to be delayed.
  • record them as events
    • Cons: (there is a discussion in Add RecordLink for Span. #3240)
      • we'd have two representations of links in OTLP (and probably need to retire Links one)
      • laconic "poor design and user experience" from @yurishkuro

regardless of events discussion and over-the-wire representation, we should be able to add RecordLink public API in languages that can do it and pick a different option for Go:

  • don't implement it until the more essential need for v2 comes and bundle breaking changes together
    • if in the meantime we'll end up with events, this method will anyway be an optional sugar
  • implement it as an argument for EndSpan

@Aneurysm9
Copy link
Member

I think that allowing links to be recorded at the end of the span is the best path forward. It does not require any compatibility breaking changes and is no worse with regard to sampling than adding a span.RecordLink() method. We already have precedence for allowing changes to span data that cannot be utilized by head samplers, such as span.SetName() and span.SetAttributes().

@lmolkova
Copy link
Contributor

lmolkova commented Feb 24, 2023

I think that allowing links to be recorded at the end of the span is the best path forward. It does not require any compatibility breaking changes and is no worse with regard to sampling than adding a span.RecordLink() method. We already have precedence for allowing changes to span data that cannot be utilized by head samplers, such as span.SetName() and span.SetAttributes().

From the API standpoint, we don't pass attributes or events to end methods. I.e. it won't be consistent.

Also, I think the time at which link was captured could be significant and (assuming we use events underneath or stream data as it arrives) we can send links over the wire without keeping them in memory as they arrive.

Blocking all these possibilities and requiring callers to collect all their links and add them at the end in all languages would not be great.

@carlosalberto
Copy link
Contributor

Blocking all these possibilities and requiring callers to collect all their links and add them at the end in all languages would not be great.

So this could be left to the SIGs so they add this operation either as a) a method of Span or b) Through Span.End(). FWIW, I think this may be the option that could make all of us happy. However, I remember some concerns @jmacd had (would you mind providing details?). There's no perfect solution anyway, etc.

@jmacd
Copy link
Contributor

jmacd commented Mar 3, 2023

@carlosalberto

However, I remember some concerns @jmacd had (would you mind providing details?).

@Aneurysm9

no worse with regard to sampling than adding a span.RecordLink() method

I think see placing links in the end of a span as a problem, but my reasoning is based on my beliefs about the under-specified trace data model. I believe we should treat span links as bi-directional information, which means that we ought to be able to configure a sampler that records links on both sides of the link. If we force users to record span links at the end, it looks like a one-directional piece of information--the information could've been used to update the linked-to span in real time, but it was deferred so the tracer doesn't know about it.

The reason I prefer to see links presented to the tracer in the moment is that there is better a chance to record the To link when it happens. If an unsampled context links to a live, sampled span, it would be best to record the link to the live sampled span because it is going to export. Likewise, a "z-pages" style of reader will not be able to see links for in-progress spans.

I appreciate the problem that OTel-Go presents, and I see a workaround. I still prefer the option where we model late-arriving links as span events. We can formalize how to encode span links as event attributes--for demonstration purposes let's assume that #2888 is accepted and we can encode nested dictionaries and lists of values. A span link event would be encoded using semantic conventions that include all its details, using span_id and trace_id attributes along with the other link properties as nested structure.

Now, users need not call AddEvent() directly to record span links. In the v1.x API package this can be done as a stand-alone method that constructs the appropriate span link event using semantic conventions, e.g., a package-level function like trace.SpanLinkEvent(span1, span2, attributes...). When a v2 trace API is eventually required for some other reason, the package-level method can be deprecated and a new span method can be introduced. Future Tracers might be more efficient if they get the span arguments directly, but I don't think we should increment a major version for this.

@lmolkova
Copy link
Contributor

lmolkova commented Mar 3, 2023

Now, users need not call AddEvent() directly to record span links. In the v1.x API package this can be done as a stand-alone method that constructs the appropriate span link event using semantic conventions, e.g., a package-level function like trace.SpanLinkEvent(span1, span2, attributes...). When a v2 trace API is eventually required for some other reason, the package-level method can be deprecated and a new span method can be introduced. Future Tracers might be more efficient if they get the span arguments directly, but I don't think we should increment a major version for this.

I assume the same approach can be used to record links (that appear as links on OTLP and on SpanData). Using two different data structures for links (links and events) is very controversial as outlined in this discussion: #3240 (comment)

I wonder if we really have to decide on links/events discussion now. It seems we need a way to record links whatever they are and data structure and OTLP representation can change later independently if proven needed.

@carlosalberto
Copy link
Contributor

Now, users need not call AddEvent() directly to record span links. In the v1.x API package this can be done as a stand-alone method that constructs the appropriate span link event using semantic conventions

I had been thinking about this recently, and even though feels odd, I have to say this could also satisfy the requirements overall (remember that, even if we send Links as Events, a stand-alone helper method tracing.RecordLink(Span, ..) would be handy, as opposed to specifying the raw even data).

Let's discuss this on the next Spec call.

@lmolkova
Copy link
Contributor

lmolkova commented Mar 3, 2023

tracing.RecordLink(Span, ..)

I'd even propose tracing.RecordLink(SpanContext ..) as it would also allow recording links fully independently of span lifetime

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 11, 2023
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Mar 18, 2023
@carlosalberto carlosalberto reopened this Mar 21, 2023
@github-actions
Copy link

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:api Cross language API specification issue spec:trace Related to the specification/trace directory Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Please (re)-allow recording links after Span creation time