
ref(otel): Add ClientOptions.Instrumenter #679

Open · wants to merge 4 commits into master
Conversation

@costela commented Jul 28, 2023

This small PR is a first attempt to fix the perceived issue in #678.

It extends sentry.ClientOptions with the Instrumenter parameter, analogous to other language SDKs. This can be optionally set to "otel" or "sentry" (the default).

This new option is then respected by middlewares to optionally skip creating a sentry transaction, allowing the consumer to only use "pure" otel for traces.
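For illustration, opting in would look roughly like this (a sketch against this branch; the Instrumenter field is the one proposed here, and the DSN is a placeholder):

package main

import (
	"log"

	"github.com/getsentry/sentry-go"
)

func main() {
	err := sentry.Init(sentry.ClientOptions{
		Dsn: "https://examplePublicKey@o0.ingest.sentry.io/0", // placeholder DSN
		// Proposed in this PR: hand tracing over to OpenTelemetry, so the
		// middlewares skip creating Sentry transactions.
		Instrumenter: "otel",
	})
	if err != nil {
		log.Fatalf("sentry.Init: %v", err)
	}
}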

Considered alternatives:

Another option would be making the transaction created by the otel span processor visible to the middlewares. However, since the span processor cannot easily modify the request context, it becomes necessary to add additional helpers (e.g. a new "adapter" middleware) with the single task of injecting the otel-created transaction into the request context.

Fixes: #678 #788

@codecov (bot) commented Jul 31, 2023

Codecov Report

Attention: Patch coverage is 97.43590% with 1 line in your changes missing coverage. Please review.

Project coverage is 81.20%. Comparing base (255c172) to head (8249502).

Files Patch % Lines
otel/span_processor.go 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #679      +/-   ##
==========================================
+ Coverage   80.89%   81.20%   +0.30%     
==========================================
  Files          48       48              
  Lines        4717     4730      +13     
==========================================
+ Hits         3816     3841      +25     
+ Misses        763      754       -9     
+ Partials      138      135       -3     


@costela (Author) commented Jul 31, 2023

The Windows test failure seems unrelated? Maybe flaky?

As for the coverage decrease, I could add some tests for this case if the general approach seems reasonable.

@cleptric (Member)

Yep, Windows tests are sadly a bit flaky 🙁
More tests are always welcome. I didn't look into how other SDKs handle this yet; did you maybe check https://github.com/getsentry/sentry-javascript already?

@costela (Author) commented Aug 16, 2023

Sorry for the delay; vacation got in the way 😉

My JavaScript is pretty rusty (no 🦀 pun intended), but AFAICT the JS integration works slightly differently. It always uses an existing span context (similar to the proposed change in this PR), but it also has an extra instrumenter option, unlike the Go integration (and this PR, for that matter).

The extra option seems a bit error-prone, compared to the sample-func approach in this PR: the code has to remember to check against known values, instead of explicitly passing exactly what we want it to do. But that's just my subjective (and superficial!) take on it.

@costela costela force-pushed the respect-otel-sampling branch 2 times, most recently from db4a1ea to 8e878cd Compare August 23, 2023 08:13
@costela (Author) commented Aug 23, 2023

I've reworked the PR with a slightly different solution. Now, instead of trying to use a sampler, we address the issue at the middleware level.

The problem, as I currently understand it, is related to middleware call ordering: we want otel to be called first, so that potential sampling decisions from a remote caller are respected. That sampling decision should then be passed on to the sentry world, which was already done in the otel span processor but was not working reliably (fixed in this PR).

After the otel middleware, we still want to use the sentry middleware (at least for event tracking and recovery), but at that point, the existing sentry span created in the span processor above is not in the context, so the sentry middleware happily creates a new one (and without a parent).
To avoid this, we need to tell the sentry middleware to also look for an existing otel span and get the sentry span from that.
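For concreteness, the intended ordering looks roughly like this (a sketch; otelhttp is the contrib middleware, the handler names are placeholders):

package main

import (
	"net/http"

	sentryhttp "github.com/getsentry/sentry-go/http"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	app := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	// Sentry middleware for event tracking and panic recovery.
	sentryHandler := sentryhttp.New(sentryhttp.Options{}).Handle(app)

	// otel runs first, so remote sampling decisions are respected before the
	// Sentry middleware sees the request.
	http.ListenAndServe(":8080", otelhttp.NewHandler(sentryHandler, "server"))
}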

Does that make sense?

@costela (Author) commented Aug 24, 2023

Ok, third time's the charm!

The sentry.SpanOption doesn't quite work, because it can't modify its received context. So we have to go with yet another middleware, which sits as an adapter between otel and sentryhttp.Handler.
It sets the transaction created by the span processor on the context passed to sentryhttp.Handler, so that the handler detects its existence and can react accordingly.
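Roughly, the adapter does something like this (a sketch only; sentrySpanFromOtelSpan and contextWithSentrySpan are hypothetical stand-ins for whatever the span processor would need to expose, they don't exist in the SDK today):

package adapter

import (
	"context"
	"net/http"

	"github.com/getsentry/sentry-go"
	"go.opentelemetry.io/otel/trace"
)

// Hypothetical helpers standing in for the lookup/injection the span
// processor would have to expose; stubbed here so the sketch compiles.
func sentrySpanFromOtelSpan(trace.Span) *sentry.Span { return nil }

func contextWithSentrySpan(ctx context.Context, _ *sentry.Span) context.Context { return ctx }

// otelSentryAdapter sits between the otel middleware and sentryhttp.Handler
// and injects the transaction created by the span processor into the request
// context, so sentryhttp can pick it up instead of starting a new one.
func otelSentryAdapter(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		if span := sentrySpanFromOtelSpan(trace.SpanFromContext(ctx)); span != nil {
			ctx = contextWithSentrySpan(ctx, span)
		}
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}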

PTAL

@costela (Author) commented Aug 28, 2023

@cleptric can you please allow the tests to run? 🙏

@costela (Author) commented Sep 19, 2023

FYI: 🔁 rebased on master, please unlock tests 🙏

costela added a commit to exaring/sentry-go that referenced this pull request Sep 25, 2023
⚠️ this is only to allow using the forked version while getsentry#679 is open
@cleptric (Member) commented Oct 4, 2023

Sorry that it took us a while to get back to you.

When using Otel, you shouldn't use any Sentry performance-related features. This is why we added the instrumenter option on other SDKs, which would disable all Sentry instrumentation. In Go, we opted not to add this option, given the manual nature of performance tracing.

So instead of using Sentry's net/http instrumentation, you would use https://github.com/open-telemetry/opentelemetry-go-contrib.

Our propagator considers incoming trace headers, so distributed tracing also works.
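For reference, a minimal sketch of that setup (using the documented sentryotel span processor and propagator; DSN and handler are placeholders):

package main

import (
	"net/http"

	"github.com/getsentry/sentry-go"
	sentryotel "github.com/getsentry/sentry-go/otel"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	_ = sentry.Init(sentry.ClientOptions{
		Dsn:              "", // placeholder
		EnableTracing:    true,
		TracesSampleRate: 1.0,
	})

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithSpanProcessor(sentryotel.NewSentrySpanProcessor()),
	)
	otel.SetTracerProvider(tp)
	// The Sentry propagator also extracts incoming sentry-trace headers,
	// so distributed tracing keeps working.
	otel.SetTextMapPropagator(sentryotel.NewSentryPropagator())

	app := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	// otelhttp (opentelemetry-go-contrib) replaces sentryhttp for tracing.
	http.ListenAndServe(":8080", otelhttp.NewHandler(app, "server"))
}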

@costela (Author) commented Oct 4, 2023

So instead of using Sentry's net/http instrumentation, you would use https://github.com/open-telemetry/opentelemetry-go-contrib.

So you mean skipping the http middleware? But then we'd also throw out error/panic tracking?

If that's the intended way to work with OTEL, then maybe I should change this PR to instead add an option to sentryhttp.Options to skip traces?

@cleptric (Member) commented Oct 4, 2023

This is a good point about panic tracking that I hadn't thought about.

IMO, we could wrap this code https://github.com/getsentry/sentry-go/blob/master/http/sentryhttp.go#L97-L112 into an if condition that checks for ClientOptions.EnableTracing, so no transactions are created.

This might make it a bit easier than adding a new integration option.
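Roughly along these lines, I suppose (a simplified sketch of the idea, not the actual sentryhttp code):

package sketch

import (
	"fmt"
	"net/http"

	"github.com/getsentry/sentry-go"
)

// handle sketches the idea: only start a transaction when tracing is enabled,
// while keeping the hub handling (and, in the real middleware, panic recovery).
func handle(next http.Handler) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		hub := sentry.GetHubFromContext(ctx)
		if hub == nil {
			hub = sentry.CurrentHub().Clone()
			ctx = sentry.SetHubOnContext(ctx, hub)
		}
		if client := hub.Client(); client != nil && client.Options().EnableTracing {
			transaction := sentry.StartTransaction(ctx,
				fmt.Sprintf("%s %s", r.Method, r.URL.Path),
				sentry.ContinueFromRequest(r),
			)
			defer transaction.Finish()
			ctx = transaction.Context()
		}
		next.ServeHTTP(w, r.WithContext(ctx))
	}
}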

@costela (Author) commented Oct 4, 2023

@cleptric you mean disabling ClientOptions.EnableTracing? I think that alone won't quite work either: then all transactions get dropped (even if the OTEL propagator decides to sample them).

We could maybe refactor the sampling logic to respect sample decisions even if EnableTracing is false, but I fear that might be too surprising to users.

In the end, we want sentryhttp.Handler to only start a transaction if none has been started already (e.g. by the otel propagator), but unfortunately I haven't found a nice way to pass that information along. The propagator can't modify the request context, and the two are otherwise unaware of each other. That's why I ended up with the current suggestion in this PR. 😕

It seems to me we have two options:

  • we either need to glue the sentry OTEL propagator to the sentryhttp middleware (e.g. via the proposed middleware in this PR)
  • or we add an explicit option to the sentryhttp middleware to not start transactions

@cleptric (Member) commented Oct 4, 2023

Ok, we might need to add the instrumenter option to control the behaviour. If Otel is active, no transactions/spans should be created by the Sentry SDK.

@greywolve (Contributor)

I think adding the instrumenter option also seems consistent with the other SDKs.

@costela (Author) commented Oct 4, 2023

@cleptric @greywolve now this PR has an attempted implementation of the new TracesInstrumenter option. PTAL 🙏

@cleptric (Member) left a comment

Added some comments. This also needs tests.

client.go Outdated
@@ -133,6 +133,9 @@ type ClientOptions struct {
 	TracesSampleRate float64
 	// Used to customize the sampling of traces, overrides TracesSampleRate.
 	TracesSampler TracesSampler
+	// Which instrumentation to use for tracing. Either "sentry" or "otel" are supported.
+	// Setting this to "otel" will ignore TracesSampleRate and TracesSampler and assume sampling is performed by otel.
+	TracesInstrumenter string
Member:

We should call it Instrumenter as in other SDKs.

@costela (Author) commented Oct 5, 2023

Sure thing! Done! PTAL

client.go Outdated
Comment on lines 307 to 314
switch options.TracesInstrumenter {
case "":
	options.TracesInstrumenter = "sentry"
case "sentry":
	// noop
case "otel":
	// sampling is performed by the OpenTelemetry SDK
	options.TracesSampleRate = 1.0
	options.TracesSampler = nil
default:
	return nil, fmt.Errorf("invalid value for TracesInstrumenter (supported are 'sentry' and 'otel'): %q", options.TracesInstrumenter)
}

Member:

I'm not sure if we should do this. I would rather have people define this explicitly in their options.

Author:

You mean TracesSampleRate and TracesSampler? Should we then error if those values are set to anything else? Doing neither seems a bit error-prone: Instrumenter: "otel" + TracesSampleRate: 0 (the default) would ignore otel sampling decisions and never send traces.

Member:

Are you sure it does this? The span processor actually takes the Otel sampling decision into account, taking priority over the traces sample rate and traces sampler.

Author:

I believe so. That might be a slightly different issue, though, and one I had addressed in a previous commit.
The current span processor considers only the sampled decision of the parent, so it only actually respects it if the span is part of an incoming trace. If it's a new trace without a parent, the otel span might be sampled, but the sentry span remains at SampledUndefined, which means we fall back to the client-wide sample rate.

The current test doesn't catch it because both the sentry client and the otel stack are initialized with "always trace", so they both arrive at the same decision, even though the processor isn't doing anything.
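In code, the fix boils down to something like this (a sketch; the actual change lives in otel/span_processor.go):

package sketch

import (
	"github.com/getsentry/sentry-go"
	"go.opentelemetry.io/otel/trace"
)

// syncSampled mirrors the otel sampling decision onto the sentry span, so a
// sampled root span no longer falls back to the client-wide sample rate.
func syncSampled(otelSpan trace.Span, sentrySpan *sentry.Span) {
	if otelSpan.SpanContext().IsSampled() {
		sentrySpan.Sampled = sentry.SampledTrue
	} else {
		sentrySpan.Sampled = sentry.SampledFalse
	}
}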

I just added a commit with the fixed test and the fixed code. PTAL.

Author:

friendly ping 😇

Member:

I'll take a look later this week.

We made similar changes for node.js in getsentry/sentry-javascript#9203, but I need to dive into this whole topic a bit deeper to give you actionable feedback.

Author:

friendly ping 😇 (I know these things get lower prio; just wanna avoid it getting forgotten)

Member:

Yeah, sorry for that! I had to get some stuff shipped on the PHP side of things, but the next few weeks will again be Go focus time for me.

@cleptric cleptric changed the title feat: respect otel sampling ref(otel): Add ClientOptions.Instrumenter Oct 4, 2023
@costela costela force-pushed the respect-otel-sampling branch 2 times, most recently from a775255 to 0c4ac8e Compare October 5, 2023 13:14
@costela costela requested a review from cleptric October 5, 2023 15:56
@costela (Author) commented Oct 12, 2023

I want to add another perspective to this PR that may mean ClientOptions.Instrumenter is not enough: some client SDKs (e.g. sentry-cocoa) do not seem to support OpenTelemetry at all. Those transactions would be lost when talking to a sentry+otel service: the service only reads otel headers, while the client only sends sentry headers.

The suggestion with the additional middleware would not have this problem: incoming requests could use either sentry or otel propagation, and the service code can leverage existing otel integrations (e.g. for pgx; shameless plug 🙈).

Unless sentry plans to support otel propagation on client sdks, it may be necessary to make the service side accept both. Or am I overlooking something?

(nevermind, missed the propagator code, sorry for the noise 🙇)

@cleptric (Member)

We actually extract the Sentry trace header, see

https://github.com/getsentry/sentry-go/blob/master/otel/propagator.go#L77-L95

@costela (Author) commented Oct 13, 2023

@cleptric ah, you're right of course. Apologies for the noise; disregard the last comment.

@costela costela force-pushed the respect-otel-sampling branch 2 times, most recently from b9e108b to d8a5714 Compare October 23, 2023 15:30
@greywolve (Contributor)

LGTM, but I'll let @cleptric give the final thumbs up :)

@costela (Author) commented Nov 21, 2023

hi!
just a little bi-monthly rebase + reminder about this PR 😇

@costela (Author) commented Dec 18, 2023

maybe I can get this PR as a christmas present? 😇 🎅 🎄

(just rebased 🔁)

costela and others added 4 commits February 22, 2024 09:47
This enables instrumentation code (sentryhttp, sentrygin, etc) to skip
trace instrumentation while still performing panic recovery.
Co-authored-by: Oliver Powell <oliverpowell84@gmail.com>
@costela (Author) commented Feb 23, 2024

Rebased again.

Any chance a paying customer can get some feedback? Even if this isn't the right solution?

@cleptric (Member)

It's still on our radar!

Successfully merging this pull request may close these issues.

[otel] integration should sync sampling decision to/from otel in the "parentless" case