-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracing: Recorded flag not propagated between activities #9011
Comments
Thanks for reporting, @Costo! |
Yeah, I think that is correct. OpenTelemetry requested we add this extra IsRemote bit to ActivityContext that is intended to indicate whether an ID originated from a remote machine across the wire or it was captured from some locally running Activity. Their sampling logic allows folks to use different heuristics depending on whether the parent is remote or not. However we already had APIs like ActivitySource.CreateActivity(string traceId, ...) that didn't accept the extra bit so they had to make an arbitrary choice whether to assume isRemote=true or isRemote=false for these calls. We wound up with setting isRemote=false as the default. In hindsight I wish we'd gone the other way but we've got back compat so no plans to change it. This fix works around that questionable choice of default by creating the ActivityContext yourself with the bit set correctly and then calling an overload that takes the ActivityContext. Let me know if you've got any questions. Hope that helps :) |
fyi @tarekgh |
Thanks for the explanation, @noahfalk! ActivityContext.TryParse(traceParent, traceState, isRemote: true, out ActivityContext parentContext);
activity = source.CreateActivity(context.Request.GetActivityName(), ActivityKind.Server, parentContext); When can |
It fails when the format of traceParent doesn't match the W3C spec (code here: https://source.dot.net/#System.Diagnostics.DiagnosticSource/System/Diagnostics/ActivityContext.cs,70)
There is an older .NET specific id format called hierarchical ID that ActivitySource.CreateActivity() also accepts, but I'm not sure if you'd every encounter it in your scenarios? For .NET Framework apps it is the default format produced when creating Activities but on .NET Core apps you'd only get it by explicitly opting into the legacy behavior which I suspect almost nobody does. If your code is .NET Core only I don't think you'd be giving up much by saying you only support W3C TraceParent IDs. https://learn.microsoft.com/en-us/dotnet/core/diagnostics/distributed-tracing-concepts#activity-ids |
Great, thanks again. We'll likely merge this, then. @Costo would you be happy to open a PR based on your change? |
Hi,
I'm experiencing something similar to this issue in Orleans: open-telemetry/opentelemetry-dotnet#4474
The recorded flag is not propagated between activities, making impossible to sample or exclude whole traces from the root activity.
I was able to fix the issue with a 1-line change to
ActivityPropagationOutgoingGrainCallFilter
.See the diff here: https://github.com/dotnet/orleans/compare/main...Costo:orleans:fix/recorded-flag?expand=1
I'm not sure I understand what the problem is and why my change works. I tried to reproduce the issue in
ActivityPropagationTests
but did not succeed.However, I can give the graph below as a "proof" that the fix works.
When we migrated to from Orleans 3.5 to Orleans 8.0 with OpenTelemetry, we quickly realized that something was wrong as we were quickly hitting the 100GB daily cap every day. Since the fix was applied, it is much better, even if we still have some root traces we can filter out.
The text was updated successfully, but these errors were encountered: