Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass Jaeger references with a 0 SpanID and TraceID #513

Merged
merged 8 commits into from
Jan 27, 2020

Conversation

joe-elliott
Copy link
Contributor

@joe-elliott joe-elliott commented Jan 21, 2020

Description:
This PR includes targeted changes to make the Jaeger pipeline tolerate references that have a Trace/Span ID of 0.

When translating from ocproto to jaeger a nil TraceID is returned and stored due to this code:

func UInt64ToByteTraceID(high, low uint64) []byte {
if high == 0 && low == 0 {
return nil
}

When the reverse translation was performed the translation failed with ErrNilTraceID:

func BytesToUInt64TraceID(traceID []byte) (uint64, uint64, error) {
if traceID == nil {
return 0, 0, ErrNilTraceID
}

When the ErrNilTraceID and ErrNilSpanID are returned 0's are used instead of throwing an error.
Changes have been tested manually to successfully pass bad references in a way consistent with the Jaeger pipeline/backend.

Fixes #491

Testing:
A reference with a 0 span and trace id was added to the protospan -> jaegerproto and protospan -> jaegerthrift tests.

Signed-off-by: Joe Elliott <number101010@gmail.com>
@joe-elliott joe-elliott changed the title Added support for nil trace/span ID and tests Pass Jaeger references with a 0 SpanID and TraceID Jan 21, 2020
@codecov-io
Copy link

codecov-io commented Jan 21, 2020

Codecov Report

Merging #513 into master will increase coverage by 0.18%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #513      +/-   ##
==========================================
+ Coverage   76.05%   76.24%   +0.18%     
==========================================
  Files         123      123              
  Lines        7639     7619      -20     
==========================================
- Hits         5810     5809       -1     
+ Misses       1553     1536      -17     
+ Partials      276      274       -2
Impacted Files Coverage Δ
translator/trace/big_endian_converter.go 100% <ø> (ø) ⬆️
...ranslator/trace/jaeger/protospan_to_jaegerproto.go 77.81% <ø> (+0.48%) ⬆️
...anslator/trace/jaeger/jaegerthrift_to_protospan.go 95.73% <100%> (+0.02%) ⬆️
...ranslator/trace/jaeger/jaegerproto_to_protospan.go 91.61% <100%> (+0.05%) ⬆️
receiver/zipkinreceiver/trace_receiver.go 62.41% <0%> (-0.26%) ⬇️
receiver/jaegerreceiver/trace_receiver.go 82.81% <0%> (-0.16%) ⬇️
receiver/vmmetricsreceiver/metrics_receiver.go 0% <0%> (ø) ⬆️
config/example_factories.go 66.16% <0%> (+1.93%) ⬆️
receiver/opencensusreceiver/opencensus.go 89.52% <0%> (+3.28%) ⬆️
receiver/prometheusreceiver/metrics_receiver.go 77.77% <0%> (+3.3%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34b2459...0fd706f. Read the comment docs.

Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
@joe-elliott
Copy link
Contributor Author

@tigrannajaryan @owais
Taking our conversation into account the PR has been updated. The ErrNilSpan/TraceID errors have been removed and instead 0's are returned. This is a low impact change that, I believe, accomplishes our goals.

There is another approach that I like a bit more conceptually, but will result in more code changes:

  • Leave ErrNilTraceID and ErrNilSpanID.
  • Change UInt64ToByteTraceID and UInt64ToByteSpanID to return valid TraceID's when 0's are passed in instead of nil. i.e. return a []byte full of 0x00's.
  • Adjust impacted translation logic accordingly.

Thoughts?

@tigrannajaryan
Copy link
Member

@tigrannajaryan @owais
Taking our conversation into account the PR has been updated. The ErrNilSpan/TraceID errors have been removed and instead 0's are returned. This is a low impact change that, I believe, accomplishes our goals.

There is another approach that I like a bit more conceptually, but will result in more code changes:

* Leave `ErrNilTraceID` and `ErrNilSpanID`.

* Change `UInt64ToByteTraceID` and `UInt64ToByteSpanID` to return valid TraceID's when 0's are passed in instead of nil.  i.e. return a `[]byte` full of 0x00's.

* Adjust impacted translation logic accordingly.

Thoughts?

This is what I would do. Unclear why the code is not already written this way. I don't see why 0 trace ID is treated specially. 0 uint64 should convert to byte sequence of zeros and vice versa and that would happen if there were no explicit ifs that check for 0 uint64.

Perhaps there are edge cases where this matters? @owais are you aware of any?

@owais
Copy link
Contributor

owais commented Jan 23, 2020

No, don't know of any issues this could cause.

@tigrannajaryan
Copy link
Member

@pjanotti any thoughts on this? #513 (comment)

Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
Signed-off-by: Joe Elliott <number101010@gmail.com>
@joe-elliott
Copy link
Contributor Author

@tigrannajaryan @owais
Third time's the charm!

  • Removed logic that returns nil when trace/span id of 0 is requested
  • Added tests to confirm 0 trace/span id maps to slices of 0x00 bytes
  • Adjusted the way ParentSpanID's were translated. Previously the code had relied on the return of a nil slice to have an empty ParentSpanID
  • Adjusted tests that relied on the nil slice instead of 0's

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joe-elliott thank you for your patience.

// TODO: StackTrace: OpenTracing defines a semantic key for "stack", should we attempt to its content to StackTrace?
TimeEvents: jProtoLogsToOCProtoTimeEvents(jspan.Logs),
Links: jProtoReferencesToOCProtoLinks(jspan.References),
Status: sStatus,
}

if jspan.ParentSpanID() != 0 {
span.ParentSpanId = tracetranslator.UInt64ToByteSpanID(uint64(jspan.ParentSpanID()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this special-cased because we want ParentSpanId in OC format to be nil when ParentSpanID in Jaeger format is 0?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see we already have a special-case in the reverse translation [1] so this looks good to me.

[1]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So whenever we translate from a 0 trace/span ID we have to decide if that means a 0 id or a nil id. We've decided we feel like the default is all 0's which I like, but ParentSpanId is a special case because a nil ParentSpanId is common and correct.

There is a special case on the reverse translation back to Jaeger, but if we don't translate back to Jaeger then I believe we would want clearly flag "no parent id" with a nil instead of a 0's byte slice which might be confusing.

Another option is to go with back to my second attempt which was basically:

  • Translate all 0 span/trace ids as nil
  • When you translate any nil span/trace id back to Jaeger choose 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is good, no need to go back to the previous attempt.

@pjanotti
Copy link
Contributor

@pjanotti any thoughts on this? #513 (comment)

@tigrannajaryan I don't recall any good reason for those...

Copy link
Contributor

@pjanotti pjanotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @joe-elliott!

@pjanotti
Copy link
Contributor

@tigrannajaryan @joe-elliott ah I think it was because of this https://github.com/census-instrumentation/opencensus-proto/blob/master/gen-go/trace/v1/trace.pb.go#L139. That said this should not be purged in the pipeline by default (I already had some issues with it in the past). If someone wishes to enforce this OC policy they should not rely on the pipeline to do that. It can be a processor or an option on receiver/exporter. So I'm merging it :)

@pjanotti pjanotti merged commit 96bf8f9 into open-telemetry:master Jan 27, 2020
MovieStoreGuy pushed a commit to atlassian-forks/opentelemetry-collector that referenced this pull request Nov 11, 2021
hughesjj pushed a commit to hughesjj/opentelemetry-collector that referenced this pull request Apr 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve otel collector/jaeger pipeline compatibility
5 participants