Jaeger exporter intermittent "data does not fit within one UDP packet" #2664

Closed
Baliedge opened this issue Mar 8, 2022 · 1 comment
Labels
bug Something isn't working

Comments

Baliedge commented Mar 8, 2022

Description

As a follow-up to #2663, I found a buffer overrun issue in the Jaeger exporter that causes intermittent drops of trace batches. Traces randomly appear in Jaeger as incomplete or missing.

I applied the workaround from #2663, which sets MaxPacketSize in the Jaeger exporter to 1472 so that packets stay within the common default MTU of 1500 bytes on most network interfaces. While my application runs, errors like the following appear at random:

2022/03/08 16:39:15 data does not fit within one UDP packet; size 1482, max 1472, spans 6

I was able to "fix" this with the following workaround patch:

diff --git a/exporters/jaeger/agent.go b/exporters/jaeger/agent.go
index d18d891a..b64522e5 100644
--- a/exporters/jaeger/agent.go
+++ b/exporters/jaeger/agent.go
@@ -242,7 +242,7 @@ func (a *agentClientUDP) flush(ctx context.Context, batch *gen.Batch) error {
 func (a *agentClientUDP) calcSizeOfSerializedThrift(ctx context.Context, thriftStruct thrift.TStruct) (int, error) {
        a.thriftBuffer.Reset()
        err := thriftStruct.Write(ctx, a.thriftProtocol)
-       return a.thriftBuffer.Len(), err
+       return a.thriftBuffer.Len() + 19, err
 }

 // Close implements Close() of io.Closer and closes the underlying UDP connection.

I added 19 because I found that the actual packet size was always 19 bytes larger than the value returned by calcSizeOfSerializedThrift().
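
To illustrate why an underestimate of this kind lets an oversized packet through, here is a rough sketch that replays the numbers from the logged error. It assumes the fixed 19-byte difference reported above; where those bytes come from is not established here, and `fits` and `observedOverhead` are hypothetical names rather than exporter code:

```go
package main

import "fmt"

// observedOverhead is the fixed difference reported in this issue between
// the estimated size and the bytes actually sent. Assumption: it is constant.
const observedOverhead = 19

// fits reports whether a payload estimated at estimate bytes stays within
// maxPacketSize once overhead bytes are added to it.
func fits(estimate, overhead, maxPacketSize int) bool {
	return estimate+overhead <= maxPacketSize
}

func main() {
	const actual, limit = 1482, 1472 // values from the logged error

	// The size the estimate would have reported for this batch: 1463.
	estimate := actual - observedOverhead

	fmt.Println(fits(estimate, 0, limit))                // true: check passes, packet is still too large
	fmt.Println(fits(estimate, observedOverhead, limit)) // false: batch would be split earlier
}
```

With the overhead ignored, a 1463-byte estimate passes the 1472-byte check even though 1482 bytes go on the wire, which matches the intermittent nature of the error: only batches whose estimate lands within 19 bytes of the limit fail.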

Environment

  • OS: Occurs on Linux and macOS
  • Architecture: amd64
  • Go Version: 1.17
  • opentelemetry-go version: v1.4.1

Steps To Reproduce

  1. See #2663 (Jaeger exporter sending oversized UDP thrift packets) for a tracer setup code example.
  2. Build a trace with enough spans, each sufficiently complex, that they must be sent in multiple batches. This increases the probability that a packet will be oversized. Consider this unit test:
// Requires imports: context, fmt, testing, time.
// tracer is the package-level tracer from the setup code in #2663.
func TestSequentialSpans(t *testing.T) {
	const nestLevels = 15

	for nesting := 0; nesting < nestLevels; nesting++ {
		t.Run(fmt.Sprintf("Nest %d levels", nesting), func(t *testing.T) {
			ctx, span := tracer.Start(context.Background(), t.Name())
			defer span.End()
			time.Sleep(10 * time.Millisecond)

			for i := 0; i < nesting; i++ {
				// Start each span from the previous span's context.
				spanCtx, span2 := tracer.Start(ctx, fmt.Sprintf("Nest level %d", i+1))
				time.Sleep(5 * time.Millisecond)
				span2.End()
				ctx = spanCtx
			}
		})
	}
}

Expected behavior

  • No errors should be printed by the Jaeger client.
  • Trace is viewable from Jaeger UI in full detail.
@Baliedge Baliedge added the bug Something isn't working label Mar 8, 2022

Baliedge commented Mar 9, 2022

Please disregard. I was not actually running the latest versions of all the otel packages as I had thought, so I did not have the recent fix from #2489. Updating resolved my issue.

@Baliedge Baliedge closed this as completed Mar 9, 2022