Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race conditions in otelsarama.WrapAsyncProducer #755

Merged
merged 3 commits into from
Jun 24, 2021
Merged

Fix race conditions in otelsarama.WrapAsyncProducer #755

merged 3 commits into from
Jun 24, 2021

Conversation

pellared
Copy link
Member

@pellared pellared commented Apr 20, 2021

Why

Sample race condition from Splunk's fork:

==================
WARNING: DATA RACE
Write at 0x00c0250fca60 by goroutine 217:
  github.com/Shopify/sarama.(*topicProducer).partitionMessage()
      /.../go/pkg/mod/github.com/!shopify/sarama@v1.27.0/async_producer.go:467 +0x207
  github.com/Shopify/sarama.(*topicProducer).dispatch()
      /.../go/pkg/mod/github.com/!shopify/sarama@v1.27.0/async_producer.go:410 +0x2e5
  github.com/Shopify/sarama.(*topicProducer).dispatch-fm()
      /.../go/pkg/mod/github.com/!shopify/sarama@v1.27.0/async_producer.go:407 +0x41
  github.com/Shopify/sarama.withRecover()
      /.../go/pkg/mod/github.com/!shopify/sarama@v1.27.0/utils.go:43 +0x5a
Previous read at 0x00c0250fca60 by goroutine 44:
  github.com/signalfx/signalfx-go-tracing/contrib/Shopify/sarama.WrapAsyncProducer.func1()
      /.../go/pkg/mod/github.com/signalfx/signalfx-go-tracing/contrib/!shopify/sarama@v1.9.0/sarama.go:213 +0x900
Goroutine 217 (running) created at:
  github.com/Shopify/sarama.(*asyncProducer).newTopicProducer()
      /.../go/pkg/mod/github.com/!shopify/sarama@v1.27.0/async_producer.go:403 +0x390
  github.com/Shopify/sarama.(*asyncProducer).dispatcher()
      /.../go/pkg/mod/github.com/!shopify/sarama@v1.27.0/async_producer.go:369 +0x739
  github.com/Shopify/sarama.(*asyncProducer).dispatcher-fm()
      /.../go/pkg/mod/github.com/!shopify/sarama@v1.27.0/async_producer.go:322 +0x41
  github.com/Shopify/sarama.withRecover()
      /.../go/pkg/mod/github.com/!shopify/sarama@v1.27.0/utils.go:43 +0x5a
Goroutine 44 (running) created at:
  github.com/signalfx/signalfx-go-tracing/contrib/Shopify/sarama.WrapAsyncProducer()
      /.../go/pkg/mod/github.com/signalfx/signalfx-go-tracing/contrib/!shopify/sarama@v1.9.0/sarama.go:198 +0x3b0
  github.com/Vungle/jaeger/internal/kafka.newProducer()
      /.../go/src/github.com/Vungle/jaeger/internal/kafka/producer.go:98 +0x198
  github.com/Vungle/jaeger/internal/kafka.Initialize()
      /.../go/src/github.com/Vungle/jaeger/internal/kafka/kafka.go:16 +0xd6
  github.com/Vungle/jaeger/internal/services.Services.Initialize()
      /.../go/src/github.com/Vungle/jaeger/internal/services/services.go:24 +0x15c
  main.init.0()
      /.../go/src/github.com/Vungle/jaeger/cmd/jaeger/jaeger.go:96 +0x24c

The race can happen when the goroutine accesses data (Partition and Offset) that might be later processed. Take notice: https://github.com/Shopify/sarama/blob/41df78df10a9ef3c807cbe1f2814001e330fbdf1/async_producer.go#L200-L208

Reference fix: signalfx/signalfx-go-tracing#128

What

Do not access message fields that are set by sarama async producer before they are processed. Moreover do not set attributes based on message partition and offset in such cases.

@codecov
Copy link

codecov bot commented Apr 20, 2021

Codecov Report

Merging #755 (d9cd712) into main (077af5a) will not change coverage.
The diff coverage is 50.0%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main    #755   +/-   ##
=====================================
  Coverage   78.8%   78.8%           
=====================================
  Files         62      62           
  Lines       2707    2707           
=====================================
  Hits        2135    2135           
  Misses       440     440           
  Partials     132     132           
Impacted Files Coverage Δ
...n/github.com/Shopify/sarama/otelsarama/producer.go 94.8% <50.0%> (ø)

@Aneurysm9
Copy link
Member

I am not sure if it is good to call finishProducerSpan(mc.span, 0, 0, nil). Do we want to set the attributes based on the Offset and Partition that are totally artificial?

I share this concern. What options do we have for dealing with it? Is it possible for a valid message to be at offest 0 of partition 0, or can we ignore those attributes of both are 0?

@pellared
Copy link
Member Author

pellared commented Apr 30, 2021

What options do we have for dealing with it? Is it possible for a valid message to be at offest 0 of partition 0, or can we ignore those attributes of both are 0?

I would simply replace all calls to finishProducerSpan(mc.span, 0, 0, nil) with span.End(). There are already quite nice comments in place.

Is it possible for a valid message to be at offest 0 of partition 0, or can we ignore those attributes of both are 0?

I guess that it is possible. Even it would not be, I find an approach with if statements less readable and more error-prone.

@Aneurysm9
Copy link
Member

I would simply replace all calls to finishProducerSpan(mc.span, 0, 0, nil) with span.End().

Let's do that, then. At least, in this location. I'd prefer to put no additional information on the span than to put incorrect information on it, but when the information is available and usable we should use it.

@pellared
Copy link
Member Author

@Aneurysm9 Done

@pellared pellared changed the title Fix possible race condition in sarama instrumentation WrapAsyncProducer Fix race condition in otelsarama.WrapAsyncProducer Jun 21, 2021
@pellared pellared changed the title Fix race condition in otelsarama.WrapAsyncProducer Fix race conditions in otelsarama.WrapAsyncProducer Jun 21, 2021
@MrAlias MrAlias merged commit 6d226af into open-telemetry:main Jun 24, 2021
@pellared pellared deleted the patch-2 branch June 24, 2021 20:12
@Aneurysm9 Aneurysm9 mentioned this pull request Jul 26, 2021
plantfansam referenced this pull request in plantfansam/opentelemetry-go-contrib Mar 18, 2022
Ensure gRPC ClientStream override methods do not panic
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants