
ddtrace/tracer: [bugfix] Atomically load error count when creating child spans #1894

Closed
wants to merge 4 commits

Conversation


@JDeuce JDeuce commented Apr 12, 2023

What does this PR do?

The error count is written atomically, but reading it here without an atomic load can trigger a data race when multiple child spans are created concurrently while errors are being recorded.
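For context, here is a minimal sketch of the pattern (the struct and field names are illustrative, not the tracer's actual internals): the counter is bumped with atomic.AddInt32 on the write side, so the read side at child-span creation has to use atomic.LoadInt32 as well, which is what this PR does.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// span is an illustrative stand-in for the tracer's span type;
// errCount mirrors the atomically incremented error counter (hypothetical name).
type span struct {
	errCount int32
}

// recordError is the write side: the counter is bumped atomically.
func (s *span) recordError() {
	atomic.AddInt32(&s.errCount, 1)
}

// errorCount is the read side. Returning s.errCount directly would race
// with recordError; pairing the atomic write with an atomic load does not.
func (s *span) errorCount() int32 {
	return atomic.LoadInt32(&s.errCount)
}

func main() {
	s := &span{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); s.recordError() }()    // error being set on the span
		go func() { defer wg.Done(); _ = s.errorCount() }() // child span reading the count
	}
	wg.Wait()
	fmt.Println("errors:", s.errorCount())
}
```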

Motivation

I captured this data race excerpt from a larger application that runs complex unit tests with the race detector enabled on systems that create traces. It never triggered with dd-trace-go v1.48.0, but it blocks us from updating to v1.49.1.

WARNING: DATA RACE
Write at 0x00c00049c638 by goroutine 1991:
  sync/atomic.AddInt32()
      /usr/local/Cellar/go@1.19/1.19.7/libexec/src/runtime/race_amd64.s:281 +0xb
  sync/atomic.AddInt32()
      <autogenerated>:1 +0x1a
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).setTagError()
      /Users/jdeuce/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.49.1/ddtrace/tracer/span.go:284 +0x176
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).Finish()


Previous read at 0x00c00049c638 by goroutine 1997:
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.newSpanContext()
      /Users/jdeuce/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.49.1/ddtrace/tracer/spancontext.go:60 +0x27c
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*tracer).StartSpan()
      /Users/jdeuce/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.49.1/ddtrace/tracer/tracer.go:473 +0xcb1
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.StartSpan()
      /Users/jdeuce/go/pkg/mod/gopkg.in/!data!dog/dd-trace-go.v1@v1.49.1/ddtrace/tracer/tracer.go:166 +0x8f
  gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.StartSpanFromContext()

Describe how to test/QA your changes

Running a locally patched version with this change, the race detector warning went away.

Reviewer's Checklist

  • Changed code has unit tests for its functionality.
  • If this interacts with the agent in a new way, a system test has been added.

@JDeuce JDeuce requested a review from a team April 12, 2023 17:16
@JDeuce JDeuce changed the title Atomically load error count [bugfix] Atomically load error count to resolve Data Race Apr 12, 2023
@JDeuce JDeuce force-pushed the patch-1 branch 4 times, most recently from 374fd5a to 586693a on April 14, 2023 14:20
@JDeuce JDeuce changed the title [bugfix] Atomically load error count to resolve Data Race ddtrace/tracer: [bugfix] Atomically load error count when creating child spans Apr 14, 2023
@JDeuce
Author

JDeuce commented Apr 17, 2023

@ahmed-mez 👋 any way you can help me get this reviewed?

…ild spans

Otherwise, it's subject to a data race while concurrently creating child
spans and erroring. The count is written via atomic increments.
@dianashevchenko
Contributor

@JDeuce Can you please share the steps to reproduce or a regression test, so we can make sure this is the fix (and the only fix) needed?

@JDeuce
Author

JDeuce commented Apr 20, 2023

In testing with a private application, the patch did seem to resolve the race condition.

I think it happens because new child spans can still be created after span.Finish(WithError(...)) has been called: we often create child spans inside child functions before checking whether the context was canceled, and we call Finish aggressively without waiting for all child tasks to finish.
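A rough regression-test sketch of that shape, using the public tracer API (the names and loop counts are illustrative, and a single run is not guaranteed to trip the race detector):

```go
// race_repro_test.go — run with: go test -race
package repro

import (
	"context"
	"errors"
	"sync"
	"testing"

	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func TestChildSpansAfterFinishWithError(t *testing.T) {
	tracer.Start()
	defer tracer.Stop()

	parent, ctx := tracer.StartSpanFromContext(context.Background(), "parent.op")

	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Child spans keep being created off the same parent context…
			child, _ := tracer.StartSpanFromContext(ctx, "child.op")
			child.Finish()
		}()
	}

	// …while the parent is finished with an error, which bumps its error count.
	parent.Finish(tracer.WithError(errors.New("boom")))
	wg.Wait()
}
```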

@katiehockman
Contributor

@JDeuce Sorry for the delay here. From your latest message, it sounds like this PR doesn't fix the issue, is that correct? If that's the case, would you mind testing this with the latest dd-trace-go release and filing an issue so we can keep track of it there? Then we can close this PR and investigate the issue separately. We don't want to lose track of this issue if there is a race condition hidden somewhere.

Thank you!

@darccio
Contributor

darccio commented May 8, 2024

Closing due to inactivity.

@darccio darccio closed this May 8, 2024