ddtrace/tracer: switch atomics to 32-bit #1443

Merged
merged 1 commit into from Sep 15, 2022

knusbaum
Contributor

@knusbaum knusbaum commented Aug 23, 2022

sync/atomic has several issues. Among them is that a 64-bit atomic operation
panics on 32-bit platforms when the field isn't 64-bit aligned. Alignment must
be ensured manually, which is easy to forget.

Instead, we will use 32-bit atomic integers which do not require manual
alignment. We can eventually trade them out for Go's new atomics APIs that
were introduced in go1.19, but we have to wait until 1.18 falls out of our
supported versions.

Fixes #1418
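
The approach described above can be sketched roughly as follows. This is a minimal illustration, not code from this PR: the `stats` struct, its fields, and the helper names are hypothetical. It only uses the long-standing `sync/atomic` functions for 32-bit values, which need no manual alignment.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// stats is a hypothetical struct, not taken from dd-trace-go.
type stats struct {
	enabled bool   // the next field starts at offset 4, not on an 8-byte boundary
	dropped uint32 // 32-bit atomics only need 4-byte alignment, which the compiler provides
}

// incDropped atomically increments the counter.
func (s *stats) incDropped() { atomic.AddUint32(&s.dropped, 1) }

// flush returns the current count and resets it to zero in one atomic step.
func (s *stats) flush() uint32 { return atomic.SwapUint32(&s.dropped, 0) }

// countConcurrently increments one counter from several goroutines.
func countConcurrently(workers, perWorker int) uint32 {
	var s stats
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				s.incDropped()
			}
		}()
	}
	wg.Wait()
	return s.flush()
}

func main() {
	fmt.Println(countConcurrently(8, 1000))
}
```

The trade-off is range: a counter like this wraps after 2^32 increments, so it only fits values that are reset regularly.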

@knusbaum knusbaum marked this pull request as ready for review August 23, 2022 20:01
@knusbaum knusbaum requested a review from a team August 23, 2022 20:01
@knusbaum knusbaum requested a review from a team as a code owner August 23, 2022 20:01
@ajgajg1134 ajgajg1134 added this to the 1.42.0 milestone Aug 23, 2022
docker-compose.yaml (outdated, resolved)
ajgajg1134
ajgajg1134 previously approved these changes Aug 23, 2022
@nsrip-dd
Contributor

⚠️ I still see this panic (I had to make the tracer use a no-op statsd client or I'd also get panics for datadog-go's statsd client):

# on a linux VM
$ sudo apt install qemu-user
$ env GOOS=linux GOARCH=i386 go test -c
$ ./tracer.test
--- FAIL: TestStartSpanFromContextRace (0.00s)
panic: unaligned 64-bit atomic operation [recovered]
	panic: unaligned 64-bit atomic operation

goroutine 13 [running]:
testing.tRunner.func1.2({0x85ab360, 0x86f37f8})
	/home/vagrant/sdk/go1.18.3/src/testing/testing.go:1389 +0x2ab
testing.tRunner.func1()
	/home/vagrant/sdk/go1.18.3/src/testing/testing.go:1392 +0x41f
panic({0x85ab360, 0x86f37f8})
	/home/vagrant/sdk/go1.18.3/src/runtime/panic.go:838 +0x1c3
runtime/internal/atomic.panicUnaligned()
	/home/vagrant/sdk/go1.18.3/src/runtime/internal/atomic/unaligned.go:8 +0x2d
runtime/internal/atomic.Cas64(0x9cb82bc, 0x0, 0x2)
	/home/vagrant/sdk/go1.18.3/src/runtime/internal/atomic/atomic_386.s:79 +0x11
go.uber.org/atomic.(*Int64).CompareAndSwap(...)
	/home/vagrant/go/pkg/mod/go.uber.org/atomic@v1.10.0/int64.go:77
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*trace).keep(...)
	/dd-trace-go/ddtrace/tracer/spancontext.go:214
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).finish(0x9c927e0, 0x170e161be1d901f2)
	/dd-trace-go/ddtrace/tracer/span.go:457 +0x2bd
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.(*span).Finish(0x9c927e0, {0x0, 0x0, 0x0})
	/dd-trace-go/ddtrace/tracer/span.go:397 +0xe9
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer.TestStartSpanFromContextRace(0x9c990e0)
	/dd-trace-go/ddtrace/tracer/context_test.go:98 +0xe0
testing.tRunner(0x9c990e0, 0x8653560)
	/home/vagrant/sdk/go1.18.3/src/testing/testing.go:1439 +0x113
created by testing.(*T).Run
	/home/vagrant/sdk/go1.18.3/src/testing/testing.go:1486 +0x374

I also saw the panic with GOOS=linux GOARCH=arm.

I think you might have to use atomic.NewInt64 to get the needed alignment, or use 32-bit atomics where they make sense.

@ajgajg1134 ajgajg1134 self-requested a review August 24, 2022 14:34
Contributor

@ajgajg1134 ajgajg1134 left a comment

Removing approval until we can get the tests passing and so we don't accidentally merge

@knusbaum
Contributor Author

knusbaum commented Aug 24, 2022

Thanks for the testcase, @nsrip-dd. I assumed that go.uber.org/atomic took care of that based on the documentation. I should've double-checked.

I'm wondering if it's worth moving to go.uber.org/atomic at all, or if we should wait for go1.19's built-in equivalent, which (according to the documentation) does automatically take care of this.
(See: https://pkg.go.dev/sync/atomic#Uint64)

Emphasis mine:

On ARM, 386, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically via the primitive atomic functions (types Int64 and Uint64 are automatically aligned). The first word in an allocated struct, array, or slice; in a global variable; or in a local variable (because the subject of all atomic operations will escape to the heap) can be relied upon to be 64-bit aligned.

We obviously need a solution in the meantime, but maybe we can fix that ourselves. Do we even need 64-bit values for most of these? They're all counters that get reset and I don't think we're sending 2^32 anything.
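
One way to follow the documented rule with the primitive functions is to keep the 64-bit field first in the struct. The sketch below assumes a hypothetical `counters` struct and helper names; none of them come from dd-trace-go.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// counters is a hypothetical struct. The 64-bit field is the first word of the
// struct, which sync/atomic documents as 64-bit aligned for allocated structs.
type counters struct {
	spans   int64 // accessed atomically; must stay the first field
	enabled bool
}

// newCounters allocates the struct so the first-word guarantee applies.
func newCounters() *counters { return &counters{} }

// addSpans atomically adds n and returns the new total.
func (c *counters) addSpans(n int64) int64 { return atomic.AddInt64(&c.spans, n) }

// spansOffset reports where the 64-bit field sits inside the struct.
func spansOffset() uintptr { return unsafe.Offsetof(counters{}.spans) }

func main() {
	c := newCounters()
	c.addSpans(2)
	fmt.Println(spansOffset(), c.addSpans(3))
}
```

This is fragile: adding a field above `spans`, or embedding `counters` at a non-zero offset in another struct, silently breaks the guarantee on 32-bit targets.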

Contributor

@nsrip-dd nsrip-dd left a comment

IMO, go.uber.org/atomic is still nice because it enforces atomic access, even if it doesn't solve alignment problems.

Do we even need 64-bit values for most of these? They're all counters that get reset and I don't think we're sending 2^32 anything.

Looking through the PR, I think you can definitely change most of these to 32-bit. I've left comments where I think you can safely use 32-bit.

For fields you're not sure about, you can make them *atomic.Int64; they just need to be initialized with atomic.NewInt64. From a performance perspective, atomic.Int32 can be used safely without paying the price of a heap allocation. But for things that aren't frequently created, like a tracer, the performance probably doesn't matter.
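
The pointer approach can be sketched with the standard library alone. The `alignedInt64` wrapper and `tracerStats` struct below are hypothetical stand-ins, not the go.uber.org/atomic types; the point is only that the 64-bit value lives in its own allocation, where it is the first word of an allocated struct.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// alignedInt64 is a hypothetical wrapper. When allocated on its own, v is the
// first word of an allocated struct and is therefore 64-bit aligned.
type alignedInt64 struct{ v int64 }

func (a *alignedInt64) Add(n int64) int64 { return atomic.AddInt64(&a.v, n) }
func (a *alignedInt64) Load() int64       { return atomic.LoadInt64(&a.v) }

// tracerStats is a hypothetical struct that holds the counter by pointer, so
// the fields placed before it cannot affect the counter's alignment.
type tracerStats struct {
	started bool
	flushed *alignedInt64
}

// newTracerStats pays for one extra allocation per counter.
func newTracerStats() *tracerStats {
	return &tracerStats{flushed: new(alignedInt64)}
}

func main() {
	ts := newTracerStats()
	ts.flushed.Add(40)
	ts.flushed.Add(2)
	fmt.Println(ts.flushed.Load())
}
```

The extra allocation is why this fits long-lived objects better than values created on every span.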

ddtrace/tracer/spancontext.go (outdated, resolved)
ddtrace/tracer/stats.go (outdated, resolved)
ddtrace/tracer/payload.go (outdated, resolved)
ddtrace/tracer/spancontext.go (outdated, resolved)
ddtrace/tracer/tracer.go (outdated, resolved)
@knusbaum
Contributor Author

@nsrip-dd

IMO, go.uber.org/atomic is still nice because it enforces atomic access, even if it doesn't solve alignment problems.

Go's (new with go1.19) standard library types do exactly the same thing. They force atomic access, but also solve alignment issues. It looks almost identical to the Uber API, except it works.

I double-checked and the alignment is taken care of by the new std lib atomics.

It looks like we can move all the 64-bit integers to 32 bits (or bools) so I'm more in favor of doing that at the moment than adding go.uber.org/atomic, since the same API is going to be available to us in the standard library in about a year.

I wouldn't be as hesitant to change to Uber except:

  • It doesn't actually solve the original problem without messing with pointers, which is much uglier than just switching to 32 bits.
  • The standard library version is better because:
    • It's standard
    • It (like Uber's) enforces atomic access
    • It (unlike Uber's) actually fixes the alignment issue.

@nsrip-dd
Contributor

Agreed, using the standard library would be my preference as well. Switching everything to 32-bit atomics for now and using the Go 1.19 atomic types as soon as we can makes sense to me.

@knusbaum knusbaum changed the title ddtrace/tracer: switch to go.uber.org/atomic ddtrace/tracer: switch atomics to 32-bit Sep 13, 2022
@knusbaum knusbaum marked this pull request as ready for review September 13, 2022 01:33
nsrip-dd
nsrip-dd previously approved these changes Sep 13, 2022
Contributor

@nsrip-dd nsrip-dd left a comment

LGTM! I tested these changes on a Linux VM, using qemu to run with GOARCH=arm and GOARCH=386. There aren't any panics due to unaligned atomics, so this should be good to go.

As mentioned previously, I had to temporarily force the tracer to use a no-op statsd client because there are unaligned atomics upstream (see DataDog/datadog-go#260). But as far as I can tell this PR fixes the issues in this repository.

Also, this is a small thing and not worth blocking this PR IMO, but the unit tests in textmap_test.go don't pass on 32-bit architectures, e.g.:

--- FAIL: TestTextMapPropagator/InvalidTraceTags (0.00s)
        textmap_test.go:322: 
            	Error Trace:	textmap_test.go:322
            	Error:      	Not equal: 
            	            	expected: "-478508587"
            	            	actual  : "1909628612072081877"
            	            	
            	            	Diff:
            	            	--- Expected
            	            	+++ Actual
            	            	@@ -1 +1 @@
            	            	--478508587
            	            	+1909628612072081877
            	Test:       	TestTextMapPropagator/InvalidTraceTags

The failing line is

assert.Equal(t, strconv.Itoa(int(childSpanID)), dst["x-datadog-parent-id"])

But int is 32 bits on 32-bit platforms, while the right-hand side comes from a 64-bit integer, so the conversion in the test truncates the ID. I don't think this affects the correctness of the actual code, just the tests, though.
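
The truncation can be reproduced on any platform by narrowing to int32 explicitly. In this sketch the helper names are hypothetical, and `strconv.FormatUint` is shown as one possible platform-independent formatting, not necessarily the fix the test ended up with. The ID is the "actual" value from the failure output above.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatTruncated mimics strconv.Itoa(int(id)) on a 32-bit platform by
// narrowing to int32 first, which keeps only the low 32 bits.
func formatTruncated(id uint64) string {
	return strconv.Itoa(int(int32(id)))
}

// formatFull formats all 64 bits regardless of the platform's int size.
func formatFull(id uint64) string {
	return strconv.FormatUint(id, 10)
}

func main() {
	const childSpanID uint64 = 1909628612072081877
	fmt.Println(formatTruncated(childSpanID)) // -478508587
	fmt.Println(formatFull(childSpanID))      // 1909628612072081877
}
```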

Contributor

@katiehockman katiehockman left a comment

Just one comment, otherwise LGTM

ddtrace/tracer/payload.go (resolved)
Successfully merging this pull request may close these issues.

tracer: unaligned atomic operation causes panic
4 participants