Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding locks where context is accessed #528

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
41 changes: 23 additions & 18 deletions span.go
Expand Up @@ -99,6 +99,9 @@ func (s *Span) SetTag(key string, value interface{}) opentracing.Span {
return s.setTagInternal(key, value, true)
}

// setTagInternal sets tags in a thread-safe manner if lock param is set to true.
// lock param can be set to false if concurrent access in not expected like when span is created.
// The caller shouldn't obtain any lock on the span while calling this as it will lead to a deadlock.
func (s *Span) setTagInternal(key string, value interface{}, lock bool) opentracing.Span {
s.observer.OnSetTag(key, value)
if key == string(ext.SamplingPriority) && !setSamplingPriority(s, value) {
Expand All @@ -120,8 +123,8 @@ func (s *Span) setTagInternal(key string, value interface{}, lock bool) opentrac

// SpanContext returns span context
func (s *Span) SpanContext() SpanContext {
s.Lock()
defer s.Unlock()
s.RLock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a number of other places that hold .Lock() when only .RLock() is needed. Seeing .StartTime(), .Duration(), .Tags() for instance. Probably not in the scope of this fix though.

defer s.RUnlock()
return s.context
}

Expand Down Expand Up @@ -345,7 +348,7 @@ func (s *Span) FinishWithOptions(options opentracing.FinishOptions) {
decision := s.tracer.sampler.OnFinishSpan(s)
s.applySamplingDecision(decision, true)
}
if s.context.IsSampled() {
if s.SpanContext().IsSampled() {
s.Lock()
s.fixLogsIfDropped()
if len(options.LogRecords) > 0 || len(options.BulkLogData) > 0 {
Expand All @@ -366,8 +369,8 @@ func (s *Span) FinishWithOptions(options opentracing.FinishOptions) {

// Context implements opentracing.Span API
func (s *Span) Context() opentracing.SpanContext {
s.Lock()
defer s.Unlock()
s.RLock()
defer s.RUnlock()
return s.context
}

Expand Down Expand Up @@ -425,12 +428,15 @@ func (s *Span) serviceName() string {
return s.tracer.serviceName
}

// applySamplingDecision modifies sampling state in a thread-safe manner if lock param is set to true.
// lock param can be set to false if concurrent access in not expected like when span is created.
// The caller shouldn't obtain any lock on the span while calling this as it will lead to a deadlock.
func (s *Span) applySamplingDecision(decision SamplingDecision, lock bool) {
if !decision.Retryable {
s.context.samplingState.setFinal()
s.SpanContext().samplingState.setFinal()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The presence of the lock parameter in this function indicates that it might be called while the lock on the span is already acquired by the current goroutine. What would happen in that case if we call SpanContext() that will try to acquire a read-lock? I remember in the past that in Go the RWMutex is not re-entrable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah rwmutex is not re-entrable, but so far this is not called in places where the caller already obtains a lock, the method is thread-safe in itself, so the caller shouldn't need to obtain locks, added documentation for that. lock param is set to false only in the start span method where too no locks are obtained.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think it's dangerous to ignore the lock parameter even if right now there doesn't happen to be a codepath that deadlocks. Perhaps access s.context directly here and move the if lock code to the top?

Alternatively the way it's been changed we're always locking and we've already taken the performance hit. We could just remove the lock param. Depending on benchmarks this may be an option.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is true, I started making changes that way to apply lock while calling all these methods, but this method also calls sampler and observer callbacks like sampler.OnSetOperationName(), etc., which won't be in the same package and have to call with locks, and there is high risk of these calling locks than callers to these internal methods.

We can call locks around these call backs, the code is going to lock and unlock twice at-least in each method, one before and after sampler callbacks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func (s *Span) setTagInternal(key string, value interface{}, lock bool) opentracing.Span {
	s.observer.OnSetTag(key, value)
	if lock {
		s.RLock()
	}
	if key == string(ext.SamplingPriority) && !setSamplingPriority(s, value) {
		return s
	}
	if !s.isSamplingFinalized() {
		if lock {
			s.RUnlock()
		}
		decision := s.tracer.sampler.OnSetTag(s, key, value)
		if lock {
			s.Lock()
			defer s.Unlock()
		}
		s.applySamplingDecision(decision, lock)
	} else {
		if lock {
			s.Lock()
			defer s.Unlock()
		}
	}
	if s.isWriteable() {
		s.appendTagNoLocking(key, value)
	}
	return s
}

Copy link
Author

@keer25 keer25 Aug 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how some three methods in span.go would look, if this looks good we can go with this.
As a rule not a good idea to hold a lock while calling external methods (like sampler callbacks), but internal methods we can document if it is to be called with or without locks. And I believe the lock parameter was just so locks are not called when we need not bother about concurrency like when the span is started.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some benchmarks can tell which approach is better, I will add some micro benchmarks over the weekend

}
if decision.Sample {
s.context.samplingState.setSampled()
s.SpanContext().samplingState.setSampled()
if len(decision.Tags) > 0 {
if lock {
s.Lock()
Expand All @@ -445,12 +451,12 @@ func (s *Span) applySamplingDecision(decision SamplingDecision, lock bool) {

// Span can be written to if it is sampled or the sampling decision has not been finalized.
func (s *Span) isWriteable() bool {
state := s.context.samplingState
state := s.SpanContext().samplingState
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isWriteable and isSamplingFinalized are called beneath setTagInternal which receives a lock param that is ignored here.

return !state.isFinal() || state.isSampled()
}

func (s *Span) isSamplingFinalized() bool {
return s.context.samplingState.isFinal()
return s.SpanContext().samplingState.isFinal()
}

// setSamplingPriority returns true if the flag was updated successfully, false otherwise.
Expand All @@ -469,26 +475,25 @@ func setSamplingPriority(s *Span, value interface{}) bool {
if !ok {
return false
}
ctx := s.SpanContext()
if val == 0 {
s.context.samplingState.unsetSampled()
s.context.samplingState.setFinal()
ctx.samplingState.unsetSampled()
ctx.samplingState.setFinal()
return true
}
if s.tracer.options.noDebugFlagOnForcedSampling {
s.context.samplingState.setSampled()
s.context.samplingState.setFinal()
ctx.samplingState.setSampled()
ctx.samplingState.setFinal()
return true
} else if s.tracer.isDebugAllowed(s.operationName) {
s.context.samplingState.setDebugAndSampled()
s.context.samplingState.setFinal()
ctx.samplingState.setDebugAndSampled()
ctx.samplingState.setFinal()
return true
}
return false
}

// EnableFirehose enables firehose flag on the span context
func EnableFirehose(s *Span) {
s.Lock()
defer s.Unlock()
s.context.samplingState.setFirehose()
s.SpanContext().samplingState.setFirehose()
}
11 changes: 10 additions & 1 deletion span_test.go
Expand Up @@ -363,7 +363,6 @@ func TestSpan_References(t *testing.T) {
}

func TestSpanContextRaces(t *testing.T) {
t.Skip("Skipped: test will panic with -race, see https://github.com/jaegertracing/jaeger-client-go/issues/526")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful to add a few more mutators to this test, namely those that affect samplingState and flags.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah added a few, and also called span.Finish in the end, and found a few more data race conditions in the reportSpan() method and modified it

tracer, closer := NewTracer("test", NewConstSampler(true), NewNullReporter())
defer closer.Close()

Expand Down Expand Up @@ -395,6 +394,16 @@ func TestSpanContextRaces(t *testing.T) {
go accessor(func() {
span.BaggageItem("k")
})
go accessor(func() {
ext.SamplingPriority.Set(span, 0)
})
go accessor(func() {
EnableFirehose(span)
})
go accessor(func() {
span.SpanContext().samplingState.setFlag(flagDebug)
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe reset still races, but unsure if this is worth protecting. This is only called when the syncpool is enabled and generally the lifecycle code seems dangerous. It would be trivial for the calling application to hold a *Span after .Finish() and do terrible things with it so I think we have to trust the calling code.

Added a comment to the main thread with some final thoughts.

time.Sleep(100 * time.Millisecond)
span.Finish()
close(end)
}
4 changes: 2 additions & 2 deletions tracer.go
Expand Up @@ -439,7 +439,7 @@ func (t *Tracer) emitNewSpanMetrics(sp *Span, newTrace bool) {
func (t *Tracer) reportSpan(sp *Span) {
if !sp.isSamplingFinalized() {
t.metrics.SpansFinishedDelayedSampling.Inc(1)
} else if sp.context.IsSampled() {
} else if sp.SpanContext().IsSampled() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store the value of SpanContext().IsSampled() at the top of the method to only call SpanContext() once?

t.metrics.SpansFinishedSampled.Inc(1)
} else {
t.metrics.SpansFinishedNotSampled.Inc(1)
Expand All @@ -448,7 +448,7 @@ func (t *Tracer) reportSpan(sp *Span) {
// Note: if the reporter is processing Span asynchronously then it needs to Retain() the span,
// and then Release() it when no longer needed.
// Otherwise, the span may be reused for another trace and its data may be overwritten.
if sp.context.IsSampled() {
if sp.SpanContext().IsSampled() {
t.reporter.Report(sp)
}

Expand Down