tracing: reduce memory pressure throughout #64233

Merged
merged 1 commit into cockroachdb:master from 210419.tracing-perf on May 18, 2021

Conversation

irfansharif
Contributor

@irfansharif irfansharif commented Apr 26, 2021

This commit attempts to reduce the memory overhead of tracing by doing a
few things, guided mostly by BenchmarkTracing and kv95 (see results
below). In decreasing order of impact:

  • We no longer materialize recordings when the trace is "empty" or
    non-verbose. When traces straddle RPC boundaries, we serialize the
    recording to send it over the wire. For "empty" traces (ones with
    no structured recordings or log messages), this is needlessly
    allocation-heavy. We end up shipping the trace skeleton (parent-child
    span relationships, and the operation names + metadata that identify
    each span). Not only does this allocate within pkg/util/tracing, it
    also incurs allocations within gRPC where all this serialization is
    taking place. This commit takes the (opinionated) stance that we can
    avoid materializing these recordings altogether.
    We do this by having each span hold onto a reference to the rootspan,
    and updating an atomic value on the rootspan whenever anything is
    recorded. When materializing the recording, from any span in the tree,
    we can consult the root span to see if the trace was non-empty, and
    allocate only if so. (A rough sketch of this idea follows the list.)

  • We pre-allocate a small list of child pointer slots for spans to refer
    to, instead of allocating every time a child span is created.

  • Span tags aren't rendered unless the span is verbose, but we were
    still collecting them previously in order to render them if verbosity
    was toggled. Toggling an active span's verbosity rarely ever happens,
    so now we don't allocate for tags unless the span is already verbose.
    This also lets us avoid the WithTags option, which allocates even if
    the span will end up dropping tags.

  • We were previously allocating SpanMeta, despite the objects being
    extremely short-lived in practice (they existed primarily to pass on
    the parent span's metadata to the newly created child spans). We now
    pass SpanMetas by value.

  • We can make use of explicit carriers when extracting SpanMeta from
    remote spans, instead of accessing data through the Carrier interface.
    This helps reduce SpanMeta allocations, at the cost of duplicating some
    code.

  • metadata.MD, used to carry span metadata across gRPC, is also
    relatively short-lived (existing only for the duration of the RPC).
    Its API is also relatively allocation-heavy (improved with
    metadata: reduce memory footprint in FromOutgoingContext grpc/grpc-go#4360), allocating for every key
    being written. Tracing has a very specific usage pattern (adding to
    the metadata.MD only the trace and span ID), so we can pool our
    allocations here.

  • We can slightly improve lock contention around the tracing registry by
    locking only when we're dealing with rootspans. We can also avoid
    computing the span duration inside the critical section.
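
To make the first bullet concrete, here is a minimal, self-contained Go sketch of the root-span trick (all names are illustrative and simplified; the real types in pkg/util/tracing differ and also account for verbose spans):

package main

import (
	"fmt"
	"sync/atomic"
)

// span is a stripped-down stand-in for a tracing span. Every span keeps a
// pointer to its trace's root span; recording anything bumps an atomic
// counter on that root. Materializing a recording first consults the root
// and bails out (without allocating) when nothing was ever recorded.
type span struct {
	operation  string
	rootSpan   *span // points to itself for root spans
	recordings int32 // only read/written on the root span, atomically
	children   []*span
}

func newRootSpan(op string) *span {
	s := &span{operation: op}
	s.rootSpan = s
	return s
}

func (s *span) childSpan(op string) *span {
	c := &span{operation: op, rootSpan: s.rootSpan}
	s.children = append(s.children, c)
	return c
}

// record marks the whole trace as non-empty via the root span.
func (s *span) record(msg string) {
	atomic.AddInt32(&s.rootSpan.recordings, 1)
	_ = msg // a real span would also store msg locally
}

// getRecording skips materializing anything for an "empty" trace.
func (s *span) getRecording() []string {
	if atomic.LoadInt32(&s.rootSpan.recordings) == 0 {
		return nil // empty trace: no skeleton, no serialization, no allocations
	}
	rec := []string{s.operation}
	for _, c := range s.children {
		rec = append(rec, c.getRecording()...)
	}
	return rec
}

func main() {
	root := newRootSpan("request")
	child := root.childSpan("rpc")
	fmt.Println(child.getRecording()) // []: nothing recorded anywhere yet
	child.record("structured event")
	fmt.Println(root.getRecording()) // [request rpc]
}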


Before this PR, comparing traced scans vs. not:

name                                 old time/op    new time/op    delta
Tracing/Cockroach/Scan1-24              403µs ± 3%     415µs ± 1%   +2.82%  (p=0.000 n=10+9)
Tracing/MultinodeCockroach/Scan1-24     878µs ± 1%     975µs ± 6%  +11.07%  (p=0.000 n=10+10)

name                                 old alloc/op   new alloc/op   delta
Tracing/Cockroach/Scan1-24             23.2kB ± 2%    29.8kB ± 2%  +28.64%  (p=0.000 n=10+10)
Tracing/MultinodeCockroach/Scan1-24    76.5kB ± 5%    95.1kB ± 1%  +24.31%  (p=0.000 n=10+10)

name                                 old allocs/op  new allocs/op  delta
Tracing/Cockroach/Scan1-24                235 ± 2%       258 ± 1%   +9.50%  (p=0.000 n=10+9)
Tracing/MultinodeCockroach/Scan1-24       760 ± 1%       891 ± 1%  +17.20%  (p=0.000 n=9+10)

After this PR:

name                                 old time/op    new time/op    delta
Tracing/Cockroach/Scan1-24              437µs ± 4%     443µs ± 3%     ~     (p=0.315 n=10+10)
Tracing/MultinodeCockroach/Scan1-24     925µs ± 2%     968µs ± 1%   +4.62%  (p=0.000 n=10+10)

name                                 old alloc/op   new alloc/op   delta
Tracing/Cockroach/Scan1-24             23.3kB ± 3%    26.3kB ± 2%  +13.24%  (p=0.000 n=10+10)
Tracing/MultinodeCockroach/Scan1-24    77.0kB ± 4%    81.7kB ± 3%   +6.08%  (p=0.000 n=10+10)

name                                 old allocs/op  new allocs/op  delta
Tracing/Cockroach/Scan1-24                236 ± 1%       241 ± 1%   +2.45%  (p=0.000 n=9+9)
Tracing/MultinodeCockroach/Scan1-24       758 ± 1%       793 ± 2%   +4.67%  (p=0.000 n=10+10)

kv95/enc=false/nodes=1/cpu=32 across a few runs also shows a modest
improvement going from before to after this PR, using a sampling rate of 10%:

36929.02 v. 37415.52  (reads/s)   +1.30%
 1944.38 v.  1968.94  (writes/s)  +1.24%

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@irfansharif irfansharif force-pushed the 210419.tracing-perf branch 4 times, most recently from 5ff9815 to 99c96b8 Compare April 30, 2021 20:00
@irfansharif irfansharif requested a review from a team April 30, 2021 20:00
@irfansharif irfansharif force-pushed the 210419.tracing-perf branch 2 times, most recently from e513da3 to 4a794d0 Compare April 30, 2021 20:01
@irfansharif irfansharif changed the title [wip,dnm] tracing: peephole optimizations round=2 tracing: reduce memory pressure throughout Apr 30, 2021
@irfansharif irfansharif removed the request for review from a team April 30, 2021 20:02
@irfansharif irfansharif requested a review from tbg April 30, 2021 20:10
@irfansharif
Contributor Author

+cc @abarganier, @dhartunian, @rimadeodhar.

Member

@tbg tbg left a comment


Sorry this took so long! The improvements look reasonable, but I'm worried that the pooling is racy. I also think this should get a close review by the folks on observability-infra, to make sure that everything is documented well. You'll know better who to tag.

Reviewed 13 of 13 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/sql/sem/builtins/builtins.go, line 3736 at r1 (raw file):

				var rootSpan *tracing.Span
				if err := ctx.Settings.Tracer.VisitSpans(func(span *tracing.Span) error {
					if span.TraceID() == traceID || rootSpan != nil {

🤔 don't you want span.TraceID() == traceID && rootSpan == nil?


pkg/util/tracing/crdbspan.go, line 135 at r1 (raw file):

}

func (c *childSpanRefs) list() []*crdbSpan {

This allocs for no good reason, just pointing it out in case this is in a hot path.


pkg/util/tracing/grpc_interceptor.go, line 244 at r1 (raw file):

}

func getPooledMetadata() (md metadata.MD, putBack func(md metadata.MD)) {

I'm seeing that we are basically using this as follows

md, putBack := getPooledMetadata()
mutate(md)
ctx = context.WithValue(ctx, raw{md:md})
cs := newClientStream(ctx)
putBack()
return cs

so... data race? We're putting md into ctx, then returning it to the pool, but there's no reason to assume that it wouldn't be read from the ctx later.


pkg/util/tracing/tracer.go, line 410 at r1 (raw file):

		helper.crdbSpan.rootSpan = rootSpan
	} else {
		helper.crdbSpan.traceEmpty.Store(true)

Does this allocate on every call? I assume it doesn't have to, but not sure it's smart enough to figure it out. Curious why you didn't use an atomic which would also keep crdbSpan smaller.


pkg/util/tracing/tracer.go, line 790 at r1 (raw file):

// differing only in the explicit carrier type. We do this to avoid allocating
// on the heap when iterating over kvs through the Carrier interface.
func (t *Tracer) extractMetaFromMapCarrier(carrier MapCarrier) (SpanMeta, error) {

Rather than duplicating this, can we remove MapCarrier? I think it's only used in two places, and these don't care about the particular type of carrier and could use a metadata carrier instead. If we have duplication like this we need testing that it doesn't diverge, and that will hurt given the amount of code duplicated here.

Contributor Author

@irfansharif irfansharif left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)


pkg/sql/sem/builtins/builtins.go, line 3736 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

🤔 don't you want span.TraceID() == traceID && rootSpan == nil?

ha, wow, I would've failed this interview. Done.


pkg/util/tracing/crdbspan.go, line 135 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This allocs for no good reason, just pointing it out in case this is in a hot path.

Done.


pkg/util/tracing/grpc_interceptor.go, line 244 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I'm seeing that we are basically using this as follows

md, putBack := getPooledMetadata()
mutate(md)
ctx = context.WithValue(ctx, raw{md:md})
cs := newClientStream(ctx)
putBack()
return cs

so... data race? We're putting md into ctx, then returning it to the pool, but there's no reason to assume that it wouldn't be read from the ctx later.

Good catch. It's not applicable here, but it should be for unary RPCs, which I think covers most things. Removed.


pkg/util/tracing/tracer.go, line 410 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Does this allocate on every call? I assume it doesn't have to, but not sure it's smart enough to figure it out. Curious why you didn't use an atomic which would also keep crdbSpan smaller.

You're right, it doesn't. Changed it to use an atomic int instead (I thought they'd be equally performant, alloc-wise, but I see they aren't).


pkg/util/tracing/tracer.go, line 790 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Rather than duplicating this, can we remove MapCarrier? I think it's only used in two places, and these don't care about the particular type of carrier and could use a metadata carrier instead. If we have duplication like this we need testing that it doesn't diverge, and that will hurt given the amount of code duplicated here.

Don't think we can. The MapCarrier (map[string]string) representation is serialized over the wire. On the other end of the RPC, we could convert this representation into a metadataCarrier (map[string][]string), but doing so would incur allocations which we're trying to avoid. To start using the metadataCarrier upstream of raft, for backwards compatibility reasons we'd have to do it over multiple releases, keeping this duplicated code around anyway.


Actually, found a way to do it without duplicating all the code. PTAL.

Member

@tbg tbg left a comment


Reviewed 4 of 4 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier and @irfansharif)


pkg/util/tracing/grpc_interceptor.go, line 305 at r2 (raw file):

Seeing as how invokeCtx's scope extends as far as our use of the metadata

what does this mean? If I had to explain why this is safe, I would say

This context is only used on the sending path of gRPC, and we trust that gRPC (and any other client interceptors stacked under us) won't access the embedded md object after it has returned.

But ultimately, this isn't really inherently safe. Is this a big part of the perf wins? I think we should back this out if it isn't.

@tbg
Member

tbg commented May 7, 2021

PR message needs an update. I'll leave the actual review to @abarganier in the interest of building ownership muscle 💪🏽

@irfansharif
Contributor Author

(bump, I'll get to the rest of the proposed changes here after soliciting another round of reviews from y'all)

Member

@abarganier abarganier left a comment


I'll leave the actual review to @abarganier in the interest of building ownership muscle 💪🏽

Thanks @tbg! And sorry for lagging on this @irfansharif - I'll be sure to chime in by EOD tomorrow

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @irfansharif, and @rimadeodhar)

Collaborator

@rimadeodhar rimadeodhar left a comment


Overall, LGTM. My apologies for asking some fairly basic questions, but I needed to build some context on some of this code.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @irfansharif)


pkg/util/tracing/crdbspan.go, line 109 at r2 (raw file):

}

func (c *childSpanRefs) len() int {

Nit: The len and add methods could be made simpler by adding a variable to childSpanRefs which stores the current spanLen. So the add method would look something like this:

func (c *childSpanRefs) add(ref *crdbSpan) {
	if c.spanLen < len(c.preAllocated) {
		c.preAllocated[c.spanLen] = ref
	} else if maxChildrenPerSpan-c.spanLen > 0 {
		c.overflow = append(c.overflow, ref)
	}
	c.spanLen++
}

and len() simply returns c.spanLen.

However, given that the tracing library is more memory-constrained than CPU-constrained, it might be okay to leave these methods as is in favor of not adding an additional variable.


pkg/util/tracing/grpc_interceptor.go, line 144 at r2 (raw file):

		serverSpan.SetTag(gRPCComponentTag.Key, gRPCComponentTag.Value)
		serverSpan.SetTag(ext.SpanKindRPCServer.Key, ext.SpanKindRPCServer.Value)

Why have we moved these out of the previous calls?


pkg/util/tracing/grpc_interceptor.go, line 254 at r2 (raw file):

Quoted 4 lines of code…
		if md.Len() == 2 && len(md[fieldNameTraceID]) == 1 && len(md[fieldNameSpanID]) == 1 {
			md[fieldNameTraceID][0] = "0"
			md[fieldNameSpanID][0] = "0"
			grpcMetadataPool.Put(md)

I don't fully understand what this is doing here? Why do we only add it to the pool if these conditions are met?

Member

@abarganier abarganier left a comment


Apologies for the delay on this. Overall LGTM as well, piggybacking onto one of @rimadeodhar's questions and a convenience nit/curiosity question.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @irfansharif, and @rimadeodhar)


pkg/util/tracing/crdbspan.go, line 511 at r2 (raw file):

Quoted 8 lines of code…
    var children []*crdbSpan
    for i := 0; i < s.mu.recording.children.len(); i++ {
         children = append(children, s.mu.recording.children.get(i))
    }
    ...
    for _, child := range children {
         child.setVerboseRecursively(to)
    }

This is more of a nit/curiosity question than anything else. Shallow copying children into a slice to then perform some operation on each child span appears to be somewhat common. Would a convenience method for applying some operation to each child make sense?

// Only requires a single pass of the entirety of s.mu.recording.children.len()
func (c *childSpanRefs) apply(func(c *crdbSpan)) {
    // Apply provided fn first to preAllocated, then overflow
} 

Then you could do...

s.mu.recording.children.apply(func(child *crdbSpan) {
    child.setVerboseRecursively(to)
})

I know we're memory constrained here, so I'm also curious how something like this would compare memory-wise to the shallow copy.


pkg/util/tracing/grpc_interceptor.go, line 254 at r2 (raw file):

Previously, rimadeodhar wrote…
		if md.Len() == 2 && len(md[fieldNameTraceID]) == 1 && len(md[fieldNameSpanID]) == 1 {
			md[fieldNameTraceID][0] = "0"
			md[fieldNameSpanID][0] = "0"
			grpcMetadataPool.Put(md)

I don't fully understand what this is doing here? Why do we only add it to the pool if these conditions are met?

+1, also wondering about this

This commit attempts to reduce the memory overhead of tracing by doing a
few things, guided mostly by BenchmarkTracing and kv95 (see results
below). In decreasing order of impact:

- We no longer materialize recordings when the trace is "empty" or
  non-verbose. When traces straddle RPC boundaries, we serialize the
  recording to send it over the wire. For "empty" traces (ones with
  no structured recordings or log messages), this is needlessly
  allocation-heavy. We end up shipping the trace skeleton (parent-child
  span relationships, and the operation names + metadata that identify
  each span). Not only does this allocate within pkg/util/tracing, it
  also incurs allocations within gRPC where all this serialization is
  taking place. This commit takes the (opinionated) stance that we can
  avoid materializing these recordings altogether.
  We do this by having each span hold onto a reference to the rootspan,
  and updating an atomic value on the rootspan whenever anything is
  recorded. When materializing the recording, from any span in the tree,
  we can consult the root span to see if the trace was non-empty, and
  allocate if so.

- We pre-allocate a small list of child pointer slots for spans to refer
  to, instead of allocating every time a child span is created.

- Span tags aren't rendered unless the span is verbose, but we were
  still collecting them previously in order to render them if verbosity
  was toggled. Toggling an active span's verbosity rarely ever happens,
  so now we don't allocate for tags unless the span is already verbose.
  This also lets us avoid the WithTags option, which allocates even if
  the span will end up dropping tags.

- We were previously allocating SpanMeta, despite the objects being
  extremely short-lived in practice (they existed primarily to pass on
  the parent span's metadata to the newly created child spans). We now
  pass SpanMetas by value.

- When extracting SpanMetas for remote spans, through Carriers, instead
  of doing it through an interface (which incurs allocations), we can use
  explicit types.

- We can slightly improve lock contention around the tracing registry by
  locking only when we're dealing with rootspans. We can also avoid
  computing the span duration inside the critical section.

---

Before this PR, comparing traced scans vs. not:

    name                                 old time/op    new time/op    delta
    Tracing/Cockroach/Scan1-24              403µs ± 3%     415µs ± 1%   +2.82%  (p=0.000 n=10+9)
    Tracing/MultinodeCockroach/Scan1-24     878µs ± 1%     975µs ± 6%  +11.07%  (p=0.000 n=10+10)

    name                                 old alloc/op   new alloc/op   delta
    Tracing/Cockroach/Scan1-24             23.2kB ± 2%    29.8kB ± 2%  +28.64%  (p=0.000 n=10+10)
    Tracing/MultinodeCockroach/Scan1-24    76.5kB ± 5%    95.1kB ± 1%  +24.31%  (p=0.000 n=10+10)

    name                                 old allocs/op  new allocs/op  delta
    Tracing/Cockroach/Scan1-24                235 ± 2%       258 ± 1%   +9.50%  (p=0.000 n=10+9)
    Tracing/MultinodeCockroach/Scan1-24       760 ± 1%       891 ± 1%  +17.20%  (p=0.000 n=9+10)

After this PR:

    name                                 old time/op    new time/op    delta
    Tracing/Cockroach/Scan1-24              437µs ± 4%     443µs ± 3%     ~     (p=0.315 n=10+10)
    Tracing/MultinodeCockroach/Scan1-24     925µs ± 2%     968µs ± 1%   +4.62%  (p=0.000 n=10+10)

    name                                 old alloc/op   new alloc/op   delta
    Tracing/Cockroach/Scan1-24             23.3kB ± 3%    26.3kB ± 2%  +13.24%  (p=0.000 n=10+10)
    Tracing/MultinodeCockroach/Scan1-24    77.0kB ± 4%    81.7kB ± 3%   +6.08%  (p=0.000 n=10+10)

    name                                 old allocs/op  new allocs/op  delta
    Tracing/Cockroach/Scan1-24                236 ± 1%       241 ± 1%   +2.45%  (p=0.000 n=9+9)
    Tracing/MultinodeCockroach/Scan1-24       758 ± 1%       793 ± 2%   +4.67%  (p=0.000 n=10+10)

---

kv95/enc=false/nodes=1/cpu=32 across a few runs also shows a modest
improvement going from before to after this PR, using a sampling rate of 10%:

    36929.02 v. 37415.52  (reads/s)   +1.30%
     1944.38 v.  1968.94  (writes/s)  +1.24%

Release note: None
Contributor Author

@irfansharif irfansharif left a comment


TFTRs! I've removed the metadata pooling so it should be less controversial. I won't immediately bump our gRPC dep to pick up grpc/grpc-go#4360; I assume we will at some point, and that should help slim down the allocation profiles for tracing as well.

Can a brother get a green stamp?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, @rimadeodhar, and @tbg)


pkg/util/tracing/crdbspan.go, line 109 at r2 (raw file):

Previously, rimadeodhar wrote…

Nit: The len and add methods could be made simpler by adding a variable to childSpanRefs which stores the current spanLen. So the add method would look something like this:

func (c *childSpanRefs) add(ref *crdbSpan) {
	if c.spanLen < len(c.preAllocated) {
		c.preAllocated[c.spanLen] = ref
	} else if maxChildrenPerSpan-c.spanLen > 0 {
		c.overflow = append(c.overflow, ref)
	}
	c.spanLen++
}

and len() simply returns c.spanLen.

However, given that the tracing library is more memory-constrained than CPU-constrained, it might be okay to leave these methods as is in favor of not adding an additional variable.

Good idea, done. The pre-allocated array here only helps reduce the total number of individual allocations, by allocating 4 pointers all at once. The actual size of this struct matters less, so adding a counter is more than fine. Thanks!
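
For illustration, the shape this might end up taking is roughly the following (a sketch only; field names are assumptions, and the cap on total children discussed above is elided):

// childSpanRefs keeps the first few child pointers in a fixed-size array so
// that adding a child doesn't allocate; later children spill into overflow.
type crdbSpan struct{ /* elided */ }

type childSpanRefs struct {
	refCount     int
	preAllocated [4]*crdbSpan
	overflow     []*crdbSpan
}

func (c *childSpanRefs) len() int { return c.refCount }

func (c *childSpanRefs) add(ref *crdbSpan) {
	if c.refCount < len(c.preAllocated) {
		c.preAllocated[c.refCount] = ref
	} else {
		c.overflow = append(c.overflow, ref)
	}
	c.refCount++
}

func (c *childSpanRefs) get(idx int) *crdbSpan {
	// Callers are expected to pass idx < c.len().
	if idx < len(c.preAllocated) {
		return c.preAllocated[idx]
	}
	return c.overflow[idx-len(c.preAllocated)]
}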


pkg/util/tracing/crdbspan.go, line 511 at r2 (raw file):

Previously, abarganier (Alex Barganier) wrote…
    var children []*crdbSpan
    for i := 0; i < s.mu.recording.children.len(); i++ {
         children = append(children, s.mu.recording.children.get(i))
    }
    ...
    for _, child := range children {
         child.setVerboseRecursively(to)
    }

This is more of a nit/curiosity question than anything else. Shallow copying children into a slice to then perform some operation on each child span appears to be somewhat common. Would a convenience method for applying some operation to each child make sense?

// Only requires a single pass of the entirety of s.mu.recording.children.len()
func (c *childSpanRefs) apply(func(c *crdbSpan)) {
    // Apply provided fn first to preAllocated, then overflow
} 

Then you could do...

s.mu.recording.children.apply(func(child *crdbSpan) {
    child.setVerboseRecursively(to)
})

I know we're memory constrained here, so I'm also curious how something like this would compare memory-wise to the shallow copy.

In this case it wouldn't have allocated, because we're simply calling setVerboseRecursively on each element, but in the general case, where we collect results from each element, it would allocate on the heap since Go needs to ensure longevity across closure lifetimes. I think for that reason this pattern isn't a super familiar one in Go (also, in case you haven't noticed, we revel in the verbosity). I do agree it makes things cleaner though.

Example where we're collecting results per child:

	for _, child := range children {
		result = append(result, child.getRecording(everyoneIsV211, wantTags)...)
	}

pkg/util/tracing/grpc_interceptor.go, line 144 at r2 (raw file):

Previously, rimadeodhar wrote…
		serverSpan.SetTag(gRPCComponentTag.Key, gRPCComponentTag.Value)
		serverSpan.SetTag(ext.SpanKindRPCServer.Key, ext.SpanKindRPCServer.Value)

Why have we moved these out of the previous calls?

The WithTags option allocates even if the underlying span is going to end up being non-verbose and will drop the tags anyway. The SetTag API, on the other hand, can avoid allocating because it has visibility into the fully constructed span, and can see whether or not it's verbose.
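
To illustrate the difference, a simplified sketch (not the actual pkg/util/tracing code; field names are made up and locking is elided):

type Span struct {
	verbose bool
	tags    map[string]interface{} // only populated for verbose spans
}

// SetTag runs against the fully constructed span, so it can bail out before
// allocating anything when the span is non-verbose.
func (s *Span) SetTag(key string, value interface{}) {
	if !s.verbose {
		return // tag would never be rendered; skip the map allocation entirely
	}
	if s.tags == nil {
		s.tags = make(map[string]interface{})
	}
	s.tags[key] = value
}

// WithTags, by contrast, has to build its tag payload in the options struct
// before the span exists, so it allocates even when the span turns out to be
// non-verbose and the tags get dropped.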


pkg/util/tracing/grpc_interceptor.go, line 254 at r2 (raw file):

Previously, abarganier (Alex Barganier) wrote…

+1, also wondering about this

This has gone away because I've since given up on trying to pool metadata objects, but the idea here was to only pool metadata.MD objects that pkg/tracing itself had mutated. It's possible that the caller into package tracing had also embedded metadata of its own into the ctx. It'd be erroneous to pool those objects, possibly re-using that metadata for subsequent requests that didn't intend to specify it.
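
For posterity, a sketch of what that guard was doing (illustrative only; this code was ultimately removed from the PR, and it assumes the "sync" and "google.golang.org/grpc/metadata" imports plus the fieldNameTraceID/fieldNameSpanID constants from the snippet quoted above):

var grpcMetadataPool = sync.Pool{
	New: func() interface{} {
		// Hand out MDs that carry exactly the two tracing fields.
		return metadata.MD{
			fieldNameTraceID: []string{"0"},
			fieldNameSpanID:  []string{"0"},
		}
	},
}

// putBack only recycles an MD that still looks exactly like one we handed out
// ourselves: two keys, one value each. If the caller merged in metadata of its
// own, we let the whole thing be garbage collected instead, so we never re-use
// a caller's metadata for an unrelated later request.
func putBack(md metadata.MD) {
	if md.Len() == 2 && len(md[fieldNameTraceID]) == 1 && len(md[fieldNameSpanID]) == 1 {
		md[fieldNameTraceID][0] = "0"
		md[fieldNameSpanID][0] = "0"
		grpcMetadataPool.Put(md)
	}
}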


pkg/util/tracing/grpc_interceptor.go, line 305 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Seeing as how invokeCtx's scope extends as far as our use of the metadata

what does this mean? If I had to explain why this is safe, I would say

This context is only used on the sending path of gRPC, and we trust that gRPC (and any other client interceptors stacked under us) won't access the embedded md object after it has returned.

But ultimately, this isn't really inherently safe. Is this a big part of the perf wins? I think we shold back this out if it isn't.

Done. It isn't a big part of the wins, though not super small either. Seeing as collapsing empty recordings away gave us most of the gains here, I've removed it.

Member

@abarganier abarganier left a comment


Reviewed 9 of 13 files at r1, 2 of 4 files at r2, 2 of 2 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, and @rimadeodhar)

@irfansharif
Contributor Author

Woot!

bors r=abarganier

Collaborator

@rimadeodhar rimadeodhar left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @abarganier, @dhartunian, and @irfansharif)


pkg/util/tracing/grpc_interceptor.go, line 144 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

The WithTags option allocates even if the underlying span is going to end up being non-verbose, and will drop the tags anyway. The SetTag API on the other hand can avoid allocating cause it has visibility into the fully constructed span, and can see whether or not it's verbose.

Aah I see, thanks for the explanation!


pkg/util/tracing/grpc_interceptor.go, line 254 at r2 (raw file):

Previously, irfansharif (irfan sharif) wrote…

This has gone away cause I've since given up on trying to pool metadata objects, but the idea here was to only pool metadata.MD objects that pkg/tracing mutated. It's possible that the caller into package tracing had also embedded into ctx metadata of its own. It'd be erroneous to pool those objects, and possibly re-use that metadata for subsequent requests that didn't intend to specify that metadata.

Thanks! That makes sense.


pkg/util/tracing/tracer.go, line 598 at r3 (raw file):

}

// var noopSpanMeta = (*SpanMeta)(nil)

Can we delete this comment?

@craig
Copy link
Contributor

craig bot commented May 18, 2021

Build succeeded:

@craig craig bot merged commit 73bcba9 into cockroachdb:master May 18, 2021
@irfansharif irfansharif deleted the 210419.tracing-perf branch May 19, 2021 01:44
Contributor

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/sem/builtins/builtins.go, line 3736 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

ha, wow, I would've failed this interview. Done.

Is this code correct? What makes this truly return the root span and not just some random span from the trace - the first in a random iteration order?

Contributor Author

@irfansharif irfansharif left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/sem/builtins/builtins.go, line 3736 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Is this code correct? What makes this truly return the root span and not just some random span from the trace - the first in a random iteration order?

The map it's iterating over only contains root spans:

// activeSpans is a map that references all non-Finish'ed local root spans,
// i.e. those for which no WithLocalParent(<non-nil>) option was supplied.
// The map is keyed on the span ID, which is deterministically unique.
//
// In normal operation, a local root Span is inserted on creation and
// removed on .Finish().
//
// The map can be introspected by `Tracer.VisitSpans`. A Span can also be
// retrieved from its ID by `Tracer.GetActiveSpanFromID`.
activeSpans struct {

Contributor

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/sem/builtins/builtins.go, line 3736 at r1 (raw file):

Previously, irfansharif (irfan sharif) wrote…

The map it's iterating over only contains root spans:

// activeSpans is a map that references all non-Finish'ed local root spans,
// i.e. those for which no WithLocalParent(<non-nil>) option was supplied.
// The map is keyed on the span ID, which is deterministically unique.
//
// In normal operation, a local root Span is inserted on creation and
// removed on .Finish().
//
// The map can be introspected by `Tracer.VisitSpans`. A Span can also be
// retrieved from its ID by `Tracer.GetActiveSpanFromID`.
activeSpans struct {

oh doh I forgot cause I'm in the middle of mucking with that.
But then is the rootSpan != nil condition necessary?

Contributor Author

@irfansharif irfansharif left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/sem/builtins/builtins.go, line 3736 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

oh doh I forgot cause I'm in the middle of mucking with that.
But then is the rootSpan != nil condition necessary?

Hm, no, we probably want to return iterutil.StopIteration() when we find the first (and only) matching root span.
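
Something like the following inside the visitor quoted at the top of this thread (a sketch only; how VisitSpans and its caller swallow the StopIteration sentinel is elided):

if span.TraceID() == traceID {
	rootSpan = span
	return iterutil.StopIteration() // at most one root span can match; stop walking
}
return nil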
