New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
channelz: include channelz identifier in logs #5192
Conversation
07ec1d6
to
606f3e1
Compare
606f3e1
to
116dade
Compare
Sample new output:
|
There is a small inconsistency between logs which show up as a result of calling one of the log functions to the ones which show up as a result of calling the trace function. This is from the trace function:
This is from the log function with the prefix.
Based on our offline discussion, I initially attempted to remove calls to So, this small inconsistency exists. But I think, this is still better than what we have currently, and we can possibly address it in the future. |
Why are two log lines for subchannel created/deleted? E.g.
|
We add a trace event on the subChannel with an event on the parent. The |
Maybe the parent can say "creating"? |
The Also, having the trace function add similar prefix is slightly tricky since it has to the same the same for the parent as well, and it will have to look up the channelz database to determine the parent and do this. If you think it is worth it, I can try doing it. |
cc.channelzID = channelz.RegisterChannel(&channelzChannel{cc}, 0, target) | ||
channelz.Info(logger, cc.channelzID, "Channel Created") | ||
pid := cc.dopts.channelzParentID | ||
cc.channelzID = channelz.RegisterChannel(&channelzChannel{cc}, pid, target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If channelz is not turned on, those IDs will all be nil, and then the logs will all have [channel(id:0)]
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, they all have []
(empty string)
tlogger.go:116: INFO clientconn.go:106 [core] Channel(id:0) created (t=+498.514µs)
tlogger.go:116: INFO clientconn.go:1624 [core] [] original dial target is: "whatever:///test.server" (t=+559.064µs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can generate channelz IDs even if channelz is off. It's just a self incrementing int anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the change to generate channelz IDs even when channelz is OFF. This also required changes to some tests. I've captured these changes as a separate commit. 786e6be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try deleting the ==nil
checks in channelz ID's methods?
Should all channelz IDs be non-nil now? Maybe not true in tests. But still might be interesting to try.
clientconn.go
Outdated
@@ -1319,7 +1321,7 @@ func (ac *addrConn) createTransport(addr resolver.Address, copts transport.Conne | |||
newTr, err := transport.NewClientTransport(connectCtx, ac.cc.ctx, addr, copts, func() { prefaceReceived.Fire() }, onGoAway, onClose) | |||
if err != nil { | |||
// newTr is either nil, or closed. | |||
channelz.Warningf(logger, ac.channelzID, "grpc: addrConn.createTransport failed to connect to %v. Err: %v", addr, err) | |||
channelz.Warningf(logger, ac.channelzID, "grpc: addrConn.createTransport failed to connect to %s. Err: %v", pretty.ToJSON(addr), err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's before and after?
And maybe define a String()
on Address
(it can be implemented as pretty.ToJSON)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On inconsistency between channelz.Warningf
and channelz.AddTraceEvent
.
I don't even remember why we need channelz.Warningf
. Can't they all just call channelz.AddTraceEvent
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented a String()
method on Address
which calls pretty.ToJSON
.
With regards to channelz.Warningf
vs channelz.AddTraceEvent
, I think the former is much easier to call, and we seem to be using the latter only when we want to trace event on a child to add a reference in the parent as well (or the other way around).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
channelz.Warningf
and channelz.AddTraceEvent
are almost the same I think. AddTraceEvent
just copy the Warningf
parameters to a struct.
I'm OK to keep Warningf
as a thin wrapper, and so it looks more like logging.
How about the IsOn()
checks in Warningf
? Can those be deleted?
AddTraceEvent
also checks IsOn()
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the logging funcs thin wrappers around AddTraceEvent
.
clientconn.go
Outdated
channelz.AddTraceEvent(logger, ac.channelzID, 0, &channelz.TraceEventDesc{ | ||
Desc: "Subchannel Deleted", | ||
channelz.AddTraceEvent(logger, ac.channelzID, 0, &channelz.TraceEventDesc{ | ||
Desc: fmt.Sprintf("Subchannel(id:%d) deleted", ac.channelzID.Int()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The id
part in the Desc
seems to be duplicate with the other parameter ac.channelzID
.
And this Desc
is also used in channelz trace events. So you will see in subchannel 2
's events, one called subchannel 2 deleted
.
Can channelz.AddTraceEvent
prepend the channelz ID before calling log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can channelz.AddTraceEvent prepend the channelz ID before calling log?
The issue is that the implementation of the logging functions call AddTraceEvent
. And the prepend the channelz ID in the logging functions.
The other issue is that trace events have pointers to parent events. But these dont have the parent's ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change it so that the channelz ID is prepended before calling the real logging functions (grpclog
's logging functions)?
And parent is optional, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, parent is optional. And we specify parent in trace descriptions in only very few cases:
- nested channel add/delete
- subchannel add/delete
I pushed a commit to change prepending of ID to be done right before the actual calling function. But this still doesn't help with duplicate entries like this:
tlogger.go:116: INFO clientconn.go:1499 [core] [Channel #1 Channel #2 SubChannel #3] Subchannel(id:3) deleted (t=+1.623202ms)
tlogger.go:116: INFO clientconn.go:1499 [core] [Channel #1 Channel #2 SubChannel #3] Subchannel(id:3) deleted (t=+1.632913ms)
Or almost duplicate entries like this:
tlogger.go:116: INFO clientconn.go:1099 [core] [Channel #1 Channel #2] Channel(id:2) deleted (t=+1.660121ms)
tlogger.go:116: INFO clientconn.go:1099 [core] [Channel #1 Channel #2] Nested channel(id:2) deleted (t=+1.691113ms)
The reason for the above is because we specify a parent trace description. And in the first log output, the trace description on the actual entity and the parent are the same Subchannel(id:3) deleted
. And since we cannot determine the parent ID in AddTraceEvent
, we end up printing the parent's trace description also with the child's ID.
A few options to make things better:
- Get rid of including the parent information in the ID. But include some other detail (like target name for channel, IP address for subchannel etc). This would mean that the following print
tlogger.go:116: INFO clientconn.go:1499 [core] [Channel #1 Channel #2 SubChannel #3] Subchannel(id:3) deleted (t=+1.623202ms)
would instead look something like:
tlogger.go:116: INFO clientconn.go:1499 [core] [SubChannel #3 IP-Address] Subchannel(id:3) deleted (t=+1.623202ms)
This will still not get rid of the duplicate entry. But it will make the description for every entity a little bit more meaningful, and the option below helps with duplicate entries.
- When including parent trace descriptions, ensure that the entries are not duplicate. So, when creating a subChannel for example, on the actual subchannel, we can say something like:
[SubChannel #3 IP-Address] Subchannel(id:3) created
and for the parent trace description, we can include information about the parent in the message itself. So, it would be something like this:
[SubChannel #3] Subchannel(id:3) created within Channel(id:2)
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for the ongoing discussion in previous comments.
Made the changes discussed in our offline discussion:
And there will not be a duplicate entry. The trace description added on the parent will include the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Can you post the new logs after the change?
And another reason to not include IPs for SubChannel: the IP can change. |
|
That is true. |
Summary of changes:
Identifier
to uniquely identify entities in the channelz database.RegisterXxx
functions and will be accepted by log/trace functions.RegisterXxx
andRemoveEntry
functions work gracefully when channelz is not turned ON.WithChannelzParentID
dialoption to accept a*channelz.Identifier
.ChannelzParentID
field in balancer.BuildOptions to be of type*channelz.Identifier
.DialOption
and throughbalancer.BuildOptions
.RELEASE NOTES: