Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: attach debug information about stream to DEADLINE_EXCEEDED #5892

Merged
merged 14 commits into from
Jun 20, 2019

Conversation

zhangkun83
Copy link
Contributor

@zhangkun83 zhangkun83 commented Jun 17, 2019

Works for #4740

  • Subclasses of AbstractClientStream include remote address in insight if available.
  • DelayedStream adds buffered time, and the insight of real stream if it's set.
  • RetriableStream insights outputs of Substreams.
  • ManagedChannelImpl adds channel state. Other information, like last error status, would need API changes to LoadBalancer.

Example error message:

deadline exceeded after 8112071ns. [buffered_nanos=24763, remote_addr=foo.test.google.fr/127.0.0.1:44749]

or

deadline exceeded after 8112071ns. [buffered_nanos=22344324763, waiting_for_connection]

This is related to #4776 but taking a more usage-specific approach.

…XCEEDED

Most cases the message will include a dump of the transport
attributes, which gives important information such as server
addresses.

Example error message:
deadline exceeded after 9699625ns. Channel state: READY. Stream: {delegate=[NettyClientStream attrs={remote-addr=local:in-process-1, io.grpc.internal.GrpcAttributes.securityLevel=NONE, io.grpc.internal.GrpcAttributes.clientEagAttrs={}, local-addr=local:E:c9ab2b05}]
Example output:
deadline exceeded after 8112071ns. TimeoutDetails{channel_state=READY, server_addr=foo.test.google.fr/127.0.0.1:44749}
core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated Show resolved Hide resolved
core/src/main/java/io/grpc/internal/ClientCallImpl.java Outdated Show resolved Hide resolved
@@ -806,6 +807,10 @@ public void setDeadline(Deadline deadline) {
long effectiveTimeout = max(0, deadline.timeRemaining(TimeUnit.NANOSECONDS));
headers.put(TIMEOUT_KEY, effectiveTimeout);
}

@Override
public void appendTimeoutDetails(ToStringHelper toStringHelper) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I think StringBuilder is a better fit.

public void appendTimeoutDetails(StringBuilder buffer) {
  buffer.append(key).append('=').append(value).append(',')
}

The code that finally does toString can trim the last character. Alternatively, this could just be a Map<String, Object>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringBuilder just makes implementation strictly more complex, as I would have to care for formatting. I would be willing to submit to List<String>, but the generic is a hassle with mockito.

Since nothing is ideal, and most look like abuse to some extent, I just created a custom type InsightBuilder. This would give us more flexibility in terms of formatting, just in case we want to make changes in the future.

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks fair. It's unfortunate it has to be thread-safe, but I understand why, and it doesn't seem to bother the implementations much. Only concern is bookkeeping for retriable stream. I don't care about ToStringHelper vs StringBuilder. I'm fine with either.

@@ -89,6 +92,10 @@
*/
private final AtomicBoolean noMoreTransparentRetry = new AtomicBoolean();

/** Keeps track of all substreams for informative purposes. */
private final ConcurrentLinkedQueue<Substream> allSubstreams =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about how long we'll keep these streams alive. We could potentially be retaining an entire transport for "no reason."

I think we should add a new Collection<Status> or Collection<Status.Code> to State for the completed streams. The stream's Status can be easily passed to State.substreamClosed(Substream), and so when the stream is removed from drainedSubstreams its results will be present in the other list. There's no need for the other stream data for completed streams (like address) since it isn't normally provided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this PR according to your suggestion. Now it includes the status codes from all closed substreams. @dapengzhang0, according to javadoc on the drainedSubstreams field, it doesn't seem to always include unfinished substreams. Please advice on how to proceed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, I have updated the code and added comments.

@zhangkun83 zhangkun83 changed the title core: attach debug information about channel and stream to DEADLINE_EXCEEDED core: attach debug information about stream to DEADLINE_EXCEEDED Jun 19, 2019
core/src/main/java/io/grpc/internal/InsightBuilder.java Outdated Show resolved Hide resolved
* information.
*/
public final class InsightBuilder {
private final ArrayList<String> buffer = new ArrayList<String>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters here... it would add one more import line.

@zhangkun83
Copy link
Contributor Author

PTAL

core/src/main/java/io/grpc/internal/RetriableStream.java Outdated Show resolved Hide resolved
@Override
public Void answer(InvocationOnMock in) {
InsightBuilder insight = (InsightBuilder) in.getArguments()[0];
insight.appendKeyValue("server_addr", "2.2.2.2:443");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existing terminology in our code I think points to this being a "backend_addr", not a server_addr. (or possibly remote_addr)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to remote_addr.

zhangkun83 added a commit to zhangkun83/grpc-java that referenced this pull request Jun 19, 2019
In grpc#5892 getAttributes() is called without any regard of timing.
Currently DelayedStream.getAttributes() wil throw if called before
passThrough was set.  Just to be safe, we are removing that
restriction and making it clear on the javadoc.

On the other hand, we intend to keep the timing restriction on
ClientCall.getAttributes().
zhangkun83 added a commit to zhangkun83/grpc-java that referenced this pull request Jun 19, 2019
In grpc#5892 getAttributes() is called without any regard of timing.
Currently DelayedStream.getAttributes() wil throw if called before
passThrough was set.  Just to be safe, we are removing that
restriction and making it clear on the javadoc.

On the other hand, we intend to keep the timing restriction on
ClientCall.getAttributes().
public void appendTimeoutInsight(InsightBuilder insight) {
State currentState;
synchronized (lock) {
insight.appendKeyValue("closed", closedSubstreamsInsight);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closedSubstreamsInsight.toString()

It is very important that the insight is stringified within the lock. The insight implementation could have chosen to store the Object in the List and serialize at the end.

(Note: this could be considered an argument against passing Objects. Although we deal with this problem with logging already.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InsightBuilder already does that and that's documented in the javadoc.

InsightBuilder substreamInsight = new InsightBuilder();
sub.stream.appendTimeoutInsight(substreamInsight);
openSubstreamsInsight.append(substreamInsight);
insight.appendKeyValue("open", openSubstreamsInsight);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be outside of the for loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Fixed.

insight.appendKeyValue("buffered_nanos", streamSetTimeNanos - startTimeNanos);
realStream.appendTimeoutInsight(insight);
} else {
insight.append("waiting_for_connection");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it'd be nice if these cases were more symmetric in the output string, moving this to the next line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

zhangkun83 added a commit that referenced this pull request Jun 19, 2019
In #5892 getAttributes() is called without any regard of timing.
Currently DelayedStream.getAttributes() wil throw if called before
passThrough was set.  Just to be safe, we are removing that
restriction and making it clear on the javadoc.

On the other hand, we intend to keep the timing restriction on
ClientCall.getAttributes().
Copy link
Contributor

@carl-mastrangelo carl-mastrangelo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhangkun83 zhangkun83 merged commit ddbaf74 into grpc:master Jun 20, 2019
@zhangkun83 zhangkun83 deleted the useful_deadline_exceeded branch June 20, 2019 00:30
@lock lock bot locked as resolved and limited conversation to collaborators Sep 18, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants