Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: return better status for context err when writing header #5292

Merged

Conversation

idiamond-stripe
Copy link
Contributor

@idiamond-stripe idiamond-stripe commented Apr 4, 2022

I think this closes #4696

It looks like Write unconditionally returns status.Errorf(codes.Internal, "transport: %v", err) when we see an error writing the headers. However, in certain cases we'll actually have an overriding context error here, like a cancellation. This PR changes the behavior to return that instead if it's present.

This matches the behavior in the other branch of the conditional (

return ContextErr(s.ctx.Err())
), where if the stream is in state streamDone, this either returns ErrConnClosing or the s.ctx.Err(). We already check for the stream being streamDone inside WriteHeader, so I opted not to duplicate that check and instead just return the ctx error if it's present.

If we wanted a more specific fix, this could change

if s.ctx.Err() != nil {
    return ContextErr(s.ctx.Err())
}

to

if s.getState() == streamDone && s.ctx.Err() != nil {
    return ContextErr(s.ctx.Err())
}

but it duplicates the check inside WriteHeader.

(This is my first gRPC contribution, so apologies if I haven't followed the right process or I've overlooked something)

RELEASE NOTES:

  • server: return Canceled or DeadlineExceeded status code when writing headers to a stream that is already closed

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Apr 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: idiamond-stripe (4811f7a)

@idiamond-stripe idiamond-stripe force-pushed the idiamond/handle-ctx-error-in-write-header branch from 4811f7a to 3872708 Compare April 5, 2022 17:05
@idiamond-stripe idiamond-stripe force-pushed the idiamond/handle-ctx-error-in-write-header branch from 3872708 to 1715f7e Compare April 5, 2022 17:10
@dfawley dfawley self-requested a review April 5, 2022 17:38
@dfawley dfawley self-assigned this Apr 5, 2022
@dfawley dfawley added this to the 1.46 Release milestone Apr 5, 2022
@idiamond-stripe idiamond-stripe changed the title Handle context err in when writing header Handle context err when writing header Apr 5, 2022
@dfawley dfawley modified the milestones: 1.46 Release, 1.47 Release Apr 6, 2022
@idiamond-stripe idiamond-stripe force-pushed the idiamond/handle-ctx-error-in-write-header branch from 87e4962 to 6f33edb Compare April 6, 2022 21:27
@@ -53,10 +52,10 @@ import (
var (
// ErrIllegalHeaderWrite indicates that setting header is illegal because of
// the stream's state.
ErrIllegalHeaderWrite = errors.New("transport: the stream is done or WriteHeader was already called")
ErrIllegalHeaderWrite = status.Error(codes.Internal, "transport: the stream is done or WriteHeader was already called")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this text since we're more precise about sending this error now. E.g. "stream.SendHeader called multiple times". (It's SendHeader at the user API level.)

@@ -946,7 +950,7 @@ func (t *http2Server) WriteHeader(s *Stream, md metadata.MD) error {
}
if err := t.writeHeaderLocked(s); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GH won't let me comment on the right line, but on line 981, there's no guarantee this is a status error. I think it's always ErrConnClosing, in fact. Let's make that return status.Convert(err).Err() -- it's not really efficient or ideal, but it's only in an error condition so it should be okay. This means a closed connection will result in UNKNOWN, but since I'm pretty sure using statuses at all here is a bad idea in the first place, that should be fine. I think I'll separately change all the documentation on ServerStream to recommend against relying upon the returned errors' status codes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting changing the error inside writeHeaderLocked to be status.Convert(err).Err() or changing the handling of the return value to be status.Convert(err).Err()?

I changed the outside error, as that's the place where we converted it to status.Internal previously. Let me know if I understood wrong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way is probably fine. There is only one other usage of writeHeaderLocked, in WriteStatus, and the ultimate user of that just logs the result.

internal/transport/http2_server.go Show resolved Hide resolved
@@ -933,9 +932,14 @@ func (t *http2Server) checkForHeaderListSize(it interface{}) bool {

// WriteHeader sends the header metadata md back to the client.
func (t *http2Server) WriteHeader(s *Stream, md metadata.MD) error {
if s.updateHeaderSent() || s.getState() == streamDone {
if s.getState() == streamDone {
return ContextErr(s.ctx.Err())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a small window where the state can be streamDone but the context hasn't been canceled yet. In these cases, we would be returning an INTERNAL error.

We could paper over this race by calling s.cancel() here, as we seem to do in Write. But honestly I think the right fix here is to move s.cancel() in deleteStream to the first line of both finishStream and cancelStream. And then we can remove the s.cancel() in Write. Would you mind doing this and hopefully (:crossed_fingers:) all the tests still pass?

Looking closer at Write, it looks like it has special logic to prefer returning an error that the whole connection was closed over the stream being closed. We probably want to do the same thing here. If you want to copy/paste the select, that's fine. Or if you don't mind refactoring, it looks like we need:

func (t *http2Server) streamContextErr(s *Stream) error {
	select {
	case <-t.done:
		return ErrConnClosing
	default:
	}
	return ContextErr(s.ctx.Err())
}

But I can do that as a follow-on later otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, tests seem to all pass with this! I changed it finishStream, but it looks like it's already called in deleteStream (which closeStream calls).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is done by deleteStream, but there is a small race, still, where the context is not canceled while the state is streamDone. So I was suggesting moving that cancelation to happen before setting the stream's state to streamDone, and also removing it from deleteStream.

@idiamond-stripe
Copy link
Contributor Author

Also is it possible to run the tests here? I'm a first time contributor so I think running the CI workflow requires signoff.

Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think everything looks good except for moving the s.cancel call to closeStream directly, and out of deleteStream.

@@ -946,7 +950,7 @@ func (t *http2Server) WriteHeader(s *Stream, md metadata.MD) error {
}
if err := t.writeHeaderLocked(s); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way is probably fine. There is only one other usage of writeHeaderLocked, in WriteStatus, and the ultimate user of that just logs the result.

@@ -933,9 +932,14 @@ func (t *http2Server) checkForHeaderListSize(it interface{}) bool {

// WriteHeader sends the header metadata md back to the client.
func (t *http2Server) WriteHeader(s *Stream, md metadata.MD) error {
if s.updateHeaderSent() || s.getState() == streamDone {
if s.getState() == streamDone {
return ContextErr(s.ctx.Err())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is done by deleteStream, but there is a small race, still, where the context is not canceled while the state is streamDone. So I was suggesting moving that cancelation to happen before setting the stream's state to streamDone, and also removing it from deleteStream.

}
// TODO(mmukhi, dfawley): Make sure this is the right code to return.
return status.Errorf(codes.Internal, "transport: %v", err)
return err
}
} else {
// Writing headers checks for this condition.
if s.getState() == streamDone {
// TODO(mmukhi, dfawley): Should the server write also return io.EOF?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be deleted now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TODO or the entire conditional? The tests appear to pass with the entire conditional removed but I'm not familiar enough with this code to understand whether we still need to check if the stream is done elsewhere in Write (if it's called after writing headers).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like writeQueue.get checks whether the writeQueue is done. Is the idea that that channel is set when the stream is closed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just referring to the TODO. I don't think this whole thing should be removed or else we'll potentially write operations into the control buffer for closed streams. If the controlbuf isn't checking that the stream is still valid and writes frames, then that's an HTTP/2 protocol violation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack. reverted and removed the TODO.

@idiamond-stripe idiamond-stripe force-pushed the idiamond/handle-ctx-error-in-write-header branch from 134a4b8 to 3399614 Compare April 7, 2022 20:45
@idiamond-stripe idiamond-stripe force-pushed the idiamond/handle-ctx-error-in-write-header branch from 3399614 to 32d8958 Compare April 7, 2022 20:47
@dfawley dfawley assigned menghanl and unassigned menghanl Apr 7, 2022
@dfawley dfawley requested a review from menghanl April 7, 2022 21:46
@menghanl menghanl assigned menghanl and unassigned dfawley Apr 7, 2022
Copy link
Contributor

@menghanl menghanl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@menghanl menghanl assigned dfawley and unassigned menghanl Apr 8, 2022
@dfawley dfawley changed the title Handle context err when writing header server: return better status for context err when writing header Apr 8, 2022
@dfawley dfawley merged commit 924e484 into grpc:master Apr 8, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gRPC server reports internal error when request is cancelled because of deadline/timeout
3 participants