Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transport: fail NewStream() with better error message when conn is closed #4426

Closed
wants to merge 5 commits into from

Conversation

menghanl
Copy link
Contributor

No description provided.

@menghanl
Copy link
Contributor Author

menghanl commented May 12, 2021

Found when debugging Test/KeepaliveClientStaysHealthyWithResponsiveServer #4424

@dfawley
Copy link
Member

dfawley commented May 13, 2021

The test you changed now appears to be flaky.

@dfawley dfawley assigned menghanl and unassigned dfawley May 13, 2021
Comment on lines +747 to +751
t.mu.Lock()
err := t.closeErr
t.mu.Unlock()
if err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the lock is necessary, but it's harmless in any case. t.cancel() is called only after t.closeErr is set, and t.closeErr can only be set once, before t.cancel().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t.ctx is derived from a parent context, so t.cancel() isn't the only way to cancel t.ctx. And the read and write may race in the other case.
(That parent context is ClientConn.ctx, so this only happens when the ClientConn is closed)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, but is pretty complicated. How about:

func (t *http2Client) getCloseErr() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if err := t.closeErr; err != nil {
		return err
	}
	if t.ctx.Err() != nil {
		return ErrConnClosing
	}
	return nil // Not closed, why are you calling this??
}

Or is there some way to block until t.Close finishes so we know t.closeErr is valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different?

And I don't like returning nil here. Even if it should never happen.

internal/transport/transport_test.go Show resolved Hide resolved
@menghanl menghanl assigned dfawley and unassigned menghanl May 17, 2021
Comment on lines +747 to +751
t.mu.Lock()
err := t.closeErr
t.mu.Unlock()
if err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, but is pretty complicated. How about:

func (t *http2Client) getCloseErr() {
	t.mu.Lock()
	defer t.mu.Unlock()
	if err := t.closeErr; err != nil {
		return err
	}
	if t.ctx.Err() != nil {
		return ErrConnClosing
	}
	return nil // Not closed, why are you calling this??
}

Or is there some way to block until t.Close finishes so we know t.closeErr is valid?

Comment on lines +786 to +787
// - if it is after the transport is closed (case <-ct.ctxDone),
// we don't care about the error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would the transport be closed? It was gracefully closed, but we have an active stream (from line 755) so it should never close, should it? Unless the server ends the stream, but it looks like it does not, until it receives the client's end-stream.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We start 200 goroutines to run this.
Some of them could be so slow that they run after the end of the line 755 stream.

@dfawley dfawley assigned menghanl and unassigned dfawley May 17, 2021
@menghanl menghanl assigned dfawley and unassigned menghanl May 25, 2021
@dfawley
Copy link
Member

dfawley commented May 25, 2021

Tests are failing:

Error: ../internal/transport/http2_client.go:406:22: not enough arguments in call to t.controlBuf.finish

@dfawley dfawley assigned menghanl and unassigned dfawley May 25, 2021
@menghanl
Copy link
Contributor Author

menghanl commented May 25, 2021

This PR no longer works after moving controlbuf.finish() (#4447).
go test ./internal/transport -run Test/GracefulClose -count 100 is racy

NewStream() will fail due to controlbuf error, after the receiver exits, before the transport is closed (transport is closed by the sender, but it takes time for it to see the error, it's a race), so it won't be able to reliably return the transport close error.

This will need more thoughts, and probably more changes.

@menghanl menghanl closed this Jun 1, 2021
@dfawley
Copy link
Member

dfawley commented Jun 1, 2021

Can you file an issue to follow-up on this later? (Unless you have something you're actively working on already.)

@menghanl menghanl deleted the better_conn_close_error branch June 21, 2021 21:19
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants