Investigate cause of intermittent "connection refused" errors in TestDetailedConnectionCloseErrorPropagatesToRpcError #4338

Closed
apolcyn opened this issue Apr 13, 2021 · 3 comments

Comments

@apolcyn
Contributor

apolcyn commented Apr 13, 2021

This is a spinoff from the review conversation in #4311 (comment)

If the rpcStartedOnServer channel is removed from that test, some of the RPCs fail with:

end2end_test.go:1372: &{0xc0000ba360}.Recv() = _, rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:37309: connect: connection refused", want _, rpc error containing substring: |connection reset by peer| OR |error reading from server: EOF|

Note that the stub server's Start method waits until the client conn is ready before returning. So the only explanation I can think of is that the LB policy picks a connection for the RPC asynchronously (i.e. ss.Client.FullDuplexCall(ctx) does not wait for the LB policy to pick a connection; FWIW this behavior differs from e.g. C++), and that this pick sometimes loses the race against ss.S.Stop(). In that case, the previously ready connection would be torn down and not come back up, and the RPCs would fail with such an error.

Filing this bug to confirm the root cause though.

@menghanl
Contributor

@apolcyn Have you got time to look at this?

@apolcyn
Contributor Author

apolcyn commented Apr 29, 2021

I'll likely be busy with other issues for at least the next couple of weeks.

@github-actions

github-actions bot commented Jun 1, 2021

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Jun 1, 2021
@github-actions github-actions bot closed this as completed Jun 8, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 6, 2021

3 participants