Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: fix stream creation issue with transparent retry #5503

Merged
merged 1 commit into from Jul 14, 2022

Conversation

dfawley
Copy link
Contributor

@dfawley dfawley commented Jul 13, 2022

Note: I am fairly sure the comment formatting change is caused by using a preview version of Go 1.19, meaning we all will see that behavior forced by gofmt soon.

Fixes #5341

Description:

We need to create a brand new attempt every time we retry the initial op that gets the transport and creates the stream. Once that op is committed and added to the replay log, it is handled by the retry+replay logic, but until then, it currently does retries using the original attempt. This was mostly OK, but leads to a potential infinite loop:

  1. getTransport() - succeeds
  2. newStream() - fails with a transparently retryable error, as the transport died at the exact right moment
  3. Retry
  4. getTransport - fails
  5. shouldRetry should not allow transparent retry in this case, but it does because the original attempt had the field set.

If no transport is ever attainable after this, it will loop until the RPC's context expires.

RELEASE NOTES:

  • client: fix an issue that, in the event of an unlikely race, could lead to RPCs timing out instead of failing more quickly with UNAVAILABLE

var err error
if cs.attempt, err = cs.newAttemptLocked(false /* isTransparent */); err != nil {
cs.mu.Unlock()
cs.finish(err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cs.finish() grabs the lock again. Can something happen between us calling Unlock() on line 708 and cs.finish() calling Lock() that can cause trouble?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only for the first op, the clientStream we're using is only visible to the caller at this point (newClientStream).

@easwars easwars assigned dfawley and unassigned easwars Jul 14, 2022
@dfawley dfawley assigned easwars and unassigned dfawley Jul 14, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: Test/FailFast
2 participants