Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanner: refactor retrying #1418

Closed
jeanbza opened this issue May 3, 2019 · 5 comments
Closed

spanner: refactor retrying #1418

jeanbza opened this issue May 3, 2019 · 5 comments
Assignees
Labels
api: spanner Issues related to the Spanner API. type: cleanup An internal cleanup or hygiene concern.

Comments

@jeanbza
Copy link
Member

jeanbza commented May 3, 2019

Spanner should:

  • Use github.com/googleapis/gax-go/v2 for retrying, replacing all the custom logic in runRetryable and runRetryableNoWrap.
  • Retry on codes.Aborted only, removing entirely our transport is closing, streaming terminated by RST_STREAM, and unexpected EOF retries.
    • These should happen approximately never, and when they do happen are probably indicative of some critical failure that we should not retry.
    • These may have been returned by grpc-go when it was younger and buggier. It seems as though a lot of the retry logic in spanner was implemented pre- grpc-go 1.9. (grpc-go is currently at 1.20.1 by comparison)
  • Leave almost all retrying to gapic layer (including UNAVAILABLE and UNKNOWN). The only retrying that should happen in the manual client is retries around transactions in which the retry block (runInTransaction) encompasses all transactional RPCs, and ABORTED causes the entire transaction to be rolled back and retried.
  • Remove the assignment of gRPC codes to errors, as seen for example here and here. The only acceptable place to generate gRPC errors is gRPC.
    • This might need to be a separate errors epic.
@jeanbza jeanbza added api: spanner Issues related to the Spanner API. type: cleanup An internal cleanup or hygiene concern. labels May 3, 2019
@jeanbza
Copy link
Member Author

jeanbza commented May 3, 2019

cc @broady @olavloite

@jeanbza
Copy link
Member Author

jeanbza commented May 3, 2019

Made some progress on this in https://code-review.googlesource.com/c/gocloud/+/40790, but not sure if I have the time to continue. Thought I'd record my thoughts in the meantime if someone else has more time.

@olavloite
Copy link
Contributor

Reopening as there is still some work to do on this.

@olavloite olavloite reopened this Aug 18, 2019
@olavloite olavloite self-assigned this Aug 18, 2019
gopherbot pushed a commit that referenced this issue Sep 21, 2019
Create a customized GAX retryer that checks for any retry
information that Cloud Spanner might have returned.
If Cloud Spanner did not return any retry information, a
default delay is calculated by GAX and used.

Updates #1418

Change-Id: I56b9ab0bc4f64ad68f19edb9d18c89c3af7995d3
Reviewed-on: https://code-review.googlesource.com/c/gocloud/+/44172
Reviewed-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Jean de Klerk <deklerk@google.com>
gopherbot pushed a commit that referenced this issue Sep 21, 2019
Use a standard GAX retryer for resumableStreamDecoder and only
retry on standard gRPC codes. Removes the custom error checks
that were used by this decoder.

Updates #1418.

Change-Id: I8f339f31cf71fe3e5f9aebcb685b5444c8aa56b8
Reviewed-on: https://code-review.googlesource.com/c/gocloud/+/44173
Reviewed-by: kokoro <noreply+kokoro@google.com>
Reviewed-by: Jean de Klerk <deklerk@google.com>
@olavloite
Copy link
Contributor

Custom retry loops have been replaced with standard GAX retryers where applicable. All other retries are handled by GAPIC. As noted above: Removing/refactoring the assignment of gRPC codes to errors should be handled in a separate issue.

@apstndb
Copy link
Contributor

apstndb commented Sep 21, 2022

Retry on codes.Aborted only, removing entirely our transport is closing, streaming terminated by RST_STREAM, and unexpected EOF retries.

It seems that the retry on RST_STREAM has been restored. Now, is it considered safe to retry?
#6699

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. type: cleanup An internal cleanup or hygiene concern.
Projects
None yet
Development

No branches or pull requests

3 participants