Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner): retry spanner transactions and mutations when RST_STREAM error #6699

Merged
merged 2 commits into from Sep 20, 2022

Conversation

rahul2393
Copy link
Contributor

  • Clients need to retry the Spanner transactions in case of RST_STREAM error returned from the backend

…AM internal errors is returned from backend.
@rahul2393 rahul2393 requested review from a team as code owners September 20, 2022 09:01
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: spanner Issues related to the Spanner API. labels Sep 20, 2022
@rahul2393 rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 20, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 20, 2022
// returned by Cloud Spanner, or if none is returned, the calculated delay with
// a minimum of 10ms and maximum of 32s. There is no delay before the retry if
// the error was Session not found.
func runWithRetryOnAbortedOrSessionNotFound(ctx context.Context, f func(context.Context) error) error {
retryer := onCodes(DefaultRetryBackoff, codes.Aborted)
retryer := onCodes(DefaultRetryBackoff, codes.Aborted, codes.Internal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make it retry on any Internal error, also those that are not retryable. This could cause an infinite (or at least very long-lasting) loop if someone tries to execute a transaction that happens to hit a feature that returns an internal error.

// creations/retrivals.
return ts, err
// Make a retryer for Aborted and certain Internal errors.
retryer := onCodes(DefaultRetryBackoff, codes.Aborted, codes.Internal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also could cause an infinite loop.

// retries it if it returns an Aborted or Session not found error. The retry
// is delayed if the error was Aborted. The delay between retries is the delay
// retries it if it returns an Aborted, Session not found error or certain Internal errors. The retry
// is delayed if the error was Aborted or Internal error. The delay between retries is the delay
// returned by Cloud Spanner, or if none is returned, the calculated delay with
// a minimum of 10ms and maximum of 32s. There is no delay before the retry if
// the error was Session not found.
func runWithRetryOnAbortedOrSessionNotFound(ctx context.Context, f func(context.Context) error) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a couple of tests for the changes in this PR that verify that transactions are retried if the RST_STREAM error is returned, but not for any other random internal error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olavloite Test "TestClient_ApplyAtLeastOnce_NonRetryableInternalErrors" added,
Please take a look at the code again.
This could cause an infinite => This is not true. The following code only says that:

if it is an internal error and its error message is not in the list, then return false.
If it is an internal error and it error message in the list, then let it pass to the next check r.Retryer.Retry(err).

It early returns false for non-retryable internal errors before the real retry check.

If the user has specified to not retry internal errors, then r.Retryer.Retry(err) will return false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that, thanks! The specific error messages were already there, we were just not using them anymore (and I guess you are now going to dig up that I was the one that actually removed it ;-) ).

@rahul2393 rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API. kokoro:force-run Add this label to force Kokoro to re-run the tests. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants