Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(internal/gensupport): wrap retry failures with context and prev error #1684

Merged
merged 4 commits into from Sep 20, 2022

Conversation

matheuscscp
Copy link
Contributor

@matheuscscp matheuscscp commented Sep 2, 2022

I have a use case in which my context is cancelling a sendAndRetry() operation, and the error being returned is the error returned by the Google API in the previous attempt of sendAndRetry() instead of ctx.Err(), which is misleading because I was investigating why the Google Storage Go library was giving up on a 503 error (which is not expected according to the retry/idempotency docs: https://cloud.google.com/storage/docs/retry-strategy).

closes: #1685

@google-cla
Copy link

google-cla bot commented Sep 2, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@codyoss
Copy link
Member

codyoss commented Sep 2, 2022

@matheuscscp could you please file an issue for discussion before I take a look at the PR. Thanks

@matheuscscp
Copy link
Contributor Author

@matheuscscp could you please file an issue for discussion before I take a look at the PR. Thanks

@codyoss here it is: #1685

sorry for not starting with that!

@@ -98,10 +98,11 @@ func sendAndRetry(ctx context.Context, client *http.Client, req *http.Request, r
case <-ctx.Done():
// If we got an error, and the context has been canceled,
// the context's error is probably more useful.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would another possible solution to #1685 be simply to remove this entire block, and let the context raise the deadline exceeded error on its own?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean just return nil, ctx.Err()? otherwise, where would you suggest to put something equivalent to case <-time.After(pause): (the backoff)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tritone Can you comment on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @quartzmo , thanks for flagging. I think the right answer is to return a wrapped error that includes both the context cancelation and the final error from the service, if available -- otherwise it can be pretty hard to debug either way. We implemented this in google-cloud-go: https://github.com/googleapis/google-cloud-go/blob/0da9ab0831540569dc04c0a23437b084b1564e15/internal/retry.go#L36

I can make a PR to implement this if folks agree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also I see that this PR already basically takes that approach -- happy to work with you on completing it yourself if you want; the main change I would ask for is to implement a custom error type that implements Is(error) bool so that we can preserve the validity of calls to errors.Is in user code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep I'd like to fix that as you suggested, I will do it tomorrow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's there, PTAL. I basically copied the wrappedCallErr struct from your other repo with the exception that I did Is() like this:

func (e wrappedCallErr) Is(target error) bool {
	return errors.Is(e.ctxErr, target) || errors.Is(e.wrappedErr, target)
}

basically leveraging errors.Is() instead of using plain == comparison. if that's too much pls let me know!

I also seized the opportunity to use this wrapper in all the other occurrences of some ctx+svc error inside this file

Copy link
Contributor

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, aside from change to send which I think we should hold off on.

@@ -98,10 +119,10 @@ func sendAndRetry(ctx context.Context, client *http.Client, req *http.Request, r
case <-ctx.Done():
// If we got an error, and the context has been canceled,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is no longer accurate, I would rephrase to something like:

// If we got an error and the context has been canceled, return an error acknowledging
// both the context cancelation and the service error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

@@ -43,7 +64,7 @@ func send(ctx context.Context, client *http.Client, req *http.Request) (*http.Re
if err != nil {
select {
case <-ctx.Done():
err = ctx.Err()
err = wrappedCallErr{ctx.Err(), err}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not make this change in send() for now. Reasons:

  1. The case that the call returned a non-context error but the context is expired/canceled is not a very common case without retries being in the picture. And the string from wrappedCallErr is specific to retries being at the root of the issue.
  2. send is much more widely used than sendAndRetry including by the manual clients built on top of this library. I would want to audit these to make sure they can handle this error (meaning that they are upgraded to use errors.Is rather than checking for equality) if they have error handling logic at that layer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

@tritone tritone added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 7, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 7, 2022
@codyoss codyoss added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 8, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 8, 2022
Copy link
Contributor

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @codyoss could you also give this a look before we merge?

Copy link
Member

@codyoss codyoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codyoss codyoss changed the title bugfix: do not swallow context error when context is done feat(internal/gensupport): do not swallow context error when context is done Sep 19, 2022
@codyoss codyoss changed the title feat(internal/gensupport): do not swallow context error when context is done feat(internal/gensupport): wrap retry failures with context and prev error Sep 19, 2022
@tritone tritone added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 19, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 19, 2022
@tritone tritone added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 19, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Sep 19, 2022
@tritone tritone merged commit f427ee3 into googleapis:main Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sendAndRetry(): context error is not being returned when context is canceled
5 participants