Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reconnect loop for GCP pub/sub subscriptions #1293

Merged
merged 5 commits into from
Mar 14, 2022

Conversation

yordan-pavlov
Copy link
Contributor

@yordan-pavlov yordan-pavlov commented Nov 11, 2021

Description

Add a reconnection loop for GCP pub/sub subscriptions, very similar to the one that already exists for Azure Service Bus subscriptions here https://github.com/dapr/components-contrib/blob/master/pubsub/azure/servicebus/servicebus.go#L384

This is related to #1250 where the connection to a GCP pub/sub subscription could be lost due to a recoverable error. Even if the GCP pub/sub library does handle some recoverable errors internally, a reconnect loop can still be useful in a number of cases. For example:

  • the GCP pub/sub library might not handle all recoverable errors (that's why Fix #1250 by upgrading cloud.google.com/go/pubsub to v1.12.2 #1285 exists to upgrade the GCP pub/sub library)
  • a reconnect loop enables a scenario where the application has to wait for the subscription to be created
  • a reconnect loop would also report errors such as a "not found" error thus making the investigation of issues with a GCP pub/sub subscription easier

Issue reference

Please reference the issue this PR will close: #1250

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

Copy link
Contributor

@CodeMonkeyLeet CodeMonkeyLeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM; per the PR template, can you make sure that there's an issue filed in https://github.com/dapr/docs for the new config metadata properties? It helps to make sure that the added features are appropriately documented with each release.

pubsub/gcp/pubsub/pubsub.go Show resolved Hide resolved
pubsub/gcp/pubsub/pubsub.go Outdated Show resolved Hide resolved
CodeMonkeyLeet
CodeMonkeyLeet previously approved these changes Nov 18, 2021
Copy link
Contributor

@CodeMonkeyLeet CodeMonkeyLeet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM!

@CodeMonkeyLeet CodeMonkeyLeet dismissed their stale review November 18, 2021 18:56

Tests reveal break

@CodeMonkeyLeet
Copy link
Contributor

Spoke too soon: Linter is complaining the code needs a gofmt

@yordan-pavlov
Copy link
Contributor Author

yordan-pavlov commented Nov 18, 2021

yes, it must be the automated commit for the log message typo, I will fix it tomorrow morning when I am back at work

CodeMonkeyLeet
CodeMonkeyLeet previously approved these changes Nov 19, 2021
Copy link
Member

@artursouza artursouza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pkedy Can this use the retry logic from the kit library?

pubsub/gcp/pubsub/pubsub.go Show resolved Hide resolved
@artursouza artursouza self-assigned this Nov 24, 2021
@pkedy
Copy link
Member

pkedy commented Nov 24, 2021

@pkedy Can this use the retry logic from the kit library?

Yes. I think dapr/kit/retry probably makes sense here. No loop needed. I’ll paste some links to examples in other components. Usage is pretty simple and straightforward.

Copy link
Member

@pkedy pkedy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions for using the dapr/kit/retry package.

pubsub/gcp/pubsub/pubsub.go Show resolved Hide resolved
pubsub/gcp/pubsub/pubsub.go Show resolved Hide resolved
@yordan-pavlov
Copy link
Contributor Author

yordan-pavlov commented Nov 24, 2021

@pkedy

Yes. I think dapr/kit/retry probably makes sense here. No loop needed.

I don't think the behavior implemented with the retry loop in this PR can be achieved by simply using dapr/kit/retry without a loop - retry up to max retries every time the subscription is disconnected (not just one time).

Even if the back-off library is used within a loop, the behavior still won't be the same - retry attempts are used and slowly refilled back up over time - the longer the subscription connection lasts, the more retry attempts accumulate up to max retries. Only if retry attempts are used up in quick succession will the retry loop stop retrying.

@pkedy
Copy link
Member

pkedy commented Nov 25, 2021

@yordan-pavlov thanks for the explanation. I’ll take a closer look at this after a few days of Thanksgiving PTO.

@pkedy
Copy link
Member

pkedy commented Dec 6, 2021

@yordan-pavlov sorry for the delay. I'm preparing for a conference talk with week.

Signed-off-by: Yordan Pavlov <yordan.pavlov@dunnhumby.com>
@yordan-pavlov
Copy link
Contributor Author

yordan-pavlov commented Jan 17, 2022

@yaron2 I have fixed the DCO issues and now all checks are passing; it would be great if this PR could be included in the 1.6 release.

daixiang0
daixiang0 previously approved these changes Feb 7, 2022
@berndverst berndverst added the do-not-merge PR is not ready for merging label Feb 12, 2022
@berndverst
Copy link
Member

@pkedy can you give this another look to decide whether dapr/kit/retry works here? If not, any thoughts on what should be improved?

@Taction
Copy link
Member

Taction commented Mar 10, 2022

@yordan-pavlov can you please resolve conflicts?

Signed-off-by: Yordan Pavlov <yordan.pavlov@dunnhumby.com>
@yordan-pavlov
Copy link
Contributor Author

@Taction done - conflict resolved

@codecov
Copy link

codecov bot commented Mar 14, 2022

Codecov Report

Merging #1293 (edfbb88) into master (c4aa200) will decrease coverage by 0.02%.
The diff coverage is 22.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1293      +/-   ##
==========================================
- Coverage   35.79%   35.77%   -0.03%     
==========================================
  Files         160      160              
  Lines       14637    14680      +43     
==========================================
+ Hits         5240     5252      +12     
- Misses       8829     8860      +31     
  Partials      568      568              
Impacted Files Coverage Δ
pubsub/gcp/pubsub/pubsub.go 25.14% <22.64%> (+0.92%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f4b0fdd...edfbb88. Read the comment docs.

@berndverst berndverst removed the do-not-merge PR is not ready for merging label Mar 14, 2022
@berndverst berndverst added this to the v1.7 milestone Mar 14, 2022
@berndverst berndverst dismissed artursouza’s stale review March 14, 2022 18:18

dapr/kit retry doesn't seem to be a good fit here, besides pkedy does not have time to further review

@berndverst berndverst merged commit 1bbefff into dapr:master Mar 14, 2022
@berndverst
Copy link
Member

@yordan-pavlov I have merged this. Please make sure to complete the docs PR against the v1.7 branch of docs as soon as possible. dapr/docs#1970

@yordan-pavlov
Copy link
Contributor Author

thanks @berndverst, docs pull request added here dapr/docs#2262

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GCP pub/sub subscription stops working after error
7 participants