Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distribution: retry downloading schema config on retryable error #43291

Merged
merged 1 commit into from Mar 5, 2022

Conversation

pete-woods
Copy link
Contributor

@pete-woods pete-woods commented Feb 24, 2022

fixes #43267

- What I did
Add a simple retry mechanism to the schema download with exponential back-off, max retries, and that respects context cancellation.

- How to verify it
I think the tests cover it pretty well?

- Description for the changelog
Add retries for schema manifest downloads.

- A picture of a cute animal (not mandatory but encouraged)
cat-typing

@pete-woods pete-woods marked this pull request as ready for review February 24, 2022 12:31
@pete-woods
Copy link
Contributor Author

I have hard-coded the max attempts and 250ms exponential back-off at the minute. Wasn't sure if they should be configurable or not.

@thaJeztah thaJeztah added area/distribution kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/2-code-review labels Feb 24, 2022
@thaJeztah thaJeztah added this to the 21.xx milestone Feb 24, 2022
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I had a first glance over the changes, and left some comments / suggestions (and other ramblings) inline.

Don't hesitate to "discuss" if my comments don't make sense, or cause issues (e.g. if changing the tests to a sub-test makes things waay more complicated)

distribution/pull_v2.go Outdated Show resolved Hide resolved
distribution/pull_v2.go Outdated Show resolved Hide resolved
distribution/pull_v2.go Outdated Show resolved Hide resolved
distribution/pull_v2.go Outdated Show resolved Hide resolved
distribution/pull_v2_test.go Outdated Show resolved Hide resolved
distribution/pull_v2_test.go Show resolved Hide resolved
distribution/pull_v2.go Outdated Show resolved Hide resolved
@@ -858,7 +859,10 @@ func (p *v2Puller) pullManifestList(ctx context.Context, ref reference.Named, mf

func (p *v2Puller) pullSchema2Config(ctx context.Context, dgst digest.Digest) (configJSON []byte, err error) {
blobs := p.repo.Blobs(ctx)
configJSON, err = blobs.Get(ctx, dgst)
err = retry(ctx, 5, 250*time.Millisecond, func(ctx context.Context) (err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As to hard-coding max attempts and interval; hard-coding the value is probably ok for a start, but I'd suggest to at least define consts to make it slightly more descriptive; the xfer package uses a maxDownloadAttempts const (not sure if we want to export that one, and the name is a bit poorly chosen (should probably have been named defaultXXX), but perhaps ok (for now) to use the same name.

const maxDownloadAttempts = 5

As a follow-up (?) we could look if it's somehow possible to pass the same configuration as xfer is using (could be set as a property on v2Puller perhaps, but I'd have to have another look); see xfer.WithMaxDownloadAttempts(config.MaxDownloadAttempts);

downloadManager: xfer.NewLayerDownloadManager(config.LayerStore, config.MaxConcurrentDownloads, xfer.WithMaxDownloadAttempts(config.MaxDownloadAttempts)),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pulled out the constants, as a first pass here

@pete-woods pete-woods force-pushed the retry-image-schema-download branch 3 times, most recently from 434b65d to d3f5dcc Compare February 24, 2022 14:29
distribution/pull_v2.go Outdated Show resolved Hide resolved
distribution/xfer/transfer.go Outdated Show resolved Hide resolved
distribution/xfer/transfer.go Show resolved Hide resolved
@pete-woods pete-woods force-pushed the retry-image-schema-download branch 5 times, most recently from e266520 to 26a4002 Compare February 24, 2022 15:56
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating; had one comment, but looking good overall 👍

distribution/pull_v2_test.go Outdated Show resolved Hide resolved
@pete-woods
Copy link
Contributor Author

Thanks for updating; had one comment, but looking good overall 👍

👍 Great. Hopefully this fixes our customers' issues (once I've backported to 20.10.x and it's running on our instances).

@pete-woods pete-woods force-pushed the retry-image-schema-download branch 2 times, most recently from dc5c349 to 085ce92 Compare February 28, 2022 09:59
@pete-woods
Copy link
Contributor Author

Rebased out the conflict

@pete-woods
Copy link
Contributor Author

hi @thaJeztah - is there anything more I need to do on this PR? Solicit further review?

Copy link
Contributor

@ndeloof ndeloof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Seems to me proposed implementation leaks running Tickers, use of a Timer would be better

distribution/pull_v2.go Show resolved Hide resolved
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

fixes moby#43267

Signed-off-by: Pete Woods <pete.woods@circleci.com>
@thaJeztah thaJeztah force-pushed the retry-image-schema-download branch from 6d01cbe to 9f3b1a9 Compare March 5, 2022 18:39
@thaJeztah
Copy link
Member

😓 there was a small conflict in the imports due to #43323 being merged; I did a quick rebase and pushed to your branch

Let's merge this one when CI completes 👍

@pete-woods
Copy link
Contributor Author

😓 there was a small conflict in the imports due to #43323 being merged; I did a quick rebase and pushed to your branch

Let's merge this one when CI completes 👍

Awesome, thanks

@thaJeztah
Copy link
Member

All green 👍 🥳

@thaJeztah thaJeztah merged commit 18e20d3 into moby:master Mar 5, 2022
@pete-woods pete-woods deleted the retry-image-schema-download branch March 5, 2022 20:31
@pete-woods
Copy link
Contributor Author

20.10 backport created here: #43333

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/distribution impact/changelog kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/4-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Retry required for manifest config blobs
3 participants