distribution: retry downloading schema config on retryable error #43291
f68d432
to
fbb6e3b
Compare
I have hard-coded the max attempts and 250ms exponential back-off at the minute. Wasn't sure if they should be configurable or not.
fbb6e3b
to
1b4ebd9
Compare
Thanks! I had a first glance over the changes, and left some comments / suggestions (and other ramblings) inline.
Don't hesitate to "discuss" if my comments don't make sense, or cause issues (e.g. if changing the tests to a sub-test makes things way more complicated).
distribution/pull_v2.go
Outdated
@@ -858,7 +859,10 @@ func (p *v2Puller) pullManifestList(ctx context.Context, ref reference.Named, mf

func (p *v2Puller) pullSchema2Config(ctx context.Context, dgst digest.Digest) (configJSON []byte, err error) {
blobs := p.repo.Blobs(ctx)
configJSON, err = blobs.Get(ctx, dgst)
err = retry(ctx, 5, 250*time.Millisecond, func(ctx context.Context) (err error) {
As to hard-coding the max attempts and interval: hard-coding the values is probably OK for a start, but I'd suggest at least defining consts to make it slightly more descriptive. The xfer package uses a maxDownloadAttempts const (not sure if we want to export that one, and the name is a bit poorly chosen; it should probably have been named defaultXXX), but it's perhaps OK (for now) to use the same name.
moby/distribution/xfer/download.go
Line 19 in 65b8bcc
const maxDownloadAttempts = 5
As a follow-up (?) we could look at whether it's somehow possible to pass the same configuration that xfer is using (it could perhaps be set as a property on v2Puller, but I'd have to have another look); see xfer.WithMaxDownloadAttempts(config.MaxDownloadAttempts):
Line 62 in 65b8bcc
downloadManager: xfer.NewLayerDownloadManager(config.LayerStore, config.MaxConcurrentDownloads, xfer.WithMaxDownloadAttempts(config.MaxDownloadAttempts)),
I've pulled out the constants, as a first pass here
434b65d
to
d3f5dcc
Compare
e266520
to
26a4002
Compare
Thanks for updating; had one comment, but looking good overall 👍
26a4002
to
0a2ac87
Compare
👍 Great. Hopefully this fixes our customers' issues (once I've backported to
dc5c349
to
085ce92
Compare
Rebased out the conflict.
Hi @thaJeztah - is there anything more I need to do on this PR? Solicit further review?
LGTM
It seems to me the proposed implementation leaks running Tickers; using a Timer would be better.
085ce92
to
6d01cbe
Compare
LGTM, thanks!
fixes moby#43267 Signed-off-by: Pete Woods <pete.woods@circleci.com>
6d01cbe
to
9f3b1a9
Compare
😓 there was a small conflict in the imports due to #43323 being merged; I did a quick rebase and pushed to your branch. Let's merge this one when CI completes 👍
Awesome, thanks
All green 👍 🥳
20.10 backport created here: #43333
fixes #43267
- What I did
Add a simple retry mechanism to the schema download with exponential back-off, max retries, and that respects context cancellation.
- How to verify it
I think the tests cover it pretty well?
- Description for the changelog
Add retries for schema manifest downloads.
- A picture of a cute animal (not mandatory but encouraged)