
Improve pull handling #1320

Merged
merged 14 commits into master from improve-pull-callback on Jul 22, 2019

Conversation

rnorth (Member) commented Mar 18, 2019

Pulling of images is something that I feel we can generally improve. This PR attempts to:

  • Remove the current fixed timeout+retries for pulls, replacing them with a progress monitor. If the download pauses for more than 30s the pull is aborted, but otherwise a pull of any duration is allowed as long as progress is being made (a sketch of this idea follows the example logs below).

  • Make the logging more human-friendly: we now log the downloaded/extracted state of the layers, in a form that is informative but not too noisy in the logs.

An example of the logs from both of these changes together (where I cut the network connection during the pull):

19:34:25.198 INFO  🐳 [ibmcom/db2express-c:latest] - Pulling image
19:34:25.198 INFO  🐳 [ibmcom/db2express-c:latest] - Pulling image layers:  0 pending,  0 downloaded,  0 extracted, (0 bytes/0 bytes)
19:34:25.967 INFO  🐳 [ibmcom/db2express-c:latest] - Pulling image layers: 12 pending,  1 downloaded,  0 extracted, (32 bytes/? MB)
19:34:27.363 INFO  🐳 [ibmcom/db2express-c:latest] - Pulling image layers: 11 pending,  2 downloaded,  0 extracted, (1 MB/? MB)
19:34:58.519 ERROR 🐳 [ibmcom/db2express-c:latest] - Docker image pull has not made progress in 30s - aborting pull
19:34:58.564 ERROR 🐳 [ibmcom/db2express-c:latest] - Failed to pull image: ibmcom/db2express-c:latest. Please check output of `docker pull ibmcom/db2express-c:latest`
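A minimal sketch of the stall-detection idea (not the actual implementation; the class and member names here are illustrative): every status update refreshes a last-progress timestamp, and the pull is aborted once that timestamp is more than 30 seconds old.

import java.time.Duration;
import java.time.Instant;

class PullStallWatchdog {
    // Assumed constant mirroring the 30s pause limit described above
    private static final Duration PULL_PAUSE_TIMEOUT = Duration.ofSeconds(30);

    private volatile Instant lastProgressTime = Instant.now();

    // Invoked whenever any layer reports newly downloaded or extracted bytes
    void onProgress() {
        lastProgressTime = Instant.now();
    }

    // Polled while the pull runs; a true result means the pull should abort
    boolean isStalled() {
        return Duration.between(lastProgressTime, Instant.now())
                .compareTo(PULL_PAUSE_TIMEOUT) > 0;
    }
}

Note that total pull time stays unbounded: as long as onProgress() keeps firing, isStalled() never trips, which is exactly the "any duration as long as progress is being made" behaviour.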

bsideup (Member) commented Apr 28, 2019

Wow, really great change! 👍 Seems to make our pulling process much more stable 💯 I just left a few comments, but I really like the idea in general.

logger.info("Pulling docker image: {}. Please be patient; this may take some time but only needs to be done once.", imageName);
}

if (attempts++ >= 3) {
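    // diff context: the fixed three-attempt retry cap whose removal is discussed below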
bsideup (Member):

I see that you removed the retrying logic here. Is it really safe?

rnorth (Member Author):

Yes, this is deliberate. I don't think we have any remaining reasons to retry pulls.

A pull timeout was one case where a retry genuinely used to be useful: a huge or slow download might fail on the first attempt but succeed on a retry thanks to layers cached by the earlier attempt.

With the new code we can tolerate extremely long downloads as long as progress is being made, so we shouldn't need to retry for that reason any more.

Other causes of failure, such as unavailable images or auth failures, make no sense to retry anyway.

rnorth (Member Author):

I'll leave this comment unresolved for easy visibility should we ever regret this change!

rnorth (Member Author):

It turns out I was wrong; I've noticed numerous failures in CI due to timeout errors at pull time.

Will reinstate retry logic for pulls!

rnorth (Member Author):

See #1712

bsideup (Member) left a comment:

Looks good! 👍
Left a few questions about the retrying mechanism, but other than that - good to go 👍

@rnorth rnorth changed the title WIP: Improve pull handling Improve pull handling Jul 21, 2019
@rnorth rnorth added this to the next milestone Jul 21, 2019
@rnorth rnorth self-assigned this Jul 21, 2019
rnorth (Member Author) commented Jul 22, 2019

I think I may revert 1d40670 and 315bc9d; the costs seem to outweigh the benefit.

bsideup (Member) commented Jul 22, 2019

@rnorth what was the cost?

rnorth (Member Author) commented Jul 22, 2019

It's a shame you can't see the commit!

DockerClientFactory's runInsideDocker cannot use the singleton docker client, and needs to use a specific client instance for the pull. To make that work, I had to introduce a DockerClient parameter to RemoteDockerImage so that it could use the provided client. It 'worked', but it extends the public API somewhere else and feels unpleasant (a rough sketch of the shape follows below).

We could consider a broader refactoring, but I'm not sure that it's worth it.
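For illustration, the change being described presumably had roughly this shape; the constructor signature and field names here are guesses, not the actual reverted code:

import com.github.dockerjava.api.DockerClient;

class RemoteDockerImage {
    private final String imageName;
    private final DockerClient dockerClient;

    // Hypothetical extra constructor: lets a caller such as
    // DockerClientFactory.runInsideDocker supply its own client instead of
    // the shared singleton. This is the objectionable public-API extension:
    // a docker-java type now leaks into RemoteDockerImage's surface.
    RemoteDockerImage(String imageName, DockerClient specificClient) {
        this.imageName = imageName;
        this.dockerClient = specificClient;
    }
}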

bsideup (Member) commented Jul 22, 2019

@rnorth ok 👍

@rnorth rnorth merged commit 4df20c2 into master Jul 22, 2019
@delete-merged-branch delete-merged-branch bot deleted the improve-pull-callback branch July 22, 2019 16:48