Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for a pull policy that only pulls new images periodically #1368

Open
edmorley opened this issue Feb 4, 2022 · 8 comments · May be fixed by #2075
Open

Add support for a pull policy that only pulls new images periodically #1368

edmorley opened this issue Feb 4, 2022 · 8 comments · May be fixed by #2075
Assignees
Labels
good for mentorship A good issue for a mentorship project. help wanted Need some extra hands to get this done. status/ready Issue ready to be worked on. type/enhancement Issue that requests a new feature or improvement.
Milestone

Comments

@edmorley
Copy link
Contributor

edmorley commented Feb 4, 2022

Description

Pack's default pull policy is always, which means that on every pack build, it attempts to pull images even if they exist locally (eg the builder image, run image and any buildpacks specified via docker:// URNs).

This is great for ensuring users don't accidentally build against outdated images, however:

  • It adds several seconds to every pack build (even for a no-op when images are up to date)
  • It counts against the container registry's rate limits (even for a no-op), which is inconvenient given Docker Hub and ECR's fairly low rate limits for unauthenticated requests.

These issues are particularly problematic when running integration tests, where pack build will be run many times.

Using the if-not-present pull policy (either via Pack global config, or by passing --pull-policy if-not-present to pack build), prevents these issues, however:

  • For CI: It's easy to forget to set the global pack pull policy to if-not-present, which can mean much slower integration tests (which can be hard to spot, given the Pack stdout mentioning pulling would typically be hidden in the test runner output unless the test fails).
  • When using pack locally: It risks using out of date images.

Proposed solution

  • Initially: Add support for a new Pack pull policy option (alongside the existing always and if-not-present options), that only pulls new images if the time since last pull of those images is greater than N hours/days.
  • Later on: Make this new pull policy the default, instead of always. This would (a) mean users get the performance benefit locally without having to know that the pull policy option exists, (b) reduce the boilerplate needed (and chance of forgetting to set it) in CI configs.

Open questions

  • Should the new pull policy allow the user to configure the time interval?
    • If yes, via what syntax? eg interval=1d
    • If no, what should the hardcoded interval be? 1 day?
  • What should the new pull policy be called? Examples:
    • --pull-policy interval=1d
    • --pull-policy if-not-recent
    • --pull-policy if-not-pulled-recently
  • How should the state of last pull time for each image be tracked?

Describe alternatives you've considered

  • Modify integration test runners to pass --pull-policy if-not-present to pack build both locally and in CI:
    • Pros: This will speed up integration tests everywhere.
    • Cons: Users can potentially be integration testing against out of date images locally (if they don't also run pack build manually from time to time, which would pull). Doesn't help speed up manual repeat pack builds locally when developing.
  • Modify integration test runners to pass --pull-policy if-not-present to pack build only in CI:
    • Pros: This will speed up integration tests in CI. No risk of running integration tests locally against stale images.
    • Cons: Doesn't help speed up integration tests locally, or running manual repeat pack builds locally when developing. Means having to add vendor-specific CI checks (eg known env vars) in integration test runners.
  • In the CI config only, set the Pack global pull policy to if-not-present:
    • Pros: This will speed up integration tests in CI. No risk of running integration tests locally against stale images. No vendor-specific checks in integration test runners.
    • Cons: Doesn't help speed up integration tests locally, or running manual repeat pack builds locally when developing. Easy to forget to configure (particularly when the end user projects are owned by different users to the test runner, so less familiar with best practices).

Additional context

@edmorley edmorley added status/triage Issue or PR that requires contributor attention. type/enhancement Issue that requests a new feature or improvement. labels Feb 4, 2022
@jkutner
Copy link
Member

jkutner commented Feb 6, 2022

I like this ideal of configurable intervals, but something as simple as the following might solve the 90% case:

  • --pull-policy hourly
  • --pull-policy daily
  • --pull-policy weekly

@edmorley
Copy link
Contributor Author

edmorley commented Feb 12, 2022

I like this ideal of configurable intervals, but something as simple as the following might solve the 90% case:

I like keeping this simpler - those names sound great to me.

How should the state of last pull time for each image be tracked?

I'm presuming there is no way to find out when a specific image was last pulled from Docker (even ignoring alternative implementations), since:

  • Docker seems to only track the image created time (at least by looking at docker inspect), when we need the pull time instead
  • Builder images have their timestamps normalised anyway (eg "Created": "1980-01-01T00:00:01Z",)

As such, Pack CLI will have to track the state somewhere itself, which leads to the following questions:

  1. Where should that place be? I presume we wouldn't want to taint ~/.pack/config.toml with non-settings related state, so perhaps an additional file in that pack directory?
  2. Do we also need to think about purging old entries in this file (eg if the max policy is weekly, then anything with a timestamp older than that is redundant, and could be dropped), or not worth it?

I'm presuming this would be implemented around here:

switch options.PullPolicy {
case PullNever:
img, err := f.fetchDaemonImage(name)
return img, err
case PullIfNotPresent:
img, err := f.fetchDaemonImage(name)
if err == nil || !errors.Is(err, ErrNotFound) {
return img, err
}
}
f.logger.Debugf("Pulling image %s", style.Symbol(name))
err = f.pullImage(ctx, name, options.Platform)
if err != nil && !errors.Is(err, ErrNotFound) {
return nil, err
}

@jromero
Copy link
Member

jromero commented Feb 16, 2022

Personally, I'm all for adding new pull policies. Preferably those defined by Joe.

I'm a little more hesitant with "Make this new pull policy the default.". I think we should discuss the potential of unexpected behavior for end-users. If we limit this issue and its PR to simply adding new pull policies, that's a big 👍 from me.

Where should that place be? I presume we wouldn't want to taint ~/.pack/config.toml with non-settings related state, so perhaps an additional file in that pack directory?

Agree with some other file in ~/.pack.

I wonder if it's worth considering using a proper file-based db like sqlite or if that would be overkill. (Partly trying to look forward at other features that we've thought about that might also require persistence like storing OCI annotations associated to images on the daemon).

Do we also need to think about purging old entries in this file (eg if the max policy is weekly, then anything with a timestamp older than that is redundant, and could be dropped), or not worth it?

If we had sqlite, this purging would also be trivial. :)

@jromero jromero added status/discussion-needed Issue or PR that requires in-depth discussion. and removed status/triage Issue or PR that requires contributor attention. labels Feb 16, 2022
@jromero
Copy link
Member

jromero commented Feb 16, 2022

@dfreilich I set this to status discussion-needed but if you have some thoughts that make this implementable please feel free to change it accordingly.

edmorley added a commit to heroku/libcnb.rs that referenced this issue Mar 7, 2022
Since it:
1. Saves ~2 seconds per integration test re-pulling an image that's already up to date.
2. Helps prevent hitting Docker Hub or ECR rate limits from duplicate (and redundant)
   image pulling (that counts agains the rate limit even when it's a no-op).

Longer term, if Pack CLI supports a periodic pulling mode (buildpacks/pack#1368), we
can switch to that, however for now this is the lesser of two evils - and in most cases
Pack usage outside of `libcnb-test` will ensure that newer builder images are pulled
from time to time.

Fixes #306.
edmorley added a commit to heroku/libcnb.rs that referenced this issue Mar 7, 2022
Since it:
1. Saves ~2 seconds per integration test re-pulling an image that's already up to date.
2. Helps prevent hitting Docker Hub or ECR rate limits from duplicate (and redundant)
   image pulling (that counts agains the rate limit even when it's a no-op).

Longer term, if Pack CLI supports a periodic pulling mode (buildpacks/pack#1368), we
can switch to that, however for now this is the lesser of two evils - and in most cases
Pack usage outside of `libcnb-test` will ensure that newer builder images are pulled
from time to time.

See #306 for more details.

See also:
https://buildpacks.io/docs/tools/pack/cli/pack_config_pull-policy/

Fixes #306.
GUS-W-10799284.
@jjbustamante
Copy link
Member

@edmorley

I think this will be a great feature to be included into pack but because of the complexity and effort of what is asking for, I will put this issues as good for mentorship and we can propose it to GSoC or LFX program in the future or if someone else from the community want to work on this we will be more than happy to guide them.

@jjbustamante jjbustamante added status/ready Issue ready to be worked on. good for mentorship A good issue for a mentorship project. and removed status/discussion-needed Issue or PR that requires in-depth discussion. labels Aug 17, 2023
@natalieparellano natalieparellano added the help wanted Need some extra hands to get this done. label Aug 18, 2023
@Parthiba-Hazra
Copy link
Contributor

Parthiba-Hazra commented Jan 19, 2024

Hi everyone, @jjbustamante @edmorley @jkutner I want to work on this issue, will need some guidance from you guys.
I know it needs some moderate level of effort, I can do that, if I get some help from you here.

@jjbustamante
Copy link
Member

Hi @Parthiba-Hazra, I was taking a look in your code but I am going to put my thoughts here first.

Note 🧵 with some discussion can be found here

From the slack thread

what if user delete some image from the local registry - If a user deletes an image from the local registry, we need to decide whether to retain or remove the log entry for that image from the JSON file.

I think that if we were not supposed to pull, because we haven't reached the amount of time after the last pull, and we expect to fetch the image from the daemon, but it is not there, because the end-user deleted.

  • I will remove the log entry from the json file
  • I will throw the not found error, that execution must fail, but next time the image will be pull

@Parthiba-Hazra
Copy link
Contributor

I think that if we were not supposed to pull, because we haven't reached the amount of time after the last pull, and we expect to fetch the image from the daemon, but it is not there, because the end-user deleted.

  • I will remove the log entry from the json file
  • I will throw the not found error, that execution must fail, but next time the image will be pull

yeah got it.

@jjbustamante jjbustamante modified the milestones: 0.34.0, 0.35.0 Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good for mentorship A good issue for a mentorship project. help wanted Need some extra hands to get this done. status/ready Issue ready to be worked on. type/enhancement Issue that requests a new feature or improvement.
Projects
None yet
6 participants