Are digests necessary? #402

Closed
vsoch opened this issue Jun 8, 2021 · 12 comments

Comments

@vsoch
Member

vsoch commented Jun 8, 2021

This is a question @alecbcs and I are discussing. Suppose a user pulls shpc once and later wants to install the latest version of a container. Technically this requires a git pull from main here, as the digests won't update themselves. But what if, instead of trying to pull a digest, we just pull the tag? A user could then always pull to get the latest version of a specific tag, such as the one known as "latest" in the container.yaml file. The main issue is that the digest in that file becomes outdated / useless, since the install would always get the latest digest regardless. But this is actually how Podman is currently implemented because if you pull a digest with podman there is no reference to the tag (and we can't manage it in the namespace so it doesn't work). Thus we always just install / pull the tag.
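
To make the two options concrete, here is a minimal sketch of how the pull reference differs between them. The function name `pull_reference` is hypothetical and is not part of shpc; the digest is the one shown in the install log in this thread.

```python
from typing import Optional


def pull_reference(name: str, tag: str, digest: Optional[str] = None) -> str:
    """Return the reference a container runtime would be asked to pull.

    With a digest, the reference is pinned to exact content. With only a
    tag, it follows whatever the registry currently serves for that tag.
    (Hypothetical helper for illustration, not shpc's own code.)
    """
    if digest:
        return f"{name}@{digest}"
    return f"{name}:{tag}"


digest = "sha256:02311d686cd35b0f838854d6035c679acde2767a4fd09904e65355fbd9780f8a"

# Pinned: always the same image content.
print(pull_reference("docker.io/library/python", "3.9.5-alpine", digest))
# Floating: may resolve to different content over time.
print(pull_reference("docker.io/library/python", "3.9.5-alpine"))
```

The trade-off discussed here is exactly the difference between these two strings: the first is reproducible but needs a registry update to change, the second updates itself but makes the recorded digest meaningless.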

I'm also wondering what supporting scripts we can provide to the user to keep the registry up to date. Having a command that controls git feels a little too controlling for me, but I could maybe see a script that we suggest for the user to use as a cron job. Or we could do an shpc update command that does a git pull, checks for installed containers, and updates them all. What do you think @alecbcs @marcodelapierre? I do think I want to improve upon the current ability to easily update a registry, beyond git pull.

@marcodelapierre
Contributor

Thanks for sharing these thoughts @vsoch.
I actively promote the use of containers for reproducible computational science workflows.

Therefore, I tend to prioritise reproducibility (of installations in this case) over automation. In the context of your question, I prefer digests over tags. This gives the final user the best guarantee of running the same container. If they want to update, they need to be willing to do it, so I am happy if they have to manually update the registry: they stay in control as far as reproducibility is concerned.
With tags, the underlying digest could change under the hood, hurting reproducibility.
I also advise users against using any "latest" label, unless they know what they're doing.
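
A small sketch of why a tag is weaker than a digest for reproducibility: a digest is a hash of the content itself, while a tag is a mutable pointer that the registry can move. The names `content_digest` and `has_drifted` are hypothetical, and the "manifests" are placeholder byte strings rather than real image manifests.

```python
import hashlib
from typing import Dict


def content_digest(manifest: bytes) -> str:
    """Digest derived purely from the bytes, so new content means a new digest."""
    return "sha256:" + hashlib.sha256(manifest).hexdigest()


def has_drifted(tags: Dict[str, str], tag: str, pinned_digest: str) -> bool:
    """True if the tag no longer points at the digest we recorded earlier."""
    return tags.get(tag) != pinned_digest


# A toy registry: tag name -> digest it currently points to.
tags = {"latest": content_digest(b"manifest for build 1")}
pinned = tags["latest"]  # what we recorded at install time

# Upstream pushes a new build under the same tag.
tags["latest"] = content_digest(b"manifest for build 2")

print(has_drifted(tags, "latest", pinned))
```

Pulling `pinned` would still give the first build, whereas pulling `latest` silently gives the second.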

> But this is actually how Podman is currently implemented because if you pull a digest with podman there is no reference to the tag (and we can't manage it in the namespace so it doesn't work). Thus we always just install / pull the tag.

Ah! I hadn't noticed that! Probably worth a note in the documentation!

@marcodelapierre
Contributor

Some non-conclusive thoughts on helper scripts/utilities for updating the downloaded registry.

> I'm also wondering what supporting scripts we can provide to the user to keep the registry up to date. Having a command that controls git feels a little too controlling for me, but I could maybe see a script that we suggest for the user to use as a cron job. Or we could do an shpc update command that does a git pull, checks for installed containers, and updates them all.

For the reasons of reproducibility stated above, I would decouple updating the registry from updating the containers - they should be two distinct steps, actually three:

  1. utility/sub-command to update the registry and list all available container updates
  2. utility/sub-command to review that list later on
  3. utility/sub-command to update those containers that can be updated (all or some)
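
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the `Update` type, and the idea of representing installed and registry state as plain `module -> digest` dictionaries are all hypothetical, not existing shpc commands or data structures, and the digests are placeholders.

```python
from typing import Dict, Iterable, List, NamedTuple


class Update(NamedTuple):
    module: str     # e.g. "python:3.9.5-alpine"
    installed: str  # digest currently installed
    available: str  # digest recorded in the refreshed registry


def list_updates(installed: Dict[str, str], registry: Dict[str, str]) -> List[Update]:
    """Step 1: after refreshing the registry, find digests that differ."""
    return [
        Update(module, digest, registry[module])
        for module, digest in sorted(installed.items())
        if module in registry and registry[module] != digest
    ]


def review(updates: Iterable[Update]) -> str:
    """Step 2: render the pending updates for a human to read."""
    return "\n".join(f"{u.module}: {u.installed} -> {u.available}" for u in updates)


def select(updates: Iterable[Update], chosen: Iterable[str]) -> List[Update]:
    """Step 3: keep only the updates the user explicitly opted into."""
    wanted = set(chosen)
    return [u for u in updates if u.module in wanted]


installed = {"python:3.9.5-alpine": "sha256:aaa", "salad:latest": "sha256:bbb"}
registry = {"python:3.9.5-alpine": "sha256:ccc", "salad:latest": "sha256:bbb"}

pending = list_updates(installed, registry)
print(review(pending))
to_apply = select(pending, ["python:3.9.5-alpine"])
```

Keeping the steps as separate functions mirrors the point of the list: nothing is reinstalled until the user has seen the pending changes and chosen which ones to apply.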

I haven't checked this, but I am assuming that right now SHPC can still manage installed containers when their tags/digests no longer match the registry information...?

Not sure whether a cron job is needed, as I don't think a local registry would need to be updated that often (@vsoch do you know if Spack has that?)

In fact, in terms of usage scenarios,

A. system administrators: at our site we update versions of scientific software no more than every 2-3 months (except for buggy releases). So a manual registry update (utility or git pull), followed by manual review of update-able containers, followed by container updates would do (basically points 1 to 3 above). Here I am assuming the pool of installed containers is relatively small, a few tens of them.

B. end users: they would probably need it even less often and on fewer containers, so again a manual registry update would do.

I might be missing something in my writings, I am at the end of a very long work day today :-)

@vsoch
Member Author

vsoch commented Jun 9, 2021

@marcodelapierre I'm thinking about the Podman issue - I'm wondering if we can pull by digest and then explicitly tag? I'll test that today.
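
A minimal sketch of the "pull by digest, then explicitly tag" idea, expressed as the command lines it would produce. The helper name `pull_and_tag_commands` is hypothetical and this is not shpc's actual implementation. It assumes `podman tag` accepts the digest reference as its source image; the image ID printed by the pull could be used instead.

```python
from typing import List


def pull_and_tag_commands(name: str, tag: str, digest: str) -> List[List[str]]:
    """Build the podman commands for a pinned pull followed by a local tag.

    Hypothetical helper: it only constructs argument lists and runs nothing.
    """
    pinned = f"{name}@{digest}"
    return [
        # Pull the exact content recorded in the registry entry.
        ["podman", "pull", pinned],
        # Give the pulled image a human-readable name:tag locally.
        ["podman", "tag", pinned, f"{name}:{tag}"],
    ]


commands = pull_and_tag_commands(
    "docker.io/library/python",
    "3.9.5-alpine",
    "sha256:02311d686cd35b0f838854d6035c679acde2767a4fd09904e65355fbd9780f8a",
)
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be handed to `subprocess.run(cmd, check=True)` to actually execute it.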

@vsoch
Member Author

vsoch commented Jun 9, 2021

> 1. utility/sub-command to update the registry and list all available container updates
> 2. utility/sub-command to review that list later on
> 3. utility/sub-command to update those containers that can be updated (all or some)

I like this list! We can work more on shpc check (and maybe shpc update) to support that, not in this current PR but a future one.

> I haven't checked this, but I am assuming that right now SHPC can still manage installed containers when their tags/digests no longer match the registry information...?

Yes, that's correct. The issue came up because a colleague of @alecbcs was using shpc and was surprised the update didn't happen automagically when the container was updated in Autamus.

@marcodelapierre
Contributor

> @marcodelapierre I'm thinking about the Podman issue - I'm wondering if we can pull by digest and then explicitly tag? I'll test that today.

This is a super-cool idea!

What happens at uninstall? I guess shpc must issue 2 podman rmi commands, one against the digest and one against the tag?

@vsoch
Member Author

vsoch commented Jun 9, 2021

Oh good question - you're right, I just tested this and rmi of a tag only removes the tag! Let me update the PR to remove the image as well.

@vsoch
Member Author

vsoch commented Jun 9, 2021

Okay, I think we are good! rmi with --force appears to remove the image as well as the tag. Docker has the flag too, so it should work! Here is my testing:

Make sure we don't have python:3.9.5-alpine

$ podman images
REPOSITORY                TAG           IMAGE ID      CREATED       SIZE
docker.io/library/python  3.9.2-alpine  123823915946  2 months ago  47.6 MB
docker.io/library/python  3.9.2-slim    972ef8de24a4  2 months ago  120 MB
docker.io/vanessa/salad   <none>        daf4e5a88131  2 years ago   8 MB

Install it (note we are pulling by digest now)

$ shpc install --container-tech podman python
Trying to pull docker.io/library/python@sha256:02311d686cd35b0f838854d6035c679acde2767a4fd09904e65355fbd9780f8a...
Getting image source signatures
Copying blob d037ddac5dde done  
Copying blob 629719f9106a done  
Copying blob 0faf4e7f2207 done  
Copying blob f9ef3a05a91e done  
Copying blob 540db60ca938 done  
Copying config 2d64a2341b done  
Writing manifest to image destination
Storing signatures
2d64a2341b7c19d37cc1262280756014b071eb187e4605c6cc9f749311a3d0b8
Module python:3.9.5-alpine was created.

Make sure it was pulled (and tagged!)

$ podman images
REPOSITORY                TAG           IMAGE ID      CREATED       SIZE
docker.io/library/python  3.9.5-alpine  2d64a2341b7c  2 weeks ago   48 MB
docker.io/library/python  3.9.2-alpine  123823915946  2 months ago  47.6 MB
docker.io/library/python  3.9.2-slim    972ef8de24a4  2 months ago  120 MB
docker.io/vanessa/salad   <none>        daf4e5a88131  2 years ago   8 MB

Uninstall it

$ shpc uninstall --container-tech podman python
Are you sure you want to uninstall $module_base/python, and all content below it? y
$module_base/python and all subdirectories been removed.

Make sure it's gone

$ podman images
REPOSITORY                TAG           IMAGE ID      CREATED       SIZE
docker.io/library/python  3.9.2-alpine  123823915946  2 months ago  47.6 MB
docker.io/library/python  3.9.2-slim    972ef8de24a4  2 months ago  120 MB
docker.io/vanessa/salad   <none>        daf4e5a88131  2 years ago   8 MB
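
For reference, the uninstall step tested above can be sketched as the single command it reduces to. The helper name `uninstall_command` and the `runtime` parameter are hypothetical; the sketch relies only on the behaviour reported in this comment, namely that `rmi --force` on the tagged name removes the image and not just the tag, and it does not reproduce shpc's own code.

```python
from typing import List


def uninstall_command(name: str, tag: str, runtime: str = "podman") -> List[str]:
    """Build the image-removal command for an image that was pulled by
    digest and then tagged locally.

    Hypothetical helper: it constructs the argument list and runs nothing.
    `runtime` can be "docker", since Docker's rmi has a --force flag too.
    """
    return [runtime, "rmi", "--force", f"{name}:{tag}"]


print(" ".join(uninstall_command("docker.io/library/python", "3.9.5-alpine")))
```

Without `--force`, the observation earlier in the thread was that only the tag is dropped and the image itself stays behind.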

@vsoch
Member Author

vsoch commented Jun 9, 2021

I think with this fix we can keep digests as they are now! 🎉

@marcodelapierre
Contributor

Wow! Fantastic! :-)

@marcodelapierre
Contributor

marcodelapierre commented Jun 10, 2021

On the automation utilities, I think it's great that you are getting feedback from people with various deployment scenarios; it will help make SHPC more general purpose.
Although we might not need high automation here at Pawsey, I appreciate that it would be highly beneficial in frameworks like Autamus.

@vsoch
Member Author

vsoch commented Jun 10, 2021

Agreed! I think what we will probably do is figure out a clear set of commands or an extra script that users explicitly add to allow for it, but it will never be a default for that reason.

@vsoch
Member Author

vsoch commented Jun 11, 2021

This will be closed by #403

@vsoch vsoch closed this as completed Jun 11, 2021