subject field ought to reference any digest #1012

Open
vbatts opened this issue Feb 6, 2023 · 12 comments

Comments

@vbatts
Member

vbatts commented Feb 6, 2023

(creating an issue from #999 (comment))
https://github.com/opencontainers/image-spec/blob/a7ac485/manifest.md?plain=1#L70

Currently, the subject field in ./artifact.md and ./manifest.md says it can only point to "another manifest".

This seems unnecessarily limiting.

I'm sure there was conversation around this. I know I was in a call where I voiced strongly in favor of allowing pointing to any object.

As an ISV or content producer, I may want to publish an artifact containing a signature, attestation, or similar for, say, a specific layer in an image. That way anyone can build FROM that original image's set of layers without losing the reference to one of those layers, even though the original image manifest is no longer relevant.
Hypothetical example:

  • Red Hat publishes their base RHEL (or UBI) image
  • Some ISV publishes their database on this certified base layer
  • Some customer uses this database

If the referencing subject can only point at a manifest, then after the first FROM, end-user deployments cannot easily discover those artifacts without traversing the build history or doing something similarly complicated.
Whereas allowing a publisher to point their signature at the layer digests themselves would let users naturally discover the stack of referenced objects for all the layers/objects.

@sudo-bmitch
Contributor

How would a client know what API to use to request and parse the response from an arbitrary digest? Do clients need to maintain a list of manifest media types (what happens to old clients when a new type is added)?
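
To make the question concrete, here is a minimal sketch of the dispatch a client would need if a subject could be any digest. The `endpoint_for` helper and the `KNOWN_MANIFEST_TYPES` set are hypothetical names for illustration, not part of any spec or client library; the media types and the `/v2/<name>/manifests/` and `/v2/<name>/blobs/` paths are the standard ones.

```python
# Illustrative list only: a client has to hard-code something like this,
# and it goes stale whenever a new manifest media type is introduced.
KNOWN_MANIFEST_TYPES = {
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}


def endpoint_for(repo: str, descriptor: dict) -> str:
    """Pick the registry path a client would request for a descriptor."""
    digest = descriptor["digest"]
    if descriptor.get("mediaType") in KNOWN_MANIFEST_TYPES:
        return f"/v2/{repo}/manifests/{digest}"
    # Anything unrecognized falls through to the blob API, including
    # manifest types added after this client was written.
    return f"/v2/{repo}/blobs/{digest}"
```

An older client running this logic would send a manifest of an unrecognized type to the blob endpoint, which is the failure mode the question points at.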

The big issue is if I can request referrers to any blob, that also means it's possible to create logical loops with an image/artifact manifest that has both a subject and layer pointing to the same blob. An image has a layer, the layer has referrers to the image, the image has a layer, repeat.

This also scales up the number of API calls I need to make in the common use cases. When looking for referrers to an image, I would need to check both the manifest and every layer and the config blob. When recursively deep-copying an image, I would need to check every blob in addition to every manifest for referrers.

@vbatts
Member Author

vbatts commented Feb 6, 2023 via email

@afflom

afflom commented Feb 6, 2023

Very cool. This issue has been part of the focus of my work for the past year and I'm excited to see this issue being raised.

If we are saying any object, then the content could exist in any registry or any namespace within a registry, which brings up needing a standard for cross-namespace references.

I'm hoping to see this addressed from the work here: oras-project/artifacts-spec#72

@imjasonh
Member

imjasonh commented Feb 6, 2023

> if the referencing subject can only point at a manifest, then after the first FROM, the end user deployments can not easily discover them without traversing or something complicated.

Base image annotations can help here. An image FROM some.signed.image can retain enough information about some.signed.image to discover that base image's signatures, etc., without having to be able to sign its individual layers. Indeed, these annotations are already in use today to enforce that images are built from signed base images, for example.
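
As a rough sketch of that approach: the `org.opencontainers.image.base.name` and `org.opencontainers.image.base.digest` annotations are defined in the image spec, and the referrers path is the one from the distribution spec. The `base_image_referrers` helper is a hypothetical name, and it assumes the base name is a fully qualified `host/repo[:tag]` reference, which real references do not always satisfy.

```python
BASE_NAME = "org.opencontainers.image.base.name"
BASE_DIGEST = "org.opencontainers.image.base.digest"


def base_image_referrers(manifest: dict):
    """Return (registry host, referrers path) for the annotated base image.

    Returns None when the annotations are missing. Assumes the base name
    looks like "host/repo[:tag]"; this is a simplification.
    """
    annotations = manifest.get("annotations") or {}
    name = annotations.get(BASE_NAME)
    digest = annotations.get(BASE_DIGEST)
    if not name or not digest:
        return None
    host, _, repo = name.partition("/")
    repo = repo.split("@")[0]  # drop any trailing @digest
    if ":" in repo.rsplit("/", 1)[-1]:
        repo = repo[: repo.rfind(":")]  # drop the tag
    return host, f"/v2/{repo}/referrers/{digest}"
```

A derived image carrying these two annotations gives a client enough to query the base image's referrers directly, without any artifact pointing at an individual layer.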

Also potentially complicating this issue, the subject doesn't have to refer to a manifest that exists -- we were very clear and all agreed that you should be able to push a signature for an image manifest that hasn't been pushed yet, or retain the signatures for images that have been deleted/GCed. So subject referring to "a manifest" doesn't mean "the registry must make sure that manifest exists", just that it's intended to refer to a manifest.

Signing individual blobs didn't come up at all as a use case in the WG, AFAIK. If that's something we're interested in supporting, I think we should discuss it more, but I don't think it should be considered a blocker for v1.1. AIUI if we wanted to let subjects point to manifests or blobs or other, that would be a fairly limited change to the spec, but one that we should discuss a lot more before adopting, mainly due to @sudo-bmitch's cycle concerns.

@sudo-bmitch
Contributor

> You couldn't create this loop without a random guess of the hash. Unless you're meaning something else entirely. This does not sound like a real issue.

@vbatts Here's a logical loop, no digest guessing required:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "size": 3101,
    "digest": "sha256:c621799bcec256bf9be20c1998aa087fcdb0bb7fff5c10a3df968eeda987906c"
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 85,
      "digest": "sha256:3d67ddc212ffba510628b93c0936f90dabcab9993f095cc1899fb1bcbe86b42a"
    }
  ],
  "subject": {
    "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
    "size": 85,
    "digest": "sha256:3d67ddc212ffba510628b93c0936f90dabcab9993f095cc1899fb1bcbe86b42a"
  }
}

Walk the manifest, to the layers, to all referrers to the layer (effectively walking the subject link in reverse), back to the manifest, and repeat.
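
The walk described above can be sketched as follows. This is a toy in-memory model, not a registry client: `build_referrers` and `walk` are hypothetical helpers, the manifest digest is a placeholder (the thread does not give one), and the sketch assumes the hypothetical rule that any digest may have referrers.

```python
from collections import defaultdict

CONFIG = "sha256:c621799bcec256bf9be20c1998aa087fcdb0bb7fff5c10a3df968eeda987906c"
LAYER = "sha256:3d67ddc212ffba510628b93c0936f90dabcab9993f095cc1899fb1bcbe86b42a"
MANIFEST = "sha256:manifest-digest-placeholder"  # placeholder, not a real digest

manifests = {
    MANIFEST: {
        "config": {"digest": CONFIG},
        "layers": [{"digest": LAYER}],
        "subject": {"digest": LAYER},
    }
}


def build_referrers(manifests):
    """Reverse index: subject digest -> digests of manifests pointing at it."""
    index = defaultdict(list)
    for digest, manifest in manifests.items():
        subject = manifest.get("subject")
        if subject:
            index[subject["digest"]].append(digest)
    return index


def walk(start, manifests):
    """Visit children and referrers of every node, recording revisits."""
    referrers = build_referrers(manifests)
    seen, revisits, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        children = []
        manifest = manifests.get(node)
        if manifest:
            children.append(manifest["config"]["digest"])
            children += [layer["digest"] for layer in manifest["layers"]]
        children += referrers.get(node, [])
        for child in children:
            if child in seen:
                revisits.append((node, child))  # edge back to a visited node
            else:
                stack.append(child)
    return seen, revisits


seen, revisits = walk(MANIFEST, manifests)
```

Here `revisits` contains the edge from the layer back to the manifest. A walker without the `seen` set would follow that edge and never terminate.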

@vbatts
Member Author

vbatts commented Feb 6, 2023 via email

@sudo-bmitch
Contributor

> Wouldn't this concern only be an implementation concern?
> We could easily say "the subject MAY NOT refer to a digest that exists in the layers/blobs" or similar.

This can also involve multiple or cross references, because you don't need to guess the manifest digest to predict blob digests.

blobA: sha256:aaaa
blobB: sha256:bbbb

Image 1: layer with blobA, subject with blobB
Image 2: layer with blobB, subject with blobA
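
A small sketch of why a per-manifest rule would not be enough, using the two images above. The `violates_rule` and `reaches_itself` helpers, the `image` constructor, and the config digests are all hypothetical; the rule checked is the one quoted ("the subject MAY NOT refer to a digest that exists in the layers/blobs").

```python
def image(layer, subject, config):
    """Build a minimal manifest-like dict for the example."""
    return {
        "config": {"digest": config},
        "layers": [{"digest": layer}],
        "subject": {"digest": subject},
    }


images = {
    "image1": image("sha256:aaaa", "sha256:bbbb", "sha256:cfg1"),
    "image2": image("sha256:bbbb", "sha256:aaaa", "sha256:cfg2"),
}


def violates_rule(manifest):
    """True if the subject is one of the manifest's own blobs."""
    subject = manifest.get("subject")
    if subject is None:
        return False
    own = {manifest["config"]["digest"]}
    own |= {layer["digest"] for layer in manifest["layers"]}
    return subject["digest"] in own


def reaches_itself(start, images):
    """Follow layer -> referrer hops from `start`; True if we return to it."""
    referrers = {}
    for name, manifest in images.items():
        subject = manifest.get("subject")
        if subject:
            referrers.setdefault(subject["digest"], []).append(name)
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        for layer in images[name]["layers"]:
            for ref in referrers.get(layer["digest"], []):
                if ref == start:
                    return True
                if ref not in seen:
                    seen.add(ref)
                    stack.append(ref)
    return False
```

Both images pass `violates_rule` individually, yet `reaches_itself("image1", images)` is true: the loop only appears once the two manifests are considered together.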

@jonjohnsonjr
Contributor

I will continue to remain skeptical of the GC cycle argument until we stop drawing the arrows backwards.

@sudo-bmitch
Contributor

GC is one thing that probably doesn't like cycles, but this also applies to anything that recursively walks the graph. The purpose of referrers is to find artifacts that refer to a manifest and treat them as children of the manifest. That can be anything performing a deep copy, a UI showing the multi-platform image with associated artifacts in a filesystem-like tree, and probably a bunch of use cases I haven't considered.

@imjasonh
Member

imjasonh commented Feb 7, 2023

I still haven't heard a strong use case for attaching manifests to blobs, and one didn't come up in the entire WG discussion about references.

I'd like to reiterate my position that we can punt on attaching to blobs until demand arises, and keep the scope of v1.1 at its current size. We retain the flexibility to allow references to blobs in a future release, with the benefit of experience about how it's used in practice for manifests in v1.1.

@vbatts
Member Author

vbatts commented Feb 7, 2023 via email

@sudo-bmitch
Contributor

Can we close this as unplanned? We've managed to add logical loops into other parts of the spec, so that objection is no longer valid. But my question of how a client would know which registry API to query remains. In other parts of the spec, a descriptor reference is either a blob or a manifest, but not both.

Other concerns I have include the lack of a use case showing a real need for the change, and the API overhead to perform a deep copy of a manifest and all the referrers. As an example, for an image with 7 platforms, and 3 referrers per platform, I'm already up to 29 referrers API calls to copy the image (1 index, 7 images, 21 artifacts). If each image had 10 layers, that would add another 70 API calls to the registry to copy the image even if not a single layer had any referrers.
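
The arithmetic in that example can be written out as a short sketch. The `referrers_calls` function and its parameter names are hypothetical; the count follows the comment above and, like it, counts only layers as extra blobs (config blobs and the artifacts' own blobs are left out).

```python
def referrers_calls(platforms, referrers_per_platform, layers_per_image,
                    blobs_can_have_referrers):
    """Count referrers API calls needed to deep-copy a multi-platform image."""
    # One call per manifest: the index, each platform image, each artifact.
    calls = 1 + platforms + platforms * referrers_per_platform
    if blobs_can_have_referrers:
        # One more call per layer, even when no layer has any referrers.
        calls += platforms * layers_per_image
    return calls


today = referrers_calls(7, 3, 10, blobs_can_have_referrers=False)
with_blob_subjects = referrers_calls(7, 3, 10, blobs_can_have_referrers=True)
```

With the numbers from the comment, `today` is 29 (1 + 7 + 21) and `with_blob_subjects` is 99 (29 + 70).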

One final consideration is whether the changes to the image manifest spec, allowing it to be used for a single layer, cover the use cases being considered here.


5 participants