zstd:chunked blocker: TarSplitChecksumKey not used in a layer ID #1888

mtrmac · 2024-04-13T15:22:07Z

@giuseppe Zstd:chunked has uses a TarSplitChecksumKey annotation, separate from the TOC digest.

That’s fine as far as per-image individual trust goes, because the annotation is authenticated by the manifest digest.

But we deduplicate individual layers by the TOC digest alone; so it is possible for two images to use a layer with the same TOC digest, but different tar-split locations and contents (e.g. one with an added set-UID bit somewhere).

Locally, it seems by far easiest to move the tar-split digest into the TOC. That way the by-now-widespread assumption that the TOC digest is a sufficient identifier can be maintained.

For other purposes, like the BlobInfoCache and metadata-only reuse, the value returned from toc.GetTOCDigest is used both for reuse/match lookups, so whatever this function returns must be a complete identifier; it should be unambiguous, and we can never add more annotations that change the semantics of a layer with a matching GetTOCDigest value. Even if we replaced GetTOCDigest now with something that accounts for tar-split, we need this guarantee to be forward-compatible (consider pulling an image just before a new annotation type is added, upgrading, and then pulling another just after). I think that also argues for a long-term design of keeping everything anchored to a single digest value.

A bit more generally, I’ve been wondering whether the TOC digest is a good identifier long-term.

The way things are now, we basically can’t significantly change the format of the TOC, otherwise we risk the possibility of the same TOC blob being valid for both the old and new formats, allowing an attacker to trigger deduplication between layers with the same TOC digest but different TOC formats and differently-understood contents.

For example, we just can’t support a new partial-pull format with some other JSON-based TOC format if it could be constructed ambiguously with the zstd:chunked TOC format. I’m not 100% sure this is worth worrying about — but if it were ever to happen, we would need a fairly large restructuring, or at least renaming/re-documenting. So I’m raising it right now, in case such things were expected. (E.g. will this be required for composeFS?)

The text was updated successfully, but these errors were encountered:

mtrmac · 2024-04-13T15:43:46Z

Maybe one way to think about this is that I think it would be clean if #1887 were extended further, so that

(some future variant of) toc.GetTOCDigest returned a “complete” format+identity identifier (of a kind which is long-term stable across new added feature and new format variants)
that complete identifier were passed to chunked.GetDiffer, which just “executed” it without making any decision what format/variant to consume
that complete identifier were also used in everywhere we currently use “TOC digest”, i.e. in c/storage metadata/lookups, c/image BlobInfoCache as in WIP: Record (TOC digest → DiffID) mapping in BlobInfoCache image#2321, and so on.

Yes, I realize it’s very late to change all this. But if we knew that we would need to do it in the near future, I think there’s a strong argument to do it ASAP.

giuseppe · 2024-04-15T08:41:12Z

The way things are now, we basically can’t significantly change the format of the TOC, otherwise we risk the possibility of the same TOC blob being valid for both the old and new formats, allowing an attacker to trigger deduplication between layers with the same TOC digest but different TOC formats and differently-understood contents.

how can this happen? If we modify the TOC in any way, isn't safe to just assume the digests are different and they are treated as two different layers? A layer could have two variants with different TOCs, even today just by sorting the entries in a different way but they are treated as two different entities. Isn't this enough?

mtrmac · 2024-04-15T16:06:55Z

Consider things like

{
   "version": 1,
   "entries": [/*v1 entries */]
   "someOtherVendorVersion": "2.0",
    "someOtherVendorEntries": [/*…*/]
}

A v1 consumer will see a valid v1, a sameOtherVendor consumer will see a valid v2. But they are the same blob = same TOC digest = right now, same c/storage layer.

And which consumer is used probably depends on MIME type and annotations in the manifest.

Or, alternatively, consider what if “other” version uses a non-JSON format where it happens to be possible to make a single file valid in both formats.

(To be fair, the existing "version" field is a fairly strong countermeasure if everything uses JSON. Maybe we can make that a requirement.)

mtrmac · 2024-04-15T16:08:56Z

(some future variant of) toc.GetTOCDigest returned a “complete” format+identity identifier (of a kind which is long-term stable across new added feature and new format variants)

An unambiguous hash of (layer MIME type, all annotations) might work. Assuming manifests have per-layer MIME types and annotations :)

(We could also add the layer digest.) Of course the more values we add into the identifier, the fewer opportunities for reuse.

giuseppe · 2024-04-18T10:18:45Z

Would it be enough to just prepend the format used to the hashed data? e.g. the TOC Digest can be sha256sum("zstd:chunked:$version" + $tocData), that should be enough to protect from the same TOC being used by different formats. What do you think?

In practice the metadata file used by estargz and zstd:chunked is almost the same, so stuff like chunk offset is represented in the same way. There is not a way that some metadata is used by one of the two formats and ignored by the other, so in principle these two cannot conflict.

mtrmac · 2024-04-18T13:33:51Z

To be a bit more explicit, the primary thing that I need solving here is that TarSplitChecksumKey is not participating in the TOC digest. That can be done by hashing the annotations, or by moving TarSplitChecksumKey from the annotations into the TOC.

I agree the “ambiguous TOC format” concern is, short term, not urgent, because there is no ambiguity between estargz and zstd:chunked.

Quite reasonably we can just let things stay as they are, and decide to deal with that if/when a possibly-ambiguous TOC format is ever added.

OTOH if we can solve both the TarSplitChecksumKey and format ambiguity cheaply in one place, I wouldn’t mind doing that now.

I’m bringing this up primarily to benefit from your expertise about ComposeFS etc. — is a new TOC format something you expect?

WRT the specifics, sha256sum("zstd:chunked:$version" + $tocData), it would be more efficient if the identifier could be computed just from the manifest data, without having to read the TOC itself (extra I/O and extra roundtrips, + the TOC can be somewhat large). So sha256sum(unambiguousTuple("zstd:chunked", $version, $tocDigest)) would be better than raw $tocData. All that is assuming there are no other security-relevant annotations.

mtrmac changed the title ~~zstd:chunked TarSplitChecksumKey not used in a layer ID~~ zstd:chunked blocker: TarSplitChecksumKey not used in a layer ID Apr 13, 2024

mtrmac mentioned this issue Apr 13, 2024

WIP: Record (TOC digest → DiffID) mapping in BlobInfoCache containers/image#2321

Draft

This was referenced Apr 22, 2024

RFC: Move the tar-split digest into the TOC #1902

Draft

[skip-ci] Packit: New workflow for downstream Fedora and CentOS Stream 10 containers/common#1960

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zstd:chunked blocker: TarSplitChecksumKey not used in a layer ID #1888

zstd:chunked blocker: TarSplitChecksumKey not used in a layer ID #1888

mtrmac commented Apr 13, 2024

mtrmac commented Apr 13, 2024 •

edited

giuseppe commented Apr 15, 2024

mtrmac commented Apr 15, 2024

mtrmac commented Apr 15, 2024

giuseppe commented Apr 18, 2024

mtrmac commented Apr 18, 2024

zstd:chunked blocker: TarSplitChecksumKey not used in a layer ID #1888

zstd:chunked blocker: TarSplitChecksumKey not used in a layer ID #1888

Comments

mtrmac commented Apr 13, 2024

mtrmac commented Apr 13, 2024 • edited

giuseppe commented Apr 15, 2024

mtrmac commented Apr 15, 2024

mtrmac commented Apr 15, 2024

giuseppe commented Apr 18, 2024

mtrmac commented Apr 18, 2024

mtrmac commented Apr 13, 2024 •

edited