WIP: Record (TOC digest → DiffID) mapping in BlobInfoCache #2321
base: main
// UncompressedDigestForTOC returns an uncompressed digest corresponding to tocDigest.
// Returns "" if the uncompressed digest is unknown.
// FIXME: Does this need to record TOC/compression type?
UncompressedDigestForTOC(tocDigest digest.Digest) digest.Digest
The TOC digest is the checksum of the uncompressed JSON document, so I think the compression should not matter in this case.
I agree we probably don’t need that right now (with GetTOCDigest refusing to work on manifests which contain multiple TOC digest annotations, and presumably with the zstd / estargz code being unable to decompress the other one).
This comment is looking a bit more into the future, for lookups in the other direction, where we will want to look up (UncompressedDigest → (compressed digest, TOC digest, algorithm)) and match that against “the user wants the destination to contain zstd:chunked” (i.e. reject estargz matches).
> for lookups in the other direction,

That will be done in a separate data structure (an extension of RecordDigestCompressorName): we need the full set of annotations for reuse of a TOC-compressed blob, so this simple mapping is not sufficient anyway. And the other structure does record the algorithm.
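The forward lookup discussed in this thread can be pictured as a map keyed only by the TOC digest, since the TOC digest is a checksum of the uncompressed TOC JSON. This is a minimal in-memory sketch under stated assumptions, not the c/image implementation: `Digest` is a local stand-in for `digest.Digest` from github.com/opencontainers/go-digest, and `tocCache`, `newTOCCache` and `recordTOCUncompressedPair` are hypothetical names. Only the `UncompressedDigestForTOC` method and its convention of returning "" for an unknown mapping come from the code under review.

```go
package main

import (
	"fmt"
	"sync"
)

// Digest is a local stand-in for digest.Digest (github.com/opencontainers/go-digest),
// so that this sketch has no external dependencies.
type Digest string

// tocCache is a hypothetical in-memory store of (TOC digest → DiffID) pairs.
// It is keyed by the TOC digest alone: the TOC digest is a checksum of the
// uncompressed TOC JSON, so the compression algorithm is not part of the key.
type tocCache struct {
	mu                 sync.Mutex // guards uncompressedForTOC
	uncompressedForTOC map[Digest]Digest
}

func newTOCCache() *tocCache {
	return &tocCache{uncompressedForTOC: make(map[Digest]Digest)}
}

// recordTOCUncompressedPair (hypothetical name) records that the layer whose
// TOC digest is tocDigest has the uncompressed digest (DiffID) uncompressed.
func (c *tocCache) recordTOCUncompressedPair(tocDigest, uncompressed Digest) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.uncompressedForTOC[tocDigest] = uncompressed
}

// uncompressedDigestForTOC follows the convention of UncompressedDigestForTOC
// in the code under review: it returns "" if the uncompressed digest is unknown.
func (c *tocCache) uncompressedDigestForTOC(tocDigest Digest) Digest {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.uncompressedForTOC[tocDigest] // missing keys yield the zero value ""
}

func main() {
	c := newTOCCache()
	c.recordTOCUncompressedPair("sha256:toc-a", "sha256:diffid-x")
	fmt.Println(c.uncompressedDigestForTOC("sha256:toc-a"))         // sha256:diffid-x
	fmt.Println(c.uncompressedDigestForTOC("sha256:unknown") == "") // true
}
```

Keeping the algorithm out of the key is what the first reply argues for; a lookup in the other direction would need a richer record than this single string.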
be098a2 to b14f00b
d238714 to 6dae67d
Note to self: This is code-complete but I want to test it in practice.
// (and we assume the TOC digest also uniquely identifies the contents, i.e. there aren’t two
// different formats/ways to parse a single TOC).
All of the c/storage + c/image code has been built around this assumption, but it is currently false (containers/storage#1888), and I’m not sure whether we need to revisit the design. Let’s discuss that in the c/storage issue.
Yes, this assumption is correct.
2a542f7 to 9e3cace
Should not change behavior.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
The new code is not called, so it should not change behavior (apart from extending the BoltDB/SQLite schema).
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
…storage by DiffID
If we can, prefer identifying layers by DiffID, because multiple TOCs can map to the same DiffID, and because it maximizes reuse with non-TOC layers. For now, the new situation is unreachable.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
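The preference stated in this commit message can be illustrated with a small Go sketch. `layerIdentity` and `chooseLayerIdentity` are hypothetical names and `Digest` is a local stand-in for `digest.Digest`; the sketch is not taken from the c/storage destination code, it only shows the "DiffID first, TOC digest as fallback" rule.

```go
package main

import (
	"errors"
	"fmt"
)

// Digest is a local stand-in for digest.Digest (github.com/opencontainers/go-digest).
type Digest string

// layerIdentity is a hypothetical type; on success exactly one field is set.
type layerIdentity struct {
	diffID    Digest // uncompressed digest, preferred when known
	tocDigest Digest // used only when the DiffID is unknown
}

// chooseLayerIdentity (hypothetical) prefers the DiffID when it is known:
// several TOC digests can map to one DiffID, and non-TOC pulls also identify
// layers by DiffID, so this maximizes reuse. It falls back to the TOC digest.
func chooseLayerIdentity(diffID, tocDigest Digest) (layerIdentity, error) {
	switch {
	case diffID != "":
		return layerIdentity{diffID: diffID}, nil
	case tocDigest != "":
		return layerIdentity{tocDigest: tocDigest}, nil
	default:
		return layerIdentity{}, errors.New("neither a DiffID nor a TOC digest is known")
	}
}

func main() {
	id, err := chooseLayerIdentity("sha256:diffid-x", "sha256:toc-a")
	fmt.Printf("%+v %v\n", id, err) // {diffID:sha256:diffid-x tocDigest:} <nil>
	id, err = chooseLayerIdentity("", "sha256:toc-a")
	fmt.Printf("%+v %v\n", id, err) // {diffID: tocDigest:sha256:toc-a} <nil>
}
```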
We will add one more instance of this, so share the code. Should not change behavior (it does remove one unreachable code path).
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
… is known
- Multiple TOC values might correspond to a single DiffID (e.g. if different compression levels are used); try to share them all, identified by DiffID (so that we also reuse with non-TOC pulls).
- LayersByTOCDigest only uses a single TOC digest per layer; BlobInfoCache allows multiple matches, matches layers which have since been deleted, and potentially matches TOC digests which we have created by pushing but haven't pulled yet.
- On reuse, we can now use DiffID-based layer identities even if the reuse was TOC-driven.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
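The first point above (several TOC digests for one DiffID, as with the `level1` and `level9` pushes in the test steps) implies a one-to-many reverse index. Below is a minimal sketch with hypothetical names (`tocIndex`, `record`, `siblingTOCs`) and the same `Digest` stand-in. It also assumes, per the review discussion, that each TOC digest maps to exactly one DiffID, so re-recording a TOC digest moves it rather than duplicating it.

```go
package main

import (
	"fmt"
	"sort"
)

// Digest is a local stand-in for digest.Digest (github.com/opencontainers/go-digest).
type Digest string

// tocIndex is a hypothetical index of (TOC digest, DiffID) pairs that can answer
// "which TOC digests are known to share this DiffID?".
type tocIndex struct {
	diffIDForTOC  map[Digest]Digest              // each TOC digest maps to one DiffID
	tocsForDiffID map[Digest]map[Digest]struct{} // one DiffID may have many TOC digests
}

func newTOCIndex() *tocIndex {
	return &tocIndex{
		diffIDForTOC:  make(map[Digest]Digest),
		tocsForDiffID: make(map[Digest]map[Digest]struct{}),
	}
}

// record stores a pair; re-recording a TOC digest with a different DiffID
// moves it, because a TOC digest is assumed to identify a single DiffID.
func (x *tocIndex) record(toc, diffID Digest) {
	if old, ok := x.diffIDForTOC[toc]; ok && old != diffID {
		delete(x.tocsForDiffID[old], toc)
	}
	x.diffIDForTOC[toc] = diffID
	set, ok := x.tocsForDiffID[diffID]
	if !ok {
		set = make(map[Digest]struct{})
		x.tocsForDiffID[diffID] = set
	}
	set[toc] = struct{}{}
}

// siblingTOCs returns, sorted, every TOC digest recorded for the same DiffID as
// toc (including toc itself), or nil if toc was never recorded.
func (x *tocIndex) siblingTOCs(toc Digest) []Digest {
	diffID, ok := x.diffIDForTOC[toc]
	if !ok {
		return nil
	}
	out := make([]Digest, 0, len(x.tocsForDiffID[diffID]))
	for t := range x.tocsForDiffID[diffID] {
		out = append(out, t)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

func main() {
	x := newTOCIndex()
	// The same layer pushed as zstd:chunked at two compression levels.
	x.record("sha256:toc-level1", "sha256:diffid-x")
	x.record("sha256:toc-level9", "sha256:diffid-x")
	fmt.Println(x.siblingTOCs("sha256:toc-level9")) // [sha256:toc-level1 sha256:toc-level9]
}
```

Sharing by DiffID means a pull of either TOC can be satisfied by a layer created from the other, which is the reuse the commit aims for.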
…hole layer
This is similar to what putBlobToPendingFile does.
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
…yers
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
To test:

Before:
```
# podman rmi alpine level1 level9
# rm -f /var/lib/containers/cache/blob-info-cache-v1.sqlite
# podman pull quay.io/libpod/alpine
# podman --log-level=debug push --compression-format zstd:chunked --compression-level 1 --force-compression quay.io/libpod/alpine localhost:50000/level1
## Even better would be to use two different destination registries, to be 100% certain the blobs are not reused
## (right now they are not reused, but we’ll fix that):
# podman --log-level=debug push --compression-format zstd:chunked --compression-level 9 --force-compression quay.io/libpod/alpine localhost:50000/level9
## Note the compressed digest and TOC digest values:
# skopeo inspect --raw docker://localhost:50000/level1 | jq .
# skopeo inspect --raw docker://localhost:50000/level9 | jq .
## No DigestTOCUncompressedPairs entries:
# sqlite3 /var/lib/containers/cache/blob-info-cache-v1.sqlite .dump
# podman rmi alpine level1 level9
## Triggers a partial pull: "Applying differ in …":
# podman --log-level=debug pull localhost:50000/level1
## Triggers a partial pull: "Applying differ in …":
# podman --log-level=debug pull localhost:50000/level9
## level1 and level9 have different image IDs:
# podman images
## Contains two copies of the layer, with the same expected-layer-diffid:
# jq . < /var/lib/containers/storage/overlay-layers/layers.json
```
After:
A single DiffID may map to multiple TOC digest values. Record that in BlobInfoCache, and use it for layer reuse. Also prefer reusing even TOC-matched layers by DiffID, when available.
@giuseppe I’d appreciate a preliminary review of the new logic; see individual commits.
Draft: The BlobInfoCache implementations don’t actually store/record any data yet, so this is obviously completely untested.