Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

squashfs: passthrough squashfs compressed layers #1408

Closed
wants to merge 1 commit into from

Conversation

rchincha
Copy link

@rchincha rchincha commented Nov 17, 2021

squashfs blobs are internally compressed, so bypass compression/decompression

This is a follow-up to discussions in:
#947
#957

squashfs blobs are internally compressed, so bypass
compression/decompression

Signed-off-by: Ramkumar Chinchani <rchincha@cisco.com>
@tych0
Copy link
Contributor

tych0 commented Feb 3, 2022

Ping?

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping. This one slipped through!

Code LGTM, @mtrmac @giuseppe PTAL

@giuseppe
Copy link
Member

giuseppe commented Feb 3, 2022

should the compression algorithm just be called Noop or None? While it is used for squashfs, it seems more generic than that

@tych0
Copy link
Contributor

tych0 commented Feb 3, 2022

Can it be generic though? Doesn't the test against the squashfs byte header mean that it's squashfs specific?

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 3, 2022

In the first place, It’s not immediately obvious that recognizing Squashfs as blob compression format is the right thing to do here — Squashfs is a file format, not a compression format.

Maybe we should only compress recognized MIME types, or autodetected and recognized content types. Also consider OCI artifacts, at least conceptually.

(This is not an opinion on the PR, I haven’t thought about this carefully yet — this PR might well be the right approach; I’d just like to prevent the discussion from following the “no-op compression” train of thought purely by default, without considering alternatives.)

@tych0
Copy link
Contributor

tych0 commented Feb 3, 2022

Maybe we should only compress recognized MIME types, or autodetected and recognized content types. Also consider OCI artifacts, at least conceptually.

According to the OCI spec, we should only be compressing recognized content types. I haven't read the docker spec, but perhaps it has the same mandate?

In any case, we have been maintaining a fork with some version of this patch for over two years, and have tried to upstream three different versions of a patchset that would solve the problem for us. It would be great if we could converge on at least one solution. This one is by far the smallest and least invasive of the three.

@giuseppe
Copy link
Member

giuseppe commented Feb 3, 2022

Sorry for the bikeshedding. I am fine with the current approach (I am also guilty of adding zstd:chunked that is not a compression format as well :))

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to the principle of not representing the true state of things, this actually has side effects in copyBlobFromStream: because isCompressed is set, this can allow decompression (on decompressed-only transports or copy --dest-decompress) — and even on a trivial no-op copy, BlobInfo.CompressionAlgorithm is set, triggering a MIME type update.

And any such update would fail because we have no MIME type mapping for the SquashFS “compression algorithm”.

Right now, for SquashFS-only images, all that is hidden by the layerDigestsDiffer logic if the layers aren’t modified, but that’s not a reasonable design (e.g. combining a tar layer and a SquashFS layer in one image could trigger these properties on a change of the tar layer).

Testing with the magic numbers changed so that the SquashFS path triggers on ordinary gzip images EDIT and with layerDigestsDiffer nerfed:

% bin/skopeo --override-os linux copy --debug --insecure-policy docker://quay.io/libpod/busybox dir:t               

DEBU[0002] GET https://quay.io/v2/libpod/busybox/blobs/sha256:9758c28807f21c13d05c704821fdd56c0b9574912f9b916c65e1df3e6b8bc572 
DEBU[0003] Detected compression format squashfs         
DEBU[0003] blob sha256:9758c28807f21c13d05c704821fdd56c0b9574912f9b916c65e1df3e6b8bc572 with type application/vnd.docker.image.rootfs.diff.tar.gzip should be compressed with gzip, but compressor appears to be squashfs 
DEBU[0003] Using original blob without modification     

FATA[0003] creating an updated image manifest: preparing updated manifest, layer "sha256:9758c28807f21c13d05c704821fdd56c0b9574912f9b916c65e1df3e6b8bc572": unknown compressed with algorithm squashfs variant for type application/vnd.docker.image.rootfs.diff.tar.gzip 

In my experience, such technical debt never disappears (compare the sad history of InternalUnstableUndocumentedMIMEQuestionMark ).

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 3, 2022

It would be great if we could converge on at least one solution.

OK, so let’s figure this out.

  • Whether we can even try to compress/decompress a blob is not primarily a property of a destination transport (contra Detect layer compression #957), but a property of the image format (schema1 / OCI image / some specific OCI artifact). So, that’s something like types.Image.CanChangeLayerCompression(mimeType string)
    • … except that, just like Detect layer compression #957, we can’t just add a method to types.Image.
    • Luckily, we don’t actually need to update the public interface. We only care about the user in c/image/copy, which comes from image.FromUnparsedImage. All other references to types.Image in the repo produce it, but don’t consume externally-produced objects. So, we can effectively abandon that interface, add an internal/image.FromUnparsedImage (which returns an internal interface, or just a non-interface SourcedImage struct pointer).
      • At this point, I’d lean towards all API around the “new version of types.Image” being internal-only, primarily because there are quite a few warts in the design of Image.UpdatedImage that we’d want to rectify before committing to a new stable version, and short-term we will need other additions for Cosign signatures. That would make c/image/image a very dumb wrapper around the internal objects, luckily the API surface of c/image/image itself is very small.
  • The format implementations in c/image/image would just delegate to manifest handlers in c/image/manifest. Those are just structs, not interfaces, so we can add a new CanChangeLayerCompression method directly.
    • Ultimately, the data already exists in the []compressionMIMETypeSet tables, so a shared helper in c/image/manifest/common.go could do that, based on those tables along with an indication whether to reject or silently accept unrecognized MIME types.
  • The uploadCompressionFormat value in copyBlobFromStream needs to be kept nil for unrecognized formats, so that we don’t trigger the MIME update table lookups.

How’s that for a plan?


Now, the infrastructure of creating an internal-only version of types.Image is not hard, but it’s some multiple :) of the size of this PR. That is going to happen Real Soon Now anyway, (for Cosign support), (and yes, I know I said something similar in #957 ) — see #1439 for a related draft and a record of design concerns.

While the making internal-only versions of ImageSource/ImageDestination has some serious dependencies and unknowns, just making a private new version of image.FromUnparsedImage with this new “can change layer compression” method can happen right now — so if you are interested in doing that work, it would be very welcome. Alternatively, it should happen in a few weeks, for our primary current project.

@mtrmac mtrmac mentioned this pull request Feb 3, 2022
@tych0
Copy link
Contributor

tych0 commented Feb 4, 2022

How’s that for a plan?

As with most plans, it sounds good :). I'll look forward to following #1439.

@mtrmac
Copy link
Collaborator

mtrmac commented Jun 17, 2022

I believe this is all now implemented in #1574 : layers in non-image OCI artifacts, and layers with unrecognized MIME types, are not compressed/decompressed.

Testing would be appreciated; if something about it doesn’t work, we’d love to hear.

@mtrmac mtrmac closed this Jun 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants