Document eStargz with zstd compression instead of only gzip #1596

Open
aochagavia opened this issue Mar 6, 2024 · 4 comments
Labels
documentation Improvements or additions to documentation

Comments

@aochagavia

aochagavia commented Mar 6, 2024

Currently, the docs on the structure of eStargz mention that layers are compressed using gzip. However, as far as I understand, eStargz also supports zstd as a compression mechanism (it is mentioned in the nerdctl docs, though they call it zstdchunked).

It would be great to update the docs to reflect zstd support. Specifically, the following questions need to be answered when using zstd as the compression format:

  1. Are gzip blobs replaced by zstd frames (as defined in the zstd specification)? Are chunks also zstd frames?
  2. What does the footer look like? The docs define it in terms that are closely tied to gzip, and it's not clear to me how it translates to zstd.

For context, I'm working on an image bakery application in Rust and I want to support eStargz with zstd compression.

@ktock
Member

ktock commented Mar 6, 2024

It would be great to update the docs to reflect zstd support.

SGTM

Are gzip blobs replaced by zstd frames (as defined in the zstd specification)? Are chunks also zstd frames?

Yes

What does the footer look like? The docs define it in terms that are closely tied to gzip, and it's not clear to me how it translates to zstd.

- zstd skippable frame header (64 bits: 32-bit magic number followed by the 32-bit payload size)
- TOC offset (64 bits)
- zstd-compressed TOC length (64 bits)
- Uncompressed TOC length (64 bits)
- Manifest type (=1) (64 bits)
- zstd:chunked magic number (64 bits)

Please also see the source code:

// zstdFooterBytes returns the 40 bytes footer.
func zstdFooterBytes(tocOff, tocRawSize, tocCompressedSize uint64) []byte {
    footer := make([]byte, FooterSize)
    binary.LittleEndian.PutUint64(footer, tocOff)
    binary.LittleEndian.PutUint64(footer[8:], tocCompressedSize)
    binary.LittleEndian.PutUint64(footer[16:], tocRawSize)
    binary.LittleEndian.PutUint64(footer[24:], manifestTypeCRFS)
    copy(footer[32:40], zstdChunkedFrameMagic)
    return footer
}

func appendSkippableFrameMagic(b []byte) []byte {
    size := make([]byte, 4)
    binary.LittleEndian.PutUint32(size, uint32(len(b)))
    return append(append(skippableFrameMagic, size...), b...)
}

Or see the discussion threads: containers/storage#775 and #293
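
For readers writing an independent implementation, the footer layout described in this comment can be read back roughly as follows. This is only a sketch under stated assumptions, not the project's parser: the names `zstdFooter`, `encodeZstdFooter` and `parseZstdFooter` are hypothetical, and the byte values of `skippableFrameMagic` and `zstdChunkedFrameMagic` are assumed here because the thread does not quote them (check them against the real source). It treats the footer as the last 48 bytes of the blob: an 8-byte skippable frame header followed by the 40-byte payload that `zstdFooterBytes` produces, with every integer little-endian.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Assumed values: the thread names these identifiers but does not quote them.
var (
	// zstd skippable frame magic 0x184D2A50, stored little-endian.
	skippableFrameMagic = []byte{0x50, 0x2a, 0x4d, 0x18}
	// Assumed zstd:chunked magic ("GnUlInUx"); verify against the real source.
	zstdChunkedFrameMagic = []byte{0x47, 0x6e, 0x55, 0x6c, 0x49, 0x6e, 0x55, 0x78}
)

const (
	footerPayloadSize = 40 // five 64-bit fields
	frameHeaderSize   = 8  // 4-byte magic + 4-byte payload length
	manifestTypeCRFS  = 1
)

// zstdFooter is a hypothetical struct holding the decoded footer fields.
type zstdFooter struct {
	TOCOffset         uint64
	TOCCompressedSize uint64
	TOCRawSize        uint64
	ManifestType      uint64
}

// encodeZstdFooter writes the same layout as zstdFooterBytes wrapped by
// appendSkippableFrameMagic, so that the parser below can be round-tripped.
func encodeZstdFooter(f zstdFooter) []byte {
	out := make([]byte, frameHeaderSize+footerPayloadSize)
	copy(out[0:4], skippableFrameMagic)
	binary.LittleEndian.PutUint32(out[4:8], footerPayloadSize)
	p := out[frameHeaderSize:]
	binary.LittleEndian.PutUint64(p[0:], f.TOCOffset)
	binary.LittleEndian.PutUint64(p[8:], f.TOCCompressedSize)
	binary.LittleEndian.PutUint64(p[16:], f.TOCRawSize)
	binary.LittleEndian.PutUint64(p[24:], f.ManifestType)
	copy(p[32:40], zstdChunkedFrameMagic)
	return out
}

// parseZstdFooter decodes the footer from the last 48 bytes of blob.
func parseZstdFooter(blob []byte) (zstdFooter, error) {
	var f zstdFooter
	total := frameHeaderSize + footerPayloadSize
	if len(blob) < total {
		return f, errors.New("blob too short to contain a footer")
	}
	b := blob[len(blob)-total:]
	if !bytes.Equal(b[0:4], skippableFrameMagic) {
		return f, errors.New("missing skippable frame magic")
	}
	if n := binary.LittleEndian.Uint32(b[4:8]); n != footerPayloadSize {
		return f, fmt.Errorf("unexpected footer payload size %d", n)
	}
	p := b[frameHeaderSize:]
	if !bytes.Equal(p[32:40], zstdChunkedFrameMagic) {
		return f, errors.New("missing zstd:chunked magic")
	}
	f.TOCOffset = binary.LittleEndian.Uint64(p[0:])
	f.TOCCompressedSize = binary.LittleEndian.Uint64(p[8:])
	f.TOCRawSize = binary.LittleEndian.Uint64(p[16:])
	f.ManifestType = binary.LittleEndian.Uint64(p[24:])
	return f, nil
}

func main() {
	want := zstdFooter{TOCOffset: 4096, TOCCompressedSize: 512, TOCRawSize: 2048, ManifestType: manifestTypeCRFS}
	blob := append([]byte("layer data"), encodeZstdFooter(want)...)
	got, err := parseZstdFooter(blob)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", got)
}
```

Note that the "40 bytes" in the source comment counts only the payload; together with the skippable frame header, the six 64-bit fields listed above add up to 48 bytes.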

ktock added the documentation label Mar 6, 2024
@aochagavia
Author

Perfect, thanks! Once I have a clear idea of how this all works I might open a PR to update the docs 👍

@aochagavia
Author

I'm not yet confident enough in my knowledge to update the docs, but here's some relevant information for whoever is interested in creating an independent implementation of eStargz + zstd:

  • When using zstd, the TOC is not included in the layer's tar archive. Instead, it's included as a raw string inside a skippable frame. This diverges from what gzip does (according to the diagram in the docs).
  • As a consequence of the previous point, the tocOffset doesn't point to a tar header (there is none). Instead, it points directly to the start of the TOC's JSON.
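
To make the two points above more concrete, here is a minimal sketch of how a skippable frame can be unwrapped. It relies only on the zstd format's definition of a skippable frame (a 4-byte little-endian magic in the range 0x184D2A50 to 0x184D2A5F, a 4-byte little-endian payload size, then the payload). The function name `readSkippableFrame` is hypothetical, and the sketch makes no assumption about how the TOC inside the payload is encoded.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readSkippableFrame expects b to start with a zstd skippable frame.
// It returns the frame's payload and whatever bytes follow the frame.
func readSkippableFrame(b []byte) (payload, rest []byte, err error) {
	const headerSize = 8 // 4-byte magic + 4-byte payload size
	if len(b) < headerSize {
		return nil, nil, errors.New("input shorter than a skippable frame header")
	}
	// The low 4 bits of the magic are free for the writer to choose.
	magic := binary.LittleEndian.Uint32(b[0:4])
	if magic&0xFFFFFFF0 != 0x184D2A50 {
		return nil, nil, fmt.Errorf("not a skippable frame: magic %#x", magic)
	}
	size := binary.LittleEndian.Uint32(b[4:8])
	if uint64(len(b)-headerSize) < uint64(size) {
		return nil, nil, errors.New("skippable frame payload is truncated")
	}
	end := headerSize + int(size)
	return b[headerSize:end], b[end:], nil
}

func main() {
	// A hand-built frame with a 5-byte payload, followed by one trailing byte.
	frame := []byte{0x50, 0x2a, 0x4d, 0x18, 5, 0, 0, 0, 'h', 'e', 'l', 'l', 'o', 0xff}
	payload, rest, err := readSkippableFrame(frame)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s %d\n", payload, len(rest)) // prints "hello 1"
}
```

Because decoders that follow the zstd format skip these frames, a regular zstd decompressor should still be able to read the layer while ignoring the embedded metadata.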

@aochagavia
Author

aochagavia commented Mar 27, 2024

Here's another bit of information (not sure whether it's zstd-specific or also applies to gzip): in the TOC, the offset field for .no.prefetch.landmark doesn't link to the beginning of the header in the tar archive, but links directly to its body instead (header and body are each compressed in their own zstd frame to allow this). This diverges from the way other files are handled (normally the header and the body are inside the same zstd frame and the offset points to the beginning of the frame).

Update: my comment above is incorrect. For all files (including the landmark), the offset indeed links directly to the start of the compressed body, which is obvious because any non-empty body gets its own zstd frame. Sorry for the confusion.
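
One cheap way to sanity-check the corrected statement in an independent implementation is to confirm that the bytes at a TOC entry's offset begin a zstd frame. The sketch below is an assumption-laden illustration, not part of eStargz: the helper name `startsZstdFrame` is hypothetical, and the only fact it relies on is the zstd frame magic number 0xFD2FB528, stored little-endian.

```go
package main

import (
	"bytes"
	"fmt"
)

// zstdFrameMagic is the zstd frame magic number 0xFD2FB528, little-endian.
var zstdFrameMagic = []byte{0x28, 0xb5, 0x2f, 0xfd}

// startsZstdFrame reports whether blob has a zstd frame magic at offset.
func startsZstdFrame(blob []byte, offset int64) bool {
	if offset < 0 || offset > int64(len(blob))-int64(len(zstdFrameMagic)) {
		return false
	}
	return bytes.HasPrefix(blob[offset:], zstdFrameMagic)
}

func main() {
	// Two fake frames: only the magic numbers are real, the rest is filler.
	blob := []byte{
		0x28, 0xb5, 0x2f, 0xfd, 0xaa, 0xbb, // stand-in for the tar header frame
		0x28, 0xb5, 0x2f, 0xfd, 0xcc, // stand-in for the file body frame
	}
	fmt.Println(startsZstdFrame(blob, 6), startsZstdFrame(blob, 4))
}
```

A check like this only shows that an offset is plausible; it does not prove that the frame found there holds the expected file body.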
