Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Squash hard link/opaque whiteout handling #38

Merged
merged 6 commits into from Mar 28, 2024

Conversation

tri-adam
Copy link
Member

@tri-adam tri-adam commented Mar 20, 2024

Improve mutate.Squash so that it handles opaque whiteouts and hard links.

Prior to this PR, oci-tools relied on Extract from the github.com/google/go-containerregistry (GGCR) module. Unfortunately, Extract has a couple of known issues:

Fixing these is not trivial due to the way the GGCR code works. The original code reads the layers in reverse order, allowing it to handle files that are modified/deleted efficiently, since the "upper" layer content is always read first. Although this works well in the vast majority of cases, the bugs above highlight some of the challenges:

  • In general, whiteouts apply only to the content of lower level layers. This is simple to implement for explicit whiteouts, since an explicit whiteout (which deletes an entry) should never appear in a layer containing a matching entry (which modifies an entry.) Opaque whiteouts are more challenging, since they apply recursively to a path, and may overlap with entries in the layer containing the whiteout. In order to handle opaque whiteouts correctly, we must ensure that only higher level whiteouts are applied.
  • Hard links are especially challenging, since they reference content from another entryThis creates subtle edge cases such as:
    • A hard link need not appear in the same layer as the content it references. With GGCR's approach of reading layers in reverse order, this means that a hard link could be encountered before the content it references. The output squashed layer must contain the content before hard link(s) that reference it, which is the opposite order we encounter them in.
    • To complicate things further, we cannot assume the entry a hard link references has not itself been deleted. This appears to be the cause of mutate.Extract can fail for images which create a hard-link and later remove it google/go-containerregistry#977.

To work around these issues, the proposed code here:

  • Reads image layers in reverse order, accumulating several pieces of state from the current layer:
    • layerWhiteouts accumulates whiteouts from the current layer.
    • Hard links encountered are appended to imageLinks for later processing.
    • layerEntries accumulates entries that are encountered in the current layer that are not directories, hard links, or whiteouts. If these are not "shadowed", they are written to the flattened output; otherwise, the contents are temporarily stored.
    • imageShadows accumulates the effects upper level layers have on lower level layers. This covers both regular entries as well as explicit/opaque whiteouts.
  • When the end of a layer is encountered:
    • The effects of the layer's whiteouts (layerWhiteouts) are merged into the imageShadows.
    • We write out any entries from imageLinks that reference entries present in the current layer (layerEntries.) If the link target was shadowed, we ensure the content is written out with the link name, and that any other links are updated to reference it.

@tri-adam tri-adam self-assigned this Mar 20, 2024
@tri-adam tri-adam force-pushed the hard-link-whiteouts branch 4 times, most recently from bfae7e5 to 7bcc4c7 Compare March 21, 2024 20:18
Previously, Squash used mutate.Extract from the
google/go-containerregistry module, but that does not currently handle
hard links or opaque whiteouts correctly. Replace that with an in-module
implementation.
@tri-adam tri-adam marked this pull request as ready for review March 26, 2024 20:36
Copy link
Member

@dtrudg dtrudg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have spent a bunch of time with a morning coffee on this.. so hopefully my brain is awake enough!

The only issue I can see is related to Xattr handling where:

  • File A is created with xattr(s)
  • B is created as a hard link to A
  • File A is deleted

Then it's important that when B is written out, it carries the xattr(s) that were on the shadowed A. It's important that xattrs are carried across properly, as they are used for selinux and rootless container purposes.

Experimenting a little, it appears that when you hard link against a file that carries xattrs, then the tar header for the link won't carry those xattrs... so they have to be copied across from root -> link.

Create a file with an xattr, and hard link against it

touch xattr_root
setfattr -n user.foo -v bar xattr_root
ln xattr_root xattr_link

Both will show the xattr

$ getfattr *
# file: xattr_link
user.foo

# file: xattr_root
user.foo

Create a tar file

$ tar --xattrs -cvf test.tar xattr_root xattr_link
xattr_root
xattr_link

Examine the tar headers with a go program...

Note we are missing Xattrs and PAXRecords values on the link header vs the root file header.

$ go run tar.go 
Header of xattr_root:
{
	"Typeflag": 48,
	"Name": "xattr_root",
	"Linkname": "",
	"Size": 0,
	"Mode": 420,
	"Uid": 1000,
	"Gid": 1000,
	"Uname": "dtrudg-sylabs",
	"Gname": "dtrudg-sylabs",
	"ModTime": "2024-03-27T09:50:16.539671952Z",
	"AccessTime": "2024-03-27T09:50:16.539671952Z",
	"ChangeTime": "2024-03-27T09:50:16.543672112Z",
	"Devmajor": 0,
	"Devminor": 0,
	"Xattrs": {
		"security.selinux": "unconfined_u:object_r:user_home_t:s0\u0000",
		"user.foo": "bar"
	},
	"PAXRecords": {
		"SCHILY.xattr.security.selinux": "unconfined_u:object_r:user_home_t:s0\u0000",
		"SCHILY.xattr.user.foo": "bar",
		"atime": "1711533016.539671952",
		"ctime": "1711533016.543672112",
		"mtime": "1711533016.539671952"
	},
	"Format": 4
}
Header of xattr_link:
{
	"Typeflag": 49,
	"Name": "xattr_link",
	"Linkname": "xattr_root",
	"Size": 0,
	"Mode": 420,
	"Uid": 1000,
	"Gid": 1000,
	"Uname": "dtrudg-sylabs",
	"Gname": "dtrudg-sylabs",
	"ModTime": "2024-03-27T09:50:16.539671952Z",
	"AccessTime": "2024-03-27T09:50:16.539671952Z",
	"ChangeTime": "2024-03-27T09:50:16.543672112Z",
	"Devmajor": 0,
	"Devminor": 0,
	"Xattrs": null,
	"PAXRecords": {
		"atime": "1711533016.539671952",
		"ctime": "1711533016.543672112",
		"mtime": "1711533016.539671952"
	},
	"Format": 4
}

pkg/mutate/squash.go Outdated Show resolved Hide resolved
pkg/mutate/squash.go Show resolved Hide resolved
test/images/gen_images.go Show resolved Hide resolved
Add image to test corpus that allows us to test a hard linked file with
extended attributes that is later deleted.
Ensure extended attributes are copied over when the destination of a
hard link is deleted, to ensure they are preserved.
@tri-adam tri-adam force-pushed the hard-link-whiteouts branch 2 times, most recently from feac908 to 9df9b41 Compare March 27, 2024 21:04
@tri-adam
Copy link
Member Author

Thanks @dtrudg! Ready for another pass...

@tri-adam tri-adam requested a review from dtrudg March 27, 2024 21:05
Copy link
Member

@dtrudg dtrudg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@tri-adam tri-adam merged commit c73916d into sylabs:main Mar 28, 2024
19 of 20 checks passed
@tri-adam tri-adam deleted the hard-link-whiteouts branch March 28, 2024 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants