-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Squash hard link/opaque whiteout handling #38
Conversation
bfae7e5
to
7bcc4c7
Compare
Previously, Squash used mutate.Extract from the google/go-containerregistry module, but that does not currently handle hard links or opaque whiteouts correctly. Replace that with an in-module implementation.
7bcc4c7
to
889c049
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have spent a bunch of time with a morning coffee on this.. so hopefully my brain is awake enough!
The only issue I can see is related to Xattr handling where:
- File A is created with xattr(s)
- B is created as a hard link to A
- File A is deleted
Then it's important that when B is written out, it carries the xattr(s) that were on the shadowed A. It's important that xattrs are carried across properly, as they are used for selinux and rootless container purposes.
Experimenting a little, it appears that when you hard link against a file that carries xattrs, then the tar header for the link won't carry those xattrs... so they have to be copied across from root -> link.
Create a file with an xattr, and hard link against it
touch xattr_root
setfattr -n user.foo -v bar xattr_root
ln xattr_root xattr_link
Both will show the xattr
$ getfattr *
# file: xattr_link
user.foo
# file: xattr_root
user.foo
Create a tar file
$ tar --xattrs -cvf test.tar xattr_root xattr_link
xattr_root
xattr_link
Examine the tar headers with a go program...
Note we are missing Xattrs
and PAXRecords
values on the link header vs the root file header.
$ go run tar.go
Header of xattr_root:
{
"Typeflag": 48,
"Name": "xattr_root",
"Linkname": "",
"Size": 0,
"Mode": 420,
"Uid": 1000,
"Gid": 1000,
"Uname": "dtrudg-sylabs",
"Gname": "dtrudg-sylabs",
"ModTime": "2024-03-27T09:50:16.539671952Z",
"AccessTime": "2024-03-27T09:50:16.539671952Z",
"ChangeTime": "2024-03-27T09:50:16.543672112Z",
"Devmajor": 0,
"Devminor": 0,
"Xattrs": {
"security.selinux": "unconfined_u:object_r:user_home_t:s0\u0000",
"user.foo": "bar"
},
"PAXRecords": {
"SCHILY.xattr.security.selinux": "unconfined_u:object_r:user_home_t:s0\u0000",
"SCHILY.xattr.user.foo": "bar",
"atime": "1711533016.539671952",
"ctime": "1711533016.543672112",
"mtime": "1711533016.539671952"
},
"Format": 4
}
Header of xattr_link:
{
"Typeflag": 49,
"Name": "xattr_link",
"Linkname": "xattr_root",
"Size": 0,
"Mode": 420,
"Uid": 1000,
"Gid": 1000,
"Uname": "dtrudg-sylabs",
"Gname": "dtrudg-sylabs",
"ModTime": "2024-03-27T09:50:16.539671952Z",
"AccessTime": "2024-03-27T09:50:16.539671952Z",
"ChangeTime": "2024-03-27T09:50:16.543672112Z",
"Devmajor": 0,
"Devminor": 0,
"Xattrs": null,
"PAXRecords": {
"atime": "1711533016.539671952",
"ctime": "1711533016.543672112",
"mtime": "1711533016.539671952"
},
"Format": 4
}
Add image to test corpus that allows us to test a hard linked file with extended attributes that is later deleted.
Ensure extended attributes are copied over when the destination of a hard link is deleted, to ensure they are preserved.
feac908
to
9df9b41
Compare
Thanks @dtrudg! Ready for another pass... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Improve mutate.Squash so that it handles opaque whiteouts and hard links.
Prior to this PR,
oci-tools
relied on Extract from thegithub.com/google/go-containerregistry
(GGCR) module. Unfortunately,Extract
has a couple of known issues:mutate.Extract
can fail for images which create a hard-link and later remove it google/go-containerregistry#977mutate.Extract
does not handle whiteouts of directories properly google/go-containerregistry#1897Fixing these is not trivial due to the way the GGCR code works. The original code reads the layers in reverse order, allowing it to handle files that are modified/deleted efficiently, since the "upper" layer content is always read first. Although this works well in the vast majority of cases, the bugs above highlight some of the challenges:
mutate.Extract
can fail for images which create a hard-link and later remove it google/go-containerregistry#977.To work around these issues, the proposed code here:
layerWhiteouts
accumulates whiteouts from the current layer.imageLinks
for later processing.layerEntries
accumulates entries that are encountered in the current layer that are not directories, hard links, or whiteouts. If these are not "shadowed", they are written to the flattened output; otherwise, the contents are temporarily stored.imageShadows
accumulates the effects upper level layers have on lower level layers. This covers both regular entries as well as explicit/opaque whiteouts.layerWhiteouts
) are merged into theimageShadows
.imageLinks
that reference entries present in the current layer (layerEntries
.) If the link target was shadowed, we ensure the content is written out with the link name, and that any other links are updated to reference it.