
overlay: support AdditionalImagesStore on vfs #1899

Open
giuseppe wants to merge 4 commits into main from overlay-on-vfs
Conversation

@giuseppe (Member) commented Apr 22, 2024

Extend the overlay driver to use an additional image store on a vfs backend.

It works because overlay can simply use the uppermost vfs layer as the only lower layer for the container.

It is useful on a system already protected by composefs, to distribute images as part of the operating system itself. The read-only store cannot be on overlay since composefs itself uses overlay, and that causes conflicts with whiteout files.
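For context, read-only additional stores are configured via the `additionalimagestores` option in storage.conf; a sketch of the kind of setup this PR targets (the store path is illustrative, not mandated by the patch):

```toml
# /etc/containers/storage.conf (illustrative excerpt)
[storage]
driver = "overlay"

[storage.options]
# Read-only store shipped as part of the OS image; with this PR, the
# layers inside it may be stored by the vfs backend instead of overlay.
additionalimagestores = [
  "/usr/lib/containers/storage",
]
```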

openshift-ci bot (Contributor) commented Apr 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -2283,7 +2283,7 @@ func (d *Driver) Changes(id string, idMappings *idtools.IDMappings, parent strin

 // AdditionalImageStores returns additional image stores supported by the driver
 func (d *Driver) AdditionalImageStores() ([]string, []string) {
-	return d.options.imageStores, []string{d.name}
+	return d.options.imageStores, []string{d.name, "vfs"}
Contributor:

Maybe this part should be in a separate commit

@mtrmac (Collaborator) left a comment:

(Just a very minimal skim of what this is about.)

> It is useful on a system already protected by composefs, to distribute images as part of the operating system itself. The read-only store cannot be on overlay since composefs itself uses overlay, and that causes conflicts with whiteout files.

At a very high level, I understand how this works fine for running containers.

But what about access to the parent layers, e.g. due to layer deduplication? Would we be shipping OS images with a baked-in VFS store, where a 10-layer image contains 10 copies of the same file? (10 real copies? 10 hard-linked copies? 10 ref-linked copies?)

The alternative of not shipping the parent layers at all seems impractical to me, that would break quite a lot of assumptions. Or would we restrict this feature to squashed images where the difference doesn’t matter?

if errors.Is(err, syscall.EROFS) {
logrus.Debugf("Ignoring creation of lockfiles on read-only file systems %q, %v", gipath, err)
continue
for _, driver := range additionalGraphDrivers {
Collaborator:

This is trying a cartesian product of (store, driver) and all combinations must exist, otherwise this fails.

I can’t see how that makes sense even with one additional store, let alone multiple. Am I missing something?

I would have expected .AdditionalImageStores() to return a []struct InternalAdditionalStoreSpec { storePath String, driver stringOrMaybeInterface } or something like that. (Well, modulo the pre-existing thing that the division of responsibility for parsing/processing these options between store.go and the graph driver seems convoluted to me.)
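The suggested shape could be sketched roughly like this (hypothetical; `AdditionalStoreSpec`, `StorePath`, and `Driver` are illustrative names from the review comment, not actual c/storage API):

```go
package main

import (
	"fmt"
	"path"
)

// AdditionalStoreSpec pairs a store path with the graph driver that owns
// its layers, so callers never have to guess the (store, driver) mapping.
type AdditionalStoreSpec struct {
	StorePath string // root of the read-only additional image store
	Driver    string // e.g. "overlay" or "vfs"
}

// layersDir derives the per-driver layers directory ("vfs-layers",
// "overlay-layers", ...) for one store, avoiding the cartesian product
// of all stores with all drivers.
func layersDir(spec AdditionalStoreSpec) string {
	return path.Join(spec.StorePath, spec.Driver+"-layers")
}

func main() {
	specs := []AdditionalStoreSpec{
		{StorePath: "/usr/lib/containers/storage", Driver: "vfs"},
	}
	for _, s := range specs {
		fmt.Println(layersDir(s)) // /usr/lib/containers/storage/vfs-layers
	}
}
```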

@giuseppe (Member, author):

newImageStore automatically creates the directory. This is pre-existing behavior that I've kept in the current version.

I agree it is not ideal; I will add a patch to just ignore stores that do not exist instead of silently creating them.

Collaborator:

That means silent failures on incorrect paths in the config. (We do validate their existence.)

The previous behavior of automatically creating a directory seems undesirable to me; but if we find no matching driver, shouldn’t we fail outright instead of silently ignoring the store?

Also the logic in load() and getROLayerStoresLocked() should be consistent; that’s also easier if the graph driver detection is moved behind driver.AdditionalImageStores (I suppose into overlay’s parseOptions or even earlier).

for _, driver := range additionalGraphDrivers {
glpath := filepath.Join(store, driver+"-layers")

rls, err := newROLayerStore(rlpath, glpath, s.graphDriver)
Collaborator:

Shouldn’t this use the right graph driver for that store?

@giuseppe (Member, author):

The only case we enable now is overlay on top of vfs, and the overlay graph driver works well with layers stored as vfs. I'll add a comment; I'm not even sure how we would deal with multiple graph drivers otherwise.

Collaborator:

Right now, the overlay driver is, via AdditionalImageStores(), in charge; so I imagine it could

  • detect the driver / format of the additional store
  • fail on unexpected ones
  • return the driver ID so that this line can use the right one.

See elsewhere.

I suppose I can live with the added comment…

drivers/overlay/overlay.go (outdated; resolved)
}
}
for _, p := range allImageStores {
l := path.Join(p, vfs.Name, "dir", id)
@mtrmac (Collaborator) commented Apr 22, 2024:

Hard-coding VFS feels a bit surprising.

If the same layer exists in multiple image stores, this will probably not find them in the same order store.go will. Does that matter?

@giuseppe (Member, author):

It should not really matter, but I'll change the lookup code in overlay to behave the same way as in store.go.

@giuseppe (Member, author):
> But what about access to the parent layers, e.g. due to layer deduplication? Would we be shipping OS images with a baked-in VFS store, where a 10-layer image contains 10 copies of the same file? (10 real copies? 10 hard-linked copies? 10 ref-linked copies?)

With composefs it won't matter, because these files will be deduplicated anyway. So having multiple copies of the same file is fine: the underlying storage keeps only one copy of that file.

This feature is only useful on a composefs system, where vfs doesn't have any additional cost compared to overlay since the deduplication is performed in the layer below.

@alexlarsson (Contributor):

> But what about access to the parent layers, e.g. due to layer deduplication? Would we be shipping OS images with a baked-in VFS store, where a 10-layer image contains 10 copies of the same file? (10 real copies? 10 hard-linked copies? 10 ref-linked copies?)
>
> The alternative of not shipping the parent layers at all seems impractical to me, that would break quite a lot of assumptions. Or would we restrict this feature to squashed images where the difference doesn’t matter?

I think there is maybe something to this worry, but it's not quite what you seem to say here.

First, let's start with the goal of this: we want to enable a bootc image that contains other containers, such that when you boot the image as a full-blown OS it can run containers. In such a deployment, the outer image would use ostree with composefs, and the vfs storage directory would be part of this composefs image.

The composefs image will naturally and automatically de-duplicate any identical files. So, shipping all the layers as vfs stores will not be a problem in the final image.

That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?

@alexlarsson (Contributor):

Or would zstd-chunked help here?

@giuseppe (Member, author):

> That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?

The downside is that when you pull an image to vfs, it first needs to copy all the underlying layers and then apply the current one. zstd:chunked is not implemented for vfs (not sure it is worth the effort), but once you have the store and have added it to a composefs image, there are no additional costs of vfs (except the number of inodes).

@giuseppe force-pushed the overlay-on-vfs branch 2 times, most recently from c4217f4 to 30cf5cf on April 23, 2024 10:26
@rhatdan (Member) commented Apr 23, 2024:

But running this image as a container could be huge. As long as users of this feature understand the downsides, it should not be a problem.

Doing a bootc upgrade with one of these images would be able to take advantage of zstd:chunked though, correct?

@alexlarsson (Contributor):

> > That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?
>
> The downside is that when you pull an image to vfs, it first needs to copy all the underlying layers and then apply the current one. zstd:chunked is not implemented for vfs (not sure it is worth the effort), but once you have the store and have added it to a composefs image, there are no additional costs of vfs (except the number of inodes).

That is not exactly what I mean, though. Say you have a Dockerfile that creates a bootc container, and it does RUN podman pull --vfs --ais=/usr/lib/my-images some-image. Now, suppose some-image has several layers, and these are put into the container image. Wouldn't the contents of these layers be duplicated in the resulting bootc OCI image tarball, for example when you pull it to your bootc host?

@alexlarsson (Contributor):

> Doing a bootc upgrade with one of these images would be able to take advantage of zstd:chunked though, correct?

Technically it should be possible. For example, if a chunked image contains 10 copies of the same data in different files, then we should be able to only download it once. I don't know if the current implementation works this way though, as it is probably focusing on the slightly different case where a single file in the image is already locally available.

@giuseppe (Member, author):

> > That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?
> >
> > The downside is that when you pull an image to vfs, it first needs to copy all the underlying layers and then apply the current one. zstd:chunked is not implemented for vfs (not sure it is worth the effort), but once you have the store and have added it to a composefs image, there are no additional costs of vfs (except the number of inodes).
>
> That is not exactly what I mean, though. Say you have a Dockerfile that creates a bootc container, and it does RUN podman pull --vfs --ais=/usr/lib/my-images some-image. Now, suppose some-image has several layers, and these are put into the container image. Wouldn't the contents of these layers be duplicated in the resulting bootc OCI image tarball, for example when you pull it to your bootc host?

Yes, in this case all the layers will be stored exploded inside the container image itself.

@mtrmac (Collaborator) commented Apr 23, 2024:

> > That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?
>
> The downside is that when you pull an image to vfs, it first needs to copy all the underlying layers and then apply the current one.

I’m not quite clear about the overall picture, still: Under $assumptions, once layers 1…M exist in all stores backing a single c/storage.Store, a pull of a child image with layers 1…M…N should see that all the parent layers are already available locally, not pull them, and then tell the top-level graph driver to only create the new layers.

If the primary graph driver is overlay, that should only create overlay layers.

($assumptions are that either no layers are zstd:chunked, or that they are the same exact zstd:chunked TOC in the two images.)

@mtrmac (Collaborator) left a comment:

Looking a bit more — the diffs look plausible to me; I didn’t delve deep enough to understand the full impact (but at least looking at .get there seems to be more to be found).

@@ -138,7 +138,7 @@ type ProtoDriver interface {
 	Cleanup() error
 	// AdditionalImageStores returns additional image stores supported by the driver
 	// This API is experimental and can be changed without bumping the major version number.
-	AdditionalImageStores() []string
+	AdditionalImageStores() ([]string, []string)
Collaborator:

I know, I’m being annoying:

drivers.Driver is a publicly visible API, reachable by external callers via Store.GraphDriver().

OTOH, yes, neither Podman nor CRI-O call this function. Meh. I’ll defer to actual c/storage owners.

drivers/overlay/overlay.go (resolved)
drivers/overlay/overlay.go (resolved)
@@ -1175,19 +1192,27 @@ func (d *Driver) getLower(parent string) (string, error) {
}

func (d *Driver) dir(id string) string {
Collaborator:

From a quick skim it seems to me that various other callers of this also need updating:

  • e.g. getComposefsData, getAdditionalLayerPathByID would now read user-controlled contents of the VFS layer
  • calls to the effect of filepath.Join(d.dir(id), "diff") are also fairly frequent

There seem to be enough examples that I didn’t look into that exhaustively.


Worrying about every caller handling all the cases is becoming unwieldy … I’m starting to think there should be a layer abstraction, to handle the differences between native overlay / VFS / FUSE additionalLayerStore / composeFS.

if err := fileutils.Exists(dir); err != nil {
return "", err
}

if singleLayer {
Collaborator:

ID mapping just doesn’t happen?

drivers/overlay/overlay.go (outdated; resolved)
drivers/overlay/overlay.go (outdated; resolved)
drivers/overlay/overlay.go (outdated; resolved)
@@ -1586,6 +1624,15 @@ func (d *Driver) get(id string, disableShifting bool, options graphdriver.MountO
}
break
}
// now try in the vfs store
lower = path.Join(p, vfs.Name, "dir", l)
Collaborator:

Overall, I’d very much like the interdependencies between this and VFS to be transparent to IDEs (to make it clear what needs to be updated on format changes).

Maybe create drivers/internal/vfs.go, with func VFSLayersDir(vfsDriverHomeDir string) string and func VFSLayerDir(vfsDriverHomeDir, layerID string) string; and call one or the other in every single location, with only exactly one place hard-coding the "dir" string.
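The helpers described above could be sketched like this (a hypothetical drivers/internal/vfs.go; the "dir" layout matches what the diff hard-codes via path.Join(p, vfs.Name, "dir", l)):

```go
package main

import (
	"fmt"
	"path"
)

// VFSLayersDir returns the directory under a vfs driver home that holds
// all layer directories. This is the single place that knows about the
// "dir" naming convention used by the vfs driver.
func VFSLayersDir(vfsDriverHomeDir string) string {
	return path.Join(vfsDriverHomeDir, "dir")
}

// VFSLayerDir returns the on-disk directory of one vfs layer.
func VFSLayerDir(vfsDriverHomeDir, layerID string) string {
	return path.Join(VFSLayersDir(vfsDriverHomeDir), layerID)
}

func main() {
	home := "/usr/lib/containers/storage/vfs"
	fmt.Println(VFSLayerDir(home, "0a1b2c")) // /usr/lib/containers/storage/vfs/dir/0a1b2c
}
```

With only these two functions hard-coding "dir", a future change to the vfs layout would be visible to IDEs as a single definition plus its call sites.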

Extend the AdditionalImageStores functionality to return a list of
backends for each driver.  It is a preparation patch for a successive
commit, to allow overlay to use vfs as an additional image store.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe (Member, author):

I've addressed some of your comments, but I'm leaving this marked as draft as I want to do another pass.

@giuseppe giuseppe marked this pull request as ready for review April 30, 2024 07:31
@giuseppe giuseppe added the jira label May 2, 2024
@giuseppe (Member, author) commented May 8, 2024:

ping

@rhatdan (Member) commented May 13, 2024:

@mtrmac PTAL
