overlay: support AdditionalImagesStore on vfs #1899
base: main
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: giuseppe The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
drivers/overlay/overlay.go
Outdated
@@ -2283,7 +2283,7 @@ func (d *Driver) Changes(id string, idMappings *idtools.IDMappings, parent strin

 // AdditionalImageStores returns additional image stores supported by the driver
 func (d *Driver) AdditionalImageStores() ([]string, []string) {
-	return d.options.imageStores, []string{d.name}
+	return d.options.imageStores, []string{d.name, "vfs"}
Maybe this part should be in a separate commit
Force-pushed 1eb8ad8 to ad0dd59
Force-pushed ad0dd59 to 767d3e3
Force-pushed 767d3e3 to 16c8e71
(Just a very minimal skim of what this is about.)
It is useful on a system already protected by composefs, to distribute images as part of the operating system itself. The read-only store cannot be on overlay since composefs itself uses overlay, which causes conflicts with whiteout files.
At a very high level, I understand how this works fine for running containers.
But what about access to the parent layers, e.g. due to layer deduplication? Would we be shipping OS images with a baked-in VFS store, where a 10-layer image contains 10 copies of the same file? (10 real copies? 10 hard-linked copies? 10 ref-linked copies?)
The alternative of not shipping the parent layers at all seems impractical to me, that would break quite a lot of assumptions. Or would we restrict this feature to squashed images where the difference doesn’t matter?
if errors.Is(err, syscall.EROFS) {
	logrus.Debugf("Ignoring creation of lockfiles on read-only file systems %q, %v", gipath, err)
	continue
for _, driver := range additionalGraphDrivers {
This is trying a cartesian product of (store, driver) and all combinations must exist, otherwise this fails.
I can’t see how that makes sense even with one additional store, let alone multiple. Am I missing something?
I would have expected .AdditionalImageStores() to return a []struct InternalAdditionalStoreSpec { storePath string, driver stringOrMaybeInterface } or something like that. (Well, modulo the pre-existing thing that the division of responsibility for parsing/processing these options between store.go and the graph driver seems convoluted to me.)
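A minimal sketch of the suggested shape, with invented names (the PR does not define such a type): each additional store carries the driver it was created with, so callers never have to probe a cartesian product of (store, driver) combinations.

```go
package main

import "fmt"

// AdditionalStoreSpec is a hypothetical pairing of one additional image
// store with the graph driver backing it.
type AdditionalStoreSpec struct {
	StorePath string // e.g. "/usr/lib/containers/storage"
	Driver    string // e.g. "vfs"
}

// additionalImageStores stands in for the driver method under discussion;
// it returns exactly one spec per configured store, so every returned
// (store, driver) pair is known to exist.
func additionalImageStores() []AdditionalStoreSpec {
	return []AdditionalStoreSpec{
		{StorePath: "/usr/lib/containers/storage", Driver: "vfs"},
	}
}

func main() {
	for _, spec := range additionalImageStores() {
		fmt.Printf("%s (driver: %s)\n", spec.StorePath, spec.Driver)
	}
}
```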
newImageStore automatically creates the directory. This is pre-existing behavior that I've kept in the current version.
I agree it is not ideal; I will add a patch to just ignore stores that do not exist instead of silently creating them.
That means silent failures on incorrect paths in the config. (We do validate their existence.)
The previous behavior of automatically creating a directory seems undesirable to me; but if we find no matching driver, shouldn’t we fail outright instead of silently ignoring the store?
Also, the logic in load() and getROLayerStoresLocked() should be consistent; that's also easier if the graph driver detection is moved behind driver.AdditionalImageStores (I suppose into overlay's parseOptions, or even earlier).
for _, driver := range additionalGraphDrivers {
	glpath := filepath.Join(store, driver+"-layers")

rls, err := newROLayerStore(rlpath, glpath, s.graphDriver)
Shouldn’t this use the right graph driver for that store?
The only case we enable now is overlay on top of vfs, and the overlay graph driver works well with layers stored as vfs. I'll add a comment; I'm not even sure how we would deal with multiple graph drivers otherwise.
Right now, the overlay driver is, via AdditionalImageStores(), in charge; so I imagine it could
- detect the driver / format of the additional store
- fail on unexpected ones
- return the driver ID so that this line can use the right one.

See elsewhere.
I suppose I can live with the added comment…
drivers/overlay/overlay.go
Outdated
	}
}
for _, p := range allImageStores {
	l := path.Join(p, vfs.Name, "dir", id)
Hard-coding VFS feels a bit surprising.
If the same layer exists in multiple image stores, this will probably not find them in the same order store.go will. Does that matter?
It should not really matter, but I'll change the lookup code in overlay to behave in the same way as in store.go.
With composefs it won't matter, because these files will be deduplicated anyway. So having multiple copies of the same file is fine, as the underlying storage will keep only one copy of that file. This feature is only useful on a composefs system, where vfs doesn't have any additional cost compared to overlay, since the deduplication is performed in the layer below.
I think there is maybe something to this worry, but it's not quite what you seem to say here. First, let's start with the goal of this: we want to enable a bootc image that contains other containers, such that when you boot the container as a full-blown OS it can run containers. In such a deployment, the outer image would be using ostree with composefs, and the vfs storage directory would be part of this composefs image. The composefs image will naturally and automatically de-duplicate any identical files. So, shipping all the layers as vfs stores will not be a problem in the final image. That said, wouldn't we still be duplicating the files in the OCI layer tarballs, making e.g. downloads larger?
Or would zstd:chunked help here?
The downside is when you pull an image to vfs: it first needs to copy all the underlying layers, and then applies the current one. zstd:chunked is not implemented for vfs (not sure it is worth the effort), but once you have the store and you've added it to a composefs image, then there are no additional costs of vfs (except the number of inodes).
Force-pushed c4217f4 to 30cf5cf
But running this image as a container could be huge. As long as users of this feature understand the downsides, it should not be a problem. Doing a …
That is not exactly what I mean though. Say you have a Dockerfile that creates a bootc container, and it does …
Technically it should be possible. For example, if a chunked image contains 10 copies of the same data in different files, then we should be able to only download it once. I don't know if the current implementation works this way though, as it is probably focusing on the slightly different case where a single file in the image is already locally available.
Yes, in this case all the layers will be stored exploded inside the container image itself.
I'm not quite clear about the overall picture, still: under $assumptions, once layers 1…M exist in all stores backing a single c/storage.Store, a pull of a child image with layers 1…M…N should see that all the parent layers are already available locally, not pull them, and then tell the top-level graph driver to only create the new layers. If the primary graph driver is overlay, that should only create overlay layers. ($assumptions are that either no layers are zstd:chunked, or that they are the same exact zstd:chunked TOC in the two images.)
Looking a bit more — the diffs look plausible to me; I didn't delve deep enough to understand the full impact (but at least looking at .get there seems to be more to be found).
@@ -138,7 +138,7 @@ type ProtoDriver interface {
 	Cleanup() error
 	// AdditionalImageStores returns additional image stores supported by the driver
 	// This API is experimental and can be changed without bumping the major version number.
-	AdditionalImageStores() []string
+	AdditionalImageStores() ([]string, []string)
I know, I'm being annoying: drivers.Driver is a publicly visible API, reachable by external callers via Store.GraphDriver().
OTOH, yes, neither Podman nor CRI-O call this function. Meh. I'll defer to actual c/storage owners.
@@ -1175,19 +1192,27 @@ func (d *Driver) getLower(parent string) (string, error) {
 }

 func (d *Driver) dir(id string) string {
From a quick skim it seems to me that various other callers of this also need updating:
- e.g. getComposefsData, getAdditionalLayerPathByID would now read user-controlled contents of the VFS layer
- calls to the effect of filepath.Join(d.dir(id), "diff") are also fairly frequent

There seem to be enough examples that I didn't look into that exhaustively.
Worrying about every caller handling all the cases is becoming unwieldy… I'm starting to think there should be a layer abstraction, to handle the differences between native overlay / VFS / FUSE additionalLayerStore / composeFS.
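One way to read the layer-abstraction suggestion is an interface that hides where a layer's files live, so callers stop hand-building paths like filepath.Join(d.dir(id), "diff") per backend. The sketch below is purely illustrative; the interface and type names are invented, and the path layouts (overlay's <home>/<id>/diff, vfs's <home>/dir/<id>) are assumptions based on the paths visible in this diff.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// layerBackend is a hypothetical abstraction over layer storage formats,
// so that callers never hard-code per-driver directory layouts.
type layerBackend interface {
	// DiffDir returns the directory holding the layer's files.
	DiffDir(id string) string
	// Kind names the backing format, e.g. "overlay" or "vfs".
	Kind() string
}

// overlayLayer assumes overlay's <home>/<id>/diff layout.
type overlayLayer struct{ home string }

func (o overlayLayer) DiffDir(id string) string { return filepath.Join(o.home, id, "diff") }
func (o overlayLayer) Kind() string             { return "overlay" }

// vfsLayer assumes vfs's <home>/dir/<id> layout.
type vfsLayer struct{ home string }

func (v vfsLayer) DiffDir(id string) string { return filepath.Join(v.home, "dir", id) }
func (v vfsLayer) Kind() string             { return "vfs" }

func main() {
	layers := []layerBackend{
		overlayLayer{home: "/var/lib/containers/storage/overlay"},
		vfsLayer{home: "/usr/lib/containers/storage/vfs"},
	}
	// Callers only ever go through the interface; the format-specific
	// path construction lives in exactly one place per backend.
	for _, l := range layers {
		fmt.Printf("%s layer dir: %s\n", l.Kind(), l.DiffDir("abc123"))
	}
}
```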
drivers/overlay/overlay.go
Outdated
if err := fileutils.Exists(dir); err != nil {
	return "", err
}

if singleLayer {
ID mapping just doesn’t happen?
@@ -1586,6 +1624,15 @@ func (d *Driver) get(id string, disableShifting bool, options graphdriver.MountO
 	}
 	break
 }
 // now try in the vfs store
 lower = path.Join(p, vfs.Name, "dir", l)
Overall, I’d very much like the interdependencies between this and VFS to be transparent to IDEs (to make it clear what needs to be updated on format changes).
Maybe create drivers/internal/vfs.go, with func VFSLayersDir(vfsDriverHomeDir string) string and func VFSLayerDir(vfsDriverHomeDir, layerID string) string; and call one or the other in every single location, with only exactly one place hard-coding the "dir" string.
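The two proposed helpers are small; a sketch, assuming the vfs driver keeps each layer under <home>/dir/<layerID> (as the path.Join(p, vfs.Name, "dir", l) call in this diff suggests):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// VFSLayersDir returns the directory under which the vfs driver stores its
// layers. This is the single place that hard-codes the "dir" component.
func VFSLayersDir(vfsDriverHomeDir string) string {
	return filepath.Join(vfsDriverHomeDir, "dir")
}

// VFSLayerDir returns the directory of one vfs layer.
func VFSLayerDir(vfsDriverHomeDir, layerID string) string {
	return filepath.Join(VFSLayersDir(vfsDriverHomeDir), layerID)
}

func main() {
	home := "/usr/lib/containers/storage/vfs"
	fmt.Println(VFSLayersDir(home))          // /usr/lib/containers/storage/vfs/dir
	fmt.Println(VFSLayerDir(home, "abc123")) // /usr/lib/containers/storage/vfs/dir/abc123
}
```

With every caller routed through these two functions, a future change to the vfs on-disk layout is a one-file edit that the compiler and IDE cross-references can track.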
extend the AdditionalImageStores functionality to return a list of backends for each driver. It is a preparation patch for a successive commit, to allow overlay to use vfs as an additional image store. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Force-pushed 30cf5cf to 4791448
I've addressed some of your comments, but I leave this still marked as Draft as I want to do another pass.
ping
@mtrmac PTAL
extend the overlay driver to use an additional image store on a vfs backend.
It works as overlay can simply use the uppermost vfs layer as the only lower layer for the container.
It is useful on a system already protected by composefs, to distribute images as part of the operating system itself. The read-only store cannot be on overlay since composefs itself uses overlay, which causes conflicts with whiteout files.