Change merge strategy to preserve Plugins when importing Configs #7347
Thanks for the detailed description and extra tests to validate behavior. Overall I understand what is being improved here and it seems valuable. I guess one (lazy) question: does this now mean that load order is important for imports which may all have defined settings for the same entry? I haven't ever looked closely at mergo or our specific use of it, but I guess the open question in my mind is whether we now have a potential ordering/"race", and if so, maybe more importantly, is that ordering reliable or random (if I run merge one time and file 3 wins over file 2, will that always be true, or will sometimes file 2 win over file 3 for the same setting)? Mostly I just want to understand so we know whether or not we need to define expected behavior. I think with our current less "intelligent" implementation this just wasn't an issue as a single file could only impact an entire plugin and there was no true merging of line-by-line settings within a tree branch.
Hi Phil, thanks for the feedback! You're right, it depends on the order in which the files are listed in the "imports" array. Luckily everything is all sorted and deterministic. The following in
Will have Globs can also be used in this imports array. Those are luckily expanded to files in lexicographical order, I confirmed it with a debugger and some extra tests. This function ( containerd/services/server/config/config.go Lines 260 to 267 in 4551278
Zooming in on One combined example to demo all the behavior:
Let's say there are a few files in
The code that is responsible for merging each config can be seen in the caller of containerd/services/server/config/config.go Line 215 in 4551278
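A minimal sketch of the ordering question discussed above, not containerd's actual code: it assumes each imported file is reduced to a flat map of settings and that the files matched by one glob are applied in lexicographical order of their names. The file names, keys, and the `mergeInOrder` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeInOrder applies each file's settings on top of the previous ones.
// Files are visited in sorted name order, so when several files set the
// same key, the lexicographically last file wins on every run.
func mergeInOrder(files map[string]map[string]string) map[string]string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	// Go map iteration order is random; sorting makes the result stable.
	sort.Strings(names)

	merged := map[string]string{}
	for _, name := range names {
		for key, value := range files[name] {
			merged[key] = value
		}
	}
	return merged
}

func main() {
	// Hypothetical drop-in files that all set the same key.
	files := map[string]map[string]string{
		"2.toml": {"bin_dir": "/from/2"},
		"3.toml": {"bin_dir": "/from/3", "config_path": "/certs"},
		"1.toml": {"bin_dir": "/from/1"},
	}
	merged := mergeInOrder(files)
	fmt.Println(merged["bin_dir"], merged["config_path"])
}
```

Under these assumptions file 3 always wins over file 2 for a shared setting, because the visiting order is fixed before any merging happens.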
@estesp Is there anything I can do to help get this PR merged? I think it's an important one generally for configuration, it's important to me at least!
This looks like a great change. We have very similar use cases, adding nvidia and registry drop-in files at runtime, and this would enable us to remove long-standing workarounds.
Added some unit tests that show why we might want to get rid of it. Signed-off-by: Ray Burgemeestre <rayb@nvidia.com>
Sorry if the history of this PR now looks a little messy/verbose, it's because in the meantime the code base has changed. I've left the minimally required fix needed at this point + unit tests. This makes at least the diff for this PR much simpler. I'll also include the output for the tests with and without the fix (just to show how the added tests would fail without the change), for convenience. Please let me know if I need to change something more!
@rayburgemeestre since cri-o already supports drop-in files for overriding the config, does following a similar mechanism here work too? See https://github.com/cri-o/cri-o/blob/e0e17ee187c9f52d870b80cee9116c4fd5ca279e/pkg/config/config.go#L699 I haven't dug too much into whether only leaves of the config tree are replaced this way, but I know that this is commonly used to configure the NVIDIA runtime through a drop-in file, for example.
@elezar You can specify an import path or glob for loading configuration files which can override the base config. This is just changing the logic to merge the plugin configs rather than overriding the entire config. We also support dynamic drop-in files for stuff like registry host configuration and cni. I think this change is fine to accept as is. There are probably some surprise cases which we should make sure we document. I also found that the equal check in arrays of structs didn't work as well as I would have hoped. We can probably add more tests around that. A separate but related change to this, we discussed changing
})
}

func TestMergingTwoPluginConfigsRecursively(t *testing.T) {
Is there a config with an array, possibly array of structs you can include? Those are the cases which hit the more complex logic today.
I understand what you mean, I will try to update this PR with that idea (hopefully!) tomorrow. :)
I've been a bit quiet past few days, but it's taking me a bit longer to figure some things out, I've been trying to test a lot of real configurations today, mostly from actual configuration files that we use, and I think I found a few edge cases that were broken after I rebased this PR last week. The unit tests succeed, but I believe they are not testing enough.
EDIT: I have updated this test now to use a new test helper function that should include an array of configs. I think it is now more readable also to see what is being tested (especially because of switching to string for expected)
…rged configs. Added extra tests, and test helpers to make the tests more readable. Signed-off-by: Ray Burgemeestre <rayb@nvidia.com>
After quite a bit of confusion because of switching back and forth between version 1.7.13 for work and 'master' for this PR. It turned out that Some changes I made are:
The original change from earlier commits is still there and needed in my opinion. The simplest failure is from the first test here: https://github.com/containerd/containerd/pull/7347/files#diff-9dcf619ce07838a6db58d9519023dc77b0e4356c8e3e56ddea8029e43b937d93R257-R276. The |
cmd/containerd/command/config.go
Outdated
// this command, generate the max configuration version
config.Version = srvconfig.CurrentConfigVersion

return toml.NewEncoder(os.Stdout).SetIndentTables(true).Encode(config)
// Now merge on top of this, the actual user config that is present on disk
if err = srvconfig.MergeConfig(config, userConfig); err != nil {
I'm trying to understand the role the Merge is taking here. The plugin Decode is important to run on the user/default configuration, as that will replace the interface{} initially decoded as map[string]interface{} with the struct provided by the plugin. I suspect when this Merge is called we may have the "to" case as the default plugin config struct and the "from" as the map[string]interface{} from dump (or nil from default).
Calling merge config after plugin decode seems like it may cause issues with the types not matching. The Decode takes the Do you have more details on what you were seeing that made it unreliable?
I see, I think I did overlook that something changed. When trying to 'demo' in a concise manner I see that the original code gives warnings that are no longer shown after my changes. I will dig deeper into it. But in the meantime I'll share what was my motivation for the change. --snip-- (cutting out part of my message, because I discovered I am not sure what I wrote is correct) EDIT: Sorry, I picked an example that customized
Ok, so a few hours later...
I finally understand what the original function is doing, also thanks to your clarification w/r/t Decode @dmcgowan 🙏. Now I realize my previous commit actually broke the correct behavior, sorry.
So what I learned is that Go has init() functions that normally load stuff before main(). So plugins that are loaded during execution of containerd config dump were not being loaded in my newly added unit test. So in my unit tests more values were actually accepted than would have been in practice. Even though I got the correct result in my unit test, it was actually not the correct desired behavior, because the marshalling back and forth wasn't happening. Anyway, that is now undone in my third/last commit.
I've updated my unit test, so now it actually does test similarly to the "config dump" command, and I've updated my inputs for the test to be correct. E.g., not using [plugins.'io.containerd.grpc.v1.cri'.cni] but [plugins.'io.containerd.grpc.v1.runtime'.cni] instead.
Hope the changes + tests make more sense now again :)
{Expected: true, Value: "/my-custom-certs.d-config-path"},
{Expected: true, Value: expected_runtimes},
}
testMergeConfig(t, []string{data1, data2, data3, data4, data5, data6, data7, data8, data9}, asserts)
I thought it was worth keeping this test because, as opposed to the unit tests in server/config/config_test.go, this one also tests the mapping back and forth (and removal of fields that are not mapping to the structs).
If you do not agree it's that useful, I can remove them. (If we want to remove, I can also undo the splitting of outputConfig into outputConfig & generateConfig in command/config.go)
… test with fixes. Signed-off-by: Ray Burgemeestre <rayb@nvidia.com>
Hi all, I thought long and hard about this, but I decided to propose a replacement PR. Some context has changed since I've opened this PR:
Finally, there are potentially too many additional unit tests; even though I find them useful, together they may make this PR too hard to review. Let me know if this makes sense. If so, I'll close this PR and kindly request a review on that one instead 👼 😁
Hi guys,
Hope you are all doing well. At work we ran into the following issue, which is also being discussed in #5837. This PR proposes a solution.
Example
Given the following config.toml for containerd:
With multiple files that match this import:
The current merge strategy for imports will overwrite the entire plugin io.containerd.grpc.v1.cri.
This means that depending on the filenames, containerd ends up with either just the cni config, or just the registry config. This was a conscious decision in the past. I hope however we can slightly modify this behavior in containerd, to merge the Plugin configuration instead? 😇
Current behavior
The two imports from the example effectively result in containerd having this configuration for the plugin:
Note the absence of the cni bin_dir configuration, that was overwritten.
New proposed behavior
The change in this PR will have the merge result in the following configuration instead:
There are also two extra unit tests that attempt to demonstrate the change.
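To make the difference between the two behaviors concrete, here is a minimal sketch that models plugin configuration as nested maps instead of toml.Tree. It is not the code from this PR: `replacePlugins`, `mergeTrees`, the two import helpers, and the path values are assumptions made up for illustration.

```go
package main

import "fmt"

const plugin = "io.containerd.grpc.v1.cri"

// replacePlugins models the current behavior: an imported file that
// mentions a plugin replaces that plugin's whole subtree.
func replacePlugins(dst, src map[string]any) {
	for name, tree := range src {
		dst[name] = tree
	}
}

// mergeTrees models the proposed behavior: tables present on both sides
// are merged recursively, and only leaf values from src overwrite dst.
// Note: tables that exist only in src are shared, not deep-copied.
func mergeTrees(dst, src map[string]any) {
	for key, srcVal := range src {
		srcTable, srcIsTable := srcVal.(map[string]any)
		dstTable, dstIsTable := dst[key].(map[string]any)
		if srcIsTable && dstIsTable {
			mergeTrees(dstTable, srcTable)
			continue
		}
		dst[key] = srcVal
	}
}

// cniImport and registryImport stand in for two imported files
// (the values are hypothetical).
func cniImport() map[string]any {
	return map[string]any{plugin: map[string]any{
		"cni": map[string]any{"bin_dir": "/opt/cni/bin"},
	}}
}

func registryImport() map[string]any {
	return map[string]any{plugin: map[string]any{
		"registry": map[string]any{"config_path": "/etc/containerd/certs.d"},
	}}
}

func main() {
	replaced := map[string]any{}
	replacePlugins(replaced, cniImport())
	replacePlugins(replaced, registryImport())
	fmt.Println("replace:", replaced[plugin]) // the cni table is gone

	merged := map[string]any{}
	mergeTrees(merged, cniImport())
	mergeTrees(merged, registryImport())
	fmt.Println("merge:  ", merged[plugin]) // cni and registry both present
}
```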
Why this change?
Whether the proposed behavior is better is probably subjective, but reasons why I am in favor of it ...
Let's say we have three configurations, cni.toml (for cni bin), registry.toml (for cert config path) and nvidia.toml (for changing the default runtime to nvidia).
Then it's nice to have separate smaller files, also arguably easier to get rid of certain configuration (get rid of nvidia being default runtime, rm -rf nvidia.toml). The problem now is that they all add something to the io.containerd.grpc.v1.cri plugin, and the last one wins.
If you have existing configuration files, it is in my opinion an extra burden for the user to first figure out which of the files already configures something for the same plugin before they can add their configuration without removing existing config.
In our case we try to automate this, so our code becomes more complex because of this as well 😛
How the old merge works
The following function seems to take care of the merging:
containerd/services/server/config/config.go
Line 291 in 4551278
The config.Config being merged has Plugins map[string]toml.Tree, which is the only field in the struct that includes the toml.Tree struct. The above-mentioned merge function will handle the actual merging of this struct here:
containerd/vendor/github.com/imdario/mergo/merge.go
Lines 90 to 96 in 4551278
This will already have the effect of overwriting the whole io.containerd.grpc.v1.cri plugin, which surprised me! Because that means that the following for loop is unnecessary (which is why I removed it in this PR):
containerd/services/server/config/config.go
Lines 296 to 299 in 4551278
How the new merge works
In my PR I used the Transformer feature from mergo (see https://github.com/imdario/mergo#transformers).
With that feature we can hook into toml.Tree's merge from src to dst with a callback function. Since it will overwrite the struct completely, the best idea I could come up with is to preserve the values that would have been discarded by copying them from destination to source. This is slightly confusing, but knowing that mergo.merge will overwrite destination with source later, the values need to be present in source 😃
How I tested this code change
In order to avoid cluttering this already long PR description, I will add a text document as an attachment that shows a "real" example, how I tested if this code is also working on a real machine, and not just in unit tests.
Please let me know if this approach makes sense, and if not, how to do it differently.
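For reference, the copy-from-destination-to-source idea described under "How the new merge works" can be sketched without mergo or toml.Tree. This is a simplified stand-in using plain nested maps, not the transformer from this PR; `preserve`, `mergePlugin`, and the keys and values in `main` are hypothetical.

```go
package main

import "fmt"

// preserve copies every entry that exists only in dst into src, recursing
// into tables present on both sides. Values already set in src are kept.
func preserve(dst, src map[string]any) {
	for key, dstVal := range dst {
		srcVal, inSrc := src[key]
		if !inSrc {
			src[key] = dstVal // would otherwise be lost in the overwrite
			continue
		}
		dstTable, dstIsTable := dstVal.(map[string]any)
		srcTable, srcIsTable := srcVal.(map[string]any)
		if dstIsTable && srcIsTable {
			preserve(dstTable, srcTable)
		}
	}
}

// mergePlugin mimics the two steps: the callback first enriches src, and
// then the destination is replaced by src as a whole.
func mergePlugin(dst, src map[string]any) map[string]any {
	preserve(dst, src)
	return src
}

func main() {
	// Assumed example data: dst is the config loaded so far, src the import.
	dst := map[string]any{
		"cni":     map[string]any{"bin_dir": "/opt/cni/bin"},
		"sandbox": "old",
	}
	src := map[string]any{
		"cni":     map[string]any{"conf_dir": "/etc/cni/net.d"},
		"sandbox": "new",
	}
	merged := mergePlugin(dst, src)
	fmt.Println(merged["sandbox"], merged["cni"])
}
```

The direction of the copy is the point of the sketch: because the overwrite keeps only what is in the source, anything worth keeping from the destination has to be moved into the source first, while values the import sets explicitly still win.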
It would make my life quite a bit easier if this would be the merge behavior for importing configs, but would be even better if others agree and see it as an improvement as well.
Thanks,
Ray
manual_testing.txt