Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support in-place updates to Infrastructure Machines specs #10629

Open
muraee opened this issue May 15, 2024 · 9 comments
Open

Support in-place updates to Infrastructure Machines specs #10629

muraee opened this issue May 15, 2024 · 9 comments
Labels
area/machinedeployment Issues or PRs related to machinedeployments area/machineset Issues or PRs related to machinesets kind/feature Categorizes issue or PR as related to a new feature. kind/proposal Issues or PRs related to proposals. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@muraee
Copy link
Contributor

muraee commented May 15, 2024

What would you like to be added (User Story)?

As a user I would like that my changes to platform supported mutable fields in the InfrastructureMachineTemplate is propagated in-place without causing a rollout.

Detailed Description

Problem:

For example, the AWSMachine supports day-2 modification of .spec.additionalTags field, which is then reconciled and applied to existing machines (EC2 instances) in-place.

However, to update .spec.additionalTags when using a machineDeployment or machineSet with an AWSMachineTemplate as the InfrastructureRef, a new AWSMachineTemplate with the new tags has to be created which will trigger a complete rollout instead of updating the existing machines in-place.

Suggested solution:

MachineSet

#8060 added support for in-place propagating of labels and annotations to the infraMachines.
We could build on top of that work to also fetch the spec of the referenced template within syncMachines() func and propagate that to the existing infraMachines alongside annotations and labels. so that updating the referenced InfraMachineTemplate in-place would just work.

MachineDeployment

The current UX of triggering upgrades, relies on users changing the infraMachineTemplate reference name in the MachineDeployment.
We could preserve that UX for in-place upgrades as well, by changing the way the MachineDeployment controller determines when a rollout is required as the following:

  1. The user would set an annotation/label on the InfraMachineTemplate that contains the hash of its spec.
  2. the MachineDeployment controller definition of "semantic equality" should exclude the InfraMachineTemplate ref, and include the hash instead, i.e. a rollout is only triggered if the hash of the new InfraMachineTemplate is different than the current template hash.
  3. [Backward-compatibility] if the hash is not found, keep the current "semantic equality" behavior.

As the hash is created and set by the user, they could exclude any mutable field from the hash calculation, e.g.: two AWSMachineTemplate that only differs in .spec.additionalTags would have the same hash to not trigger a rollout, but instead propagate in-place

Anything else you would like to add?

No response

Label(s) to be applied

/kind feature
/area machineset
/area machinedeployment

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. area/machineset Issues or PRs related to machinesets area/machinedeployment Issues or PRs related to machinedeployments needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 15, 2024
@muraee
Copy link
Contributor Author

muraee commented May 15, 2024

cc @enxebre

@sbueringer
Copy link
Member

cc @g-gaston

@g-gaston
Copy link
Contributor

This could tangentially be related to the work being done by the in-place upgrades working group. Although I'm not sure if that's what we are looking for here. The in-place upgrades we have been working is for things that need to change in the actual infra/host. This seems to be more about metadata.

Without having thought much about it and not being an expert in the aws provider, could this be implemented in the AWSMachineTemplate controller? when changes are detected in this field, propagate to all child AwsMachines. Updating the template object should not trigger a MachineRollout from the MD controller.

@enxebre
Copy link
Member

enxebre commented May 20, 2024

Yes, I see this as orthogonal to inplace upgrade effort. I see this as an extension of the fields inplace propagation effort to include support for cloud tags.

Interesting thinking on the AWSMachineTemplate. I think I'd rather keep it "dumb" to honour current design and explore Mulham proposal to enable an extension enhancement linked above and let the higher level controllers md/ms orchestrate it. But that has also some caveats. Let's hear more thoughts :)

@fabriziopandini
Copy link
Member

fabriziopandini commented May 22, 2024

I'm personally -1 to the idea of asking the users to add an annotation with an hash defining semantic equality, it seems too complex (we are increasing operational and conceptual complexity for Cluster API’s users)

I'm also not sure this fits into the work we did for field propagation, because it was strictly scoped to Kubernetes objects metadata and a small set of fields that impacts only CAPI controllers behaviour.
Metadata applied to infrastructure components doesn't seems to fit into the category above (they are different metadata).
Also, metadata / AWS's additionalTags seems to be just an example, while the ask is about a wider set of "mutable fields" .

I think instead this request could fit well into the in-place upgrade work, because based on my understanding this work is meant to introduce an extension point that allows to determine what could be changed in place and what not, and this gives every infrastructure provider the freedom to choose what makes sense for them e.g.

  • Most infrastructure provider have a notion of metadata, and usually those metadata can be changed in place, but unfortunately the definition of metadata is specific for each provider.
  • Some infrastructure provider allows in-place mutations of things like disk size, memory etc.

Most important, this extension point also could gives the ability to take decisions when multiple changes are applied at the same time (what if I change tags, disk size and region in a single operation?)

@vincepri
Copy link
Member

/kind proposal
/priority important-soon
/triage accepted

@k8s-ci-robot k8s-ci-robot added kind/proposal Issues or PRs related to proposals. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 22, 2024
@alexander-demicev
Copy link
Contributor

We briefly discussed this topic at the in-place update feature group meeting yesterday. I believe that with some changes it might fit into the current design https://hackmd.io/Wv_u2xXJQsaj4wWFQ3PqCQ?view. I like the idea proposed by @fabriziopandini, we can have multiple in-place update extensions. When users select ExternalUpdate strategy it will be the responsibility of the in-place update extension to either upgrade K8S components on the hosts or update the infrastructure-related fields.
There are a couple of questions here we will need to answer for this approach, for example: in what order in-place update extensions will be executed? Is ordering needed there at all or should we just ensure that only extensions run at the same time?

@fabriziopandini fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label May 23, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 23, 2024
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@fabriziopandini
Copy link
Member

Let's add triage accepted only when the issue is actionable.
At the current stage we are still seeking for consensus about a way forward

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/machinedeployment Issues or PRs related to machinedeployments area/machineset Issues or PRs related to machinesets kind/feature Categorizes issue or PR as related to a new feature. kind/proposal Issues or PRs related to proposals. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Projects
None yet
Development

No branches or pull requests

8 participants