MachineSet Inplace update does not work during machine deletion #10588

Open
davidvossel opened this issue May 10, 2024 · 2 comments · May be fixed by #10589


@davidvossel

What steps did you take and what happened?

Some values on a Machine spec affect how the machine is deleted. Once a machine is in the process of being deleted, these values can no longer be modified in place from the MachineDeployment or MachineSet. This can result in machines that cannot be torn down successfully.

Here's an example scenario.

  1. Create a MachineDeployment with the default nodeDrainTimeout of 0s (which blocks indefinitely if a machine's node can't be drained)
  2. Scale down the MachineDeployment in a way that causes a PDB to block a machine's node drain
  3. The Machine gets stuck in deletion forever
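
To make the 0s semantics in step 1 concrete, here is a minimal Go sketch. `drainTimeoutExceeded` and the two constants are hypothetical names made up for this illustration; this is not the Cluster API implementation, only a model of the behavior described above, where a timeout of 0 means the drain is never abandoned.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative values only; they are not taken from Cluster API.
const (
	fiveMinutes = 5 * time.Minute
	tenHours    = 10 * time.Hour
)

// drainTimeoutExceeded is a hypothetical helper, not the Cluster API function.
// It models the semantics described in the steps above: a nodeDrainTimeout of
// 0 means "wait for the drain indefinitely", so it can never be exceeded.
func drainTimeoutExceeded(nodeDrainTimeout, elapsed time.Duration) bool {
	if nodeDrainTimeout == 0 {
		return false // 0s: never give up on the drain
	}
	return elapsed >= nodeDrainTimeout
}

func main() {
	// A drain that a PDB has been blocking for ten hours.
	fmt.Println(drainTimeoutExceeded(0, tenHours))           // false
	fmt.Println(drainTimeoutExceeded(fiveMinutes, tenHours)) // true
}
```

With a timeout of 0 the blocked drain never times out, which is why the machine in step 3 stays in deletion until the timeout value that the Machine itself carries changes.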

The scenario above is expected and not a bug. However, if I add a 4th step to try to unblock the machine teardown by setting nodeDrainTimeout to something like 5m, the Machine stays wedged forever, even though nodeDrainTimeout is now an in-place update field (#8111).

This means that once a machine is being torn down, changes to any field that influences how the machine is torn down (like nodeDrainTimeout) are not applied in place.

What did you expect to happen?

Modifying nodeDrainTimeout on a MachineDeployment should update the MachineSet and all of its Machines in place. However, this in-place update doesn't occur while the Machines are being deleted.
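
The difference between the observed and the expected behavior can be sketched with a simplified model. Everything below is an assumption made for illustration: `Machine`, `MachineSet`, `syncInPlace`, and the `skipDeleting` switch are hypothetical stand-ins, not the Cluster API types or controller code, and the sketch does not claim to show where the real controller skips the update.

```go
package main

import (
	"fmt"
	"time"
)

// newTimeout is the illustrative value from the "4th step" (5m).
const newTimeout = 5 * time.Minute

// Machine and MachineSet are simplified stand-ins for illustration only;
// they are not the Cluster API types.
type Machine struct {
	Name             string
	Deleting         bool // stands in for a machine that is being deleted
	NodeDrainTimeout time.Duration
}

type MachineSet struct {
	NodeDrainTimeout time.Duration
	Machines         []*Machine
}

// syncInPlace copies the in-place mutable field from the MachineSet to its
// Machines. With skipDeleting=true it models the behavior reported here;
// with skipDeleting=false it models the expected behavior.
func syncInPlace(ms *MachineSet, skipDeleting bool) {
	for _, m := range ms.Machines {
		if skipDeleting && m.Deleting {
			continue // deleting machines never see the new value
		}
		m.NodeDrainTimeout = ms.NodeDrainTimeout
	}
}

// newExample builds a MachineSet with one healthy and one deleting machine,
// both starting at the default timeout of 0s.
func newExample() *MachineSet {
	return &MachineSet{
		Machines: []*Machine{
			{Name: "machine-a"},
			{Name: "machine-b", Deleting: true},
		},
	}
}

func main() {
	for _, skip := range []bool{true, false} {
		ms := newExample()
		ms.NodeDrainTimeout = newTimeout // set nodeDrainTimeout to 5m
		syncInPlace(ms, skip)
		for _, m := range ms.Machines {
			fmt.Printf("skipDeleting=%v %s deleting=%v nodeDrainTimeout=%s\n",
				skip, m.Name, m.Deleting, m.NodeDrainTimeout)
		}
	}
}
```

In the `skipDeleting=true` run, machine-b keeps its 0s timeout and would stay wedged; in the `skipDeleting=false` run it picks up 5m, which is the behavior expected above.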

Cluster API version

main branch

Kubernetes version

No response

Anything else you would like to add?

No response

Label(s) to be applied

/kind bug
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 10, 2024
@enxebre
Member

enxebre commented May 13, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 13, 2024
@fabriziopandini
Member

/priority important-longterm

@k8s-ci-robot k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label May 22, 2024