Ctrl, VM: Require restart when socket count is reduced #11883

Merged
merged 1 commit into from
May 27, 2024

Conversation

orelmisan
Member

@orelmisan orelmisan commented May 9, 2024

What this PR does

Before this PR:
Currently, it is possible to reduce the socket count of a VirtualMachine object.
Doing so triggers the following steps:

  1. The VirtualMachineInstance is updated.
  2. A migration is triggered.
  3. The target virt-launcher pod is created according to the updated VMI.
  4. The domain is migrated as-is.
  5. On the target virt-launcher pod - the domain is updated to fit.

This works well for CPU hotplug, but is problematic for CPU hotunplug in several scenarios, for example:

  1. When using dedicated CPUs
  2. When using networkInterfaceMultiqueue: true

After this PR:
A RestartRequired condition is added to the VM when the socket count is reduced.
This ensures that the virt-launcher pod and the domain are aligned.
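
The rule described above can be illustrated with a minimal sketch. It is not the code from this PR (KubeVirt is written in Go); `VirtualMachine`, `Condition`, `sync_cpu_sockets`, and the condition message are hypothetical, simplified stand-ins, and only the `RestartRequired` condition name comes from the description. The sketch assumes the controller can compare the desired socket count against the count the running VMI was started with.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    # Simplified stand-in for a status condition (hypothetical shape).
    type: str
    status: str
    message: str = ""


@dataclass
class VirtualMachine:
    # Hypothetical, simplified VM: only the desired socket count and conditions.
    sockets: int
    conditions: list = field(default_factory=list)


def sync_cpu_sockets(vm: VirtualMachine, running_sockets: int) -> str:
    """Decide how to handle a socket count change on a running VM.

    Returns "none", "hotplug", or "restart-required".
    """
    if vm.sockets == running_sockets:
        return "none"

    if vm.sockets < running_sockets:
        # Socket count reduced: do not live-update; flag the VM instead.
        already_flagged = any(c.type == "RestartRequired" for c in vm.conditions)
        if not already_flagged:
            vm.conditions.append(
                Condition(
                    type="RestartRequired",
                    status="True",
                    message="CPU socket count was reduced",  # illustrative text
                )
            )
        return "restart-required"

    # Socket count increased: the existing hotplug path (migration) applies.
    return "hotplug"
```

In this sketch, repeated reconciliations of the same reduced spec leave a single condition on the VM, and an increase still goes through the hotplug path.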

Fixes #

Why we need it and why it was done in this way

The following tradeoffs were made:

The following alternatives were considered:

Links to places where the discussion took place:

Special notes for your reviewer

Depends on #11836, please only review the last commit.

Checklist

This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.

Release note

Restart of a VM is required when the CPU socket count is reduced

@kubevirt-bot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/M labels May 9, 2024
@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels May 9, 2024
@lyarwood
Member

/cc

@xpivarc
Member

xpivarc commented May 10, 2024

Please create a bug issue as well so we can keep track of it.

@orelmisan
Member Author

Change: Rebase on top of #11836

@orelmisan orelmisan marked this pull request as ready for review May 13, 2024 06:46
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2024
@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 24, 2024
Currently, it is possible to reduce the sockets count
of a VirtualMachine object.
Doing so causes the following steps to happen:

1. The VirtualMachineInstance is updated.
2. A migration is triggered.
3. The target virt-launcher pod is created according to the updated VMI.
4. The domain is migrated as-is.
5. On the target virt-launcher pod - the domain is updated to fit.

This works well for CPU hotplug, but is problematic for CPU hotunplug in
several scenarios, for example:
1. When using dedicated CPUs
2. When using `networkInterfaceMultiqueue: true`

Add the RestartRequired condition on the VM in case the CPU socket
count is reduced.
This will make sure the virt-launcher pod and the domain are aligned.

Signed-off-by: Orel Misan <omisan@redhat.com>
@orelmisan orelmisan force-pushed the cpu-unplug-restart-required branch from 977b1fa to 000c9c3 Compare May 27, 2024 05:54
@kubevirt-bot kubevirt-bot added size/M and removed size/L needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 27, 2024
@orelmisan
Member Author

Rebase

Contributor

@fossedihelm fossedihelm left a comment


Thank you

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label May 27, 2024
@jean-edouard
Contributor

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jean-edouard

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2024
@kubevirt-commenter-bot

Required labels detected, running phase 2 presubmits:
/test pull-kubevirt-e2e-windows2016
/test pull-kubevirt-e2e-kind-sriov
/test pull-kubevirt-e2e-k8s-1.30-ipv6-sig-network
/test pull-kubevirt-e2e-k8s-1.28-sig-network
/test pull-kubevirt-e2e-k8s-1.28-sig-storage
/test pull-kubevirt-e2e-k8s-1.28-sig-compute
/test pull-kubevirt-e2e-k8s-1.28-sig-operator
/test pull-kubevirt-e2e-k8s-1.29-sig-network
/test pull-kubevirt-e2e-k8s-1.29-sig-storage
/test pull-kubevirt-e2e-k8s-1.29-sig-compute
/test pull-kubevirt-e2e-k8s-1.29-sig-operator

@acardace
Member

/cherrypick release-1.2

@kubevirt-bot
Contributor

@acardace: once the present PR merges, I will cherry-pick it on top of release-1.2 in a new PR and assign it to you.

In response to this:

/cherrypick release-1.2

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@kubevirt-bot
Contributor

@orelmisan: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubevirt-e2e-k8s-1.30-sig-network | 000c9c3 | link | unknown | `/test pull-kubevirt-e2e-k8s-1.30-sig-network` |


@fossedihelm
Contributor

/retest-required

@kubevirt-bot kubevirt-bot merged commit 1ec375c into kubevirt:main May 27, 2024
36 of 38 checks passed
@kubevirt-bot
Contributor

@acardace: new pull request created: #11985

In response to this:

/cherrypick release-1.2


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M

8 participants