
WIP: Added support for performance profile status #4020

Open · wants to merge 2 commits into main

Conversation

rbaturov

This PR includes the hypershift-side implementation of this enhancement proposal:
openshift/enhancements#1619

Until now, there was no defined way to reconcile the PerformanceProfile
status conditions with the status of the different components created
from the PerformanceProfile (MachineConfig, KubeletConfig, Tuned), or to expose them.
To resolve this, a custom ConfigMap is added and used as a middleware object.
It holds the updated status populated by the performance profile controller
and is watched by the NodePool controller, which calculates an overview
of the reported status and reflects it under NodePool.status.conditions.

The resolution described above consists of the following changes:

  • Added a new condition "PerformanceProfileAppliedSuccessfully" to the NodePool conditions API.
  • Added a watch on the performance profile status ConfigMap object that triggers the reconcile loop.
  • Added the HandlePerformanceProfileStatus function, which calculates and sets PerformanceProfileAppliedSuccessfully based on the possible performance profile statuses (Available, Progressing, and Degraded); a sketch of this mapping follows the list.
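
As a rough illustration of the last item, the status-to-condition mapping might look like the following. This is a minimal sketch: the status payload shape, the ConfigMap data key, and the helper names are assumptions for illustration; only the condition name and the three phases come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// performanceProfileStatus is an assumed shape for the status payload that
// the performance profile controller writes into the middleware ConfigMap.
type performanceProfileStatus struct {
	Conditions []condition `json:"conditions"`
}

type condition struct {
	Type   string `json:"type"`   // "Available", "Progressing", or "Degraded"
	Status string `json:"status"` // "True" or "False"
	Reason string `json:"reason"`
}

// appliedSuccessfully collapses the three possible performance profile
// phases into a single value for the NodePool condition
// "PerformanceProfileAppliedSuccessfully". Degraded wins over Progressing,
// which wins over Available.
func appliedSuccessfully(s performanceProfileStatus) (status, reason string) {
	isTrue := func(t string) (bool, string) {
		for _, c := range s.Conditions {
			if c.Type == t && c.Status == "True" {
				return true, c.Reason
			}
		}
		return false, ""
	}
	if ok, r := isTrue("Degraded"); ok {
		return "False", r
	}
	if ok, _ := isTrue("Progressing"); ok {
		return "False", "Progressing"
	}
	if ok, _ := isTrue("Available"); ok {
		return "True", "Available"
	}
	return "False", "StatusUnknown"
}

func main() {
	// Example payload, as it might appear under an assumed ConfigMap key
	// such as data["status"].
	raw := `{"conditions":[{"type":"Available","status":"True","reason":"AsExpected"}]}`
	var s performanceProfileStatus
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		panic(err)
	}
	st, reason := appliedSuccessfully(s)
	fmt.Printf("PerformanceProfileAppliedSuccessfully=%s (reason: %s)\n", st, reason)
}
```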

Signed-off-by: Ronny Baturov <rbaturov@redhat.com>
@openshift-ci bot added the do-not-merge/work-in-progress and do-not-merge/needs-area labels May 12, 2024
@openshift-ci openshift-ci bot requested review from enxebre and sjenning May 12, 2024 06:37
@openshift-ci bot added the area/hypershift-operator label May 12, 2024
Contributor

openshift-ci bot commented May 12, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rbaturov
Once this PR has been reviewed and has the lgtm label, please assign bryan-cox for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented May 12, 2024

@rbaturov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/verify | 9bc1534 | link | true | /test verify |
| ci/prow/e2e-kubevirt-aws-ovn | 9bc1534 | link | true | /test e2e-kubevirt-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@stevekuznetsov
Contributor

Where does the node tuning operator usually put this information? Can the NodePool controller read it directly from there, instead of using a middleware object?

@rbaturov
Author

rbaturov commented May 15, 2024

> Where does the node tuning operator usually put this information? Can the NodePool controller read it directly from there, instead of using a middleware object?

In NTO on OCP, we have a PerformanceProfile object, and the performance profile controller is responsible for writing this information to the object's status.
However, on hypershift, PerformanceProfiles are applied as ConfigMaps, so adjustments to NTO are required.
Even with these adjustments, NTO cannot change the ConfigMap or the PerformanceProfile embedded in it, since both are controlled by the Hypershift operator, which would immediately override any changes.
The NodePool is the object that is supposed to report the status of the PerformanceProfile, since it represents the current configuration of the nodes.
Since NTO cannot modify the NodePool directly, a middleware object (a ConfigMap) is used: NTO writes the status to it, and the NodePool controller reads from it.
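
To make that data flow concrete, the watch side might map ConfigMap events back to the owning NodePool roughly as below. This is a minimal sketch using only the standard library; the `<nodepool>-perfprofile-status` naming convention and the stand-in reconcileRequest type are assumptions for illustration, not the PR's actual controller-runtime wiring.

```go
package main

import (
	"fmt"
	"strings"
)

// reconcileRequest is a stand-in for controller-runtime's reconcile.Request.
type reconcileRequest struct {
	Namespace string
	Name      string
}

// mapStatusConfigMapToNodePool decides which NodePool to enqueue when a
// performance profile status ConfigMap changes. The
// "<nodepool>-perfprofile-status" naming convention is an assumption
// made for this sketch.
func mapStatusConfigMapToNodePool(cmNamespace, cmName string) []reconcileRequest {
	const suffix = "-perfprofile-status"
	if !strings.HasSuffix(cmName, suffix) {
		return nil // not a status ConfigMap; nothing to reconcile
	}
	return []reconcileRequest{{
		Namespace: cmNamespace,
		Name:      strings.TrimSuffix(cmName, suffix),
	}}
}

func main() {
	fmt.Println(mapStatusConfigMapToNodePool("clusters", "pool-a-perfprofile-status"))
	// Output: [{clusters pool-a}]
}
```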

@stevekuznetsov
Contributor

Sorry if these are obvious questions, but I am new - using proxy objects to funnel state in a distributed system can have a lot of downsides w.r.t. consistency, especially around outages, as etcd has no transactions and no way to keep objects in sync.

> However, on hypershift, PerformanceProfiles are applied as ConfigMaps, so adjustments to NTO are required.

Are performanceprofiles cluster-scoped in OCP core? Have we done a spike to see if we could namespace them and re-use that object validation for HCP?

> Since NTO cannot modify the NodePool directly,

Why?

@rbaturov
Author

> Sorry if these are obvious questions, but I am new - using proxy objects to funnel state in a distributed system can have a lot of downsides w.r.t. consistency, especially around outages, as etcd has no transactions and no way to keep objects in sync.
>
>> However, on hypershift, PerformanceProfiles are applied as ConfigMaps, so adjustments to NTO are required.
>
> Are performanceprofiles cluster-scoped in OCP core? Have we done a spike to see if we could namespace them and re-use that object validation for HCP?

Yes, performance profiles are cluster-scoped.

>> Since NTO cannot modify the NodePool directly,
>
> Why?

It doesn't have the RBAC permissions for that.
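
For illustration only: the kind of RBAC rule NTO would need (and is not granted) in order to write NodePool status directly would look roughly like the sketch below. The API group, resource names, and verbs here are assumptions for the sketch, not something this PR adds.

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
)

func main() {
	// Hypothetical rule NTO would need to patch NodePool status directly;
	// since it is not granted, the middleware ConfigMap is used instead.
	rule := rbacv1.PolicyRule{
		APIGroups: []string{"hypershift.openshift.io"},
		Resources: []string{"nodepools", "nodepools/status"},
		Verbs:     []string{"get", "patch", "update"},
	}
	fmt.Printf("%+v\n", rule)
}
```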
