
WIP: Added support for performance profile status #4020

Open · wants to merge 2 commits into main

Conversation

rbaturov

This PR includes the hypershift-side implementation of this enhancement proposal:
openshift/enhancements#1619

Until now, there was no defined way to reconcile the PerformanceProfile
status conditions with the status of the different components created
from the PerformanceProfile (MachineConfig, KubeletConfig, Tuned), or to expose them.
To resolve this, a custom ConfigMap is added and used as a middleware object.
It holds the updated status populated by the performance profile controller
and is watched by the NodePool controller, which calculates an overview
of the reported status and reflects it under NodePool.status.conditions.

The resolution described above consists of the following changes:

  • Added a new condition "PerformanceProfileAppliedSuccessfully" to the NodePool conditions API.
  • Added a watch on the performance profile status ConfigMap object that triggers the reconcile loop.
  • Added the HandlePerformanceProfileStatus function, which calculates and sets PerformanceProfileAppliedSuccessfully based on the possible performance profile statuses (Available, Progressing, and Degraded); a sketch of this mapping follows the list.
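
As a rough illustration of the last item, the status-to-condition mapping might look like the following. This is a minimal sketch: the status payload shape, the ConfigMap data key, and the helper names are assumptions for illustration; only the condition name and the three phases come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// performanceProfileStatus is an assumed shape for the status payload that
// the performance profile controller writes into the middleware ConfigMap.
type performanceProfileStatus struct {
	Conditions []condition `json:"conditions"`
}

type condition struct {
	Type   string `json:"type"`   // "Available", "Progressing", or "Degraded"
	Status string `json:"status"` // "True" or "False"
	Reason string `json:"reason"`
}

// appliedSuccessfully collapses the three possible performance profile
// phases into a single value for the NodePool condition
// "PerformanceProfileAppliedSuccessfully". Degraded wins over Progressing,
// which wins over Available.
func appliedSuccessfully(s performanceProfileStatus) (status, reason string) {
	isTrue := func(t string) (bool, string) {
		for _, c := range s.Conditions {
			if c.Type == t && c.Status == "True" {
				return true, c.Reason
			}
		}
		return false, ""
	}
	if ok, r := isTrue("Degraded"); ok {
		return "False", r
	}
	if ok, _ := isTrue("Progressing"); ok {
		return "False", "Progressing"
	}
	if ok, _ := isTrue("Available"); ok {
		return "True", "Available"
	}
	return "False", "StatusUnknown"
}

func main() {
	// Example payload, as it might appear under an assumed ConfigMap key
	// such as data["status"].
	raw := `{"conditions":[{"type":"Available","status":"True","reason":"AsExpected"}]}`
	var s performanceProfileStatus
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		panic(err)
	}
	st, reason := appliedSuccessfully(s)
	fmt.Printf("PerformanceProfileAppliedSuccessfully=%s (reason: %s)\n", st, reason)
}
```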

Signed-off-by: Ronny Baturov <rbaturov@redhat.com>
@openshift-ci bot added the do-not-merge/work-in-progress and do-not-merge/needs-area labels May 12, 2024
@openshift-ci openshift-ci bot requested review from enxebre and sjenning May 12, 2024 06:37
@openshift-ci bot added the area/hypershift-operator label May 12, 2024
Contributor

openshift-ci bot commented May 12, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rbaturov
Once this PR has been reviewed and has the lgtm label, please assign bryan-cox for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented May 12, 2024

@rbaturov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/verify | 9bc1534 | link | true | /test verify |
| ci/prow/e2e-kubevirt-aws-ovn | 9bc1534 | link | true | /test e2e-kubevirt-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@stevekuznetsov
Contributor

Where does the node tuning operator usually put this information? Can the NodePool controller read it directly from there, instead of using a middleware object?

@rbaturov
Author

rbaturov commented May 15, 2024

> Where does the node tuning operator usually put this information? Can the NodePool controller read it directly from there, instead of using a middleware object?

In NTO on OCP, we have a PerformanceProfile object, and the performance profile controller is responsible for writing this information to the object's status.
However, on hypershift, PerformanceProfiles are applied as ConfigMaps, so adjustments to NTO are required.
Even with these adjustments, NTO cannot change the ConfigMap or the PerformanceProfile embedded in it, since both are controlled by the Hypershift operator, which would immediately override any changes.
The NodePool is the object that is supposed to report the status of the PerformanceProfile, since it represents the current configuration of the nodes.
Since NTO cannot modify the NodePool directly, a middleware object (a ConfigMap) is used: NTO writes the status to it, and the NodePool controller reads from it.
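
To make that data flow concrete, the watch side might map ConfigMap events back to the owning NodePool roughly as below. This is a minimal sketch using only the standard library; the `<nodepool>-perfprofile-status` naming convention and the stand-in reconcileRequest type are assumptions for illustration, not the PR's actual controller-runtime wiring.

```go
package main

import (
	"fmt"
	"strings"
)

// reconcileRequest is a stand-in for controller-runtime's reconcile.Request.
type reconcileRequest struct {
	Namespace string
	Name      string
}

// mapStatusConfigMapToNodePool decides which NodePool to enqueue when a
// performance profile status ConfigMap changes. The
// "<nodepool>-perfprofile-status" naming convention is an assumption
// made for this sketch.
func mapStatusConfigMapToNodePool(cmNamespace, cmName string) []reconcileRequest {
	const suffix = "-perfprofile-status"
	if !strings.HasSuffix(cmName, suffix) {
		return nil // not a status ConfigMap; nothing to reconcile
	}
	return []reconcileRequest{{
		Namespace: cmNamespace,
		Name:      strings.TrimSuffix(cmName, suffix),
	}}
}

func main() {
	fmt.Println(mapStatusConfigMapToNodePool("clusters", "pool-a-perfprofile-status"))
	// Output: [{clusters pool-a}]
}
```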

@stevekuznetsov
Contributor

Sorry if these are obvious questions, but I am new - using proxy objects to funnel state in a distributed system can have a lot of downsides w.r.t. consistency, especially around outages, as etcd has no transactions and no way to keep objects in sync.

> However, on hypershift, PerformanceProfiles are applied as ConfigMaps, so adjustments to NTO are required.

Are performanceprofiles cluster-scoped in OCP core? Have we done a spike to see if we could namespace them and re-use that object validation for HCP?

> Since NTO cannot modify the NodePool directly,

Why?

@rbaturov
Author

> Sorry if these are obvious questions, but I am new - using proxy objects to funnel state in a distributed system can have a lot of downsides w.r.t. consistency, especially around outages, as etcd has no transactions and no way to keep objects in sync.
>
>> However, on hypershift, PerformanceProfiles are applied as ConfigMaps, so adjustments to NTO are required.
>
> Are performanceprofiles cluster-scoped in OCP core? Have we done a spike to see if we could namespace them and re-use that object validation for HCP?

Yes, performance profiles are cluster-scoped.

>> Since NTO cannot modify the NodePool directly,
>
> Why?

It doesn't have the RBAC permissions for that.
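
For illustration only: the kind of RBAC rule NTO would need (and is not granted) in order to write NodePool status directly would look roughly like the sketch below. The API group, resource names, and verbs here are assumptions for the sketch, not something this PR adds.

```go
package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
)

func main() {
	// Hypothetical rule NTO would need to patch NodePool status directly;
	// since it is not granted, the middleware ConfigMap is used instead.
	rule := rbacv1.PolicyRule{
		APIGroups: []string{"hypershift.openshift.io"},
		Resources: []string{"nodepools", "nodepools/status"},
		Verbs:     []string{"get", "patch", "update"},
	}
	fmt.Printf("%+v\n", rule)
}
```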
