HOSTEDCP-1044: Add nodepools telemetry metrics for HyperShift #2265

Open: muraee wants to merge 1 commit into master

muraee commented Feb 16, 2024

  • hypershift:nodepools:size
  • hypershift:nodepools:available_replicas

requires: openshift/hypershift#3593

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 16, 2024
openshift-ci-robot (Contributor) commented Feb 16, 2024

@muraee: This pull request references HOSTEDCP-1044 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target only the "4.16.0" version, but multiple target versions were set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot (Contributor) commented Feb 16, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: muraee
Once this PR has been reviewed and has the lgtm label, please assign danielmellado for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2024
muraee (Author) commented Apr 30, 2024

/retest-required

muraee (Author) commented Apr 30, 2024

cc @simonpasquier

simonpasquier (Contributor) left a comment

From openshift/hypershift#3593 (comment)

changed to aggregate per HostedCluster, the maximum cardinality would be between 80-100

So at most a single management cluster running the hypershift operator could generate 2 x 100 = 200 series. It's still way above the "automatically approved" limit (https://rhobs-handbook.netlify.app/products/openshiftmonitoring/telemetry.md/#request-approval).

cc @jan--f @moadz

@@ -921,6 +921,16 @@ data:
# platform:hypershift_nodepools:max is the total number of nodepools managed by the hypershift operator by cluster platform
- '{__name__="platform:hypershift_nodepools:max"}'
#
# owners: (@openshift/team-hypershift-maintainers)
#
# cluster_name:hypershift_nodepools_size:sum is the total number of desired nodepool replicas managed by the hypershift operator per HostedCluster

Contributor:

IIUC HostedCluster = (cluster_name, exported_namespace) labels. Could these values contain identifying information?

muraee (Author):

I don't think so

simonpasquier (Contributor) commented

Another question: is there a need to correlate these metrics with the telemetry metrics emitted from the guest cluster?

muraee (Author) commented May 3, 2024

So at most a single management cluster running the hypershift operator could generate 2 x 100 = 200 series

per HostedCluster, multiple HostedClusters can be created in the same management cluster.

Another question: is there a need to correlate these metrics with the telemetry metrics emitted from the guest cluster?

No need. cc @zanetworker

simonpasquier (Contributor) commented

So at most a single management cluster running the hypershift operator could generate 2 x 100 = 200 series

per HostedCluster, multiple HostedClusters can be created in the same management cluster.

I don't get it. Say that a management cluster runs 100 HostedClusters, then the cardinality of cluster_name:hypershift_nodepools_size:sum will be 100 (and the same for cluster_name:hypershift_nodepools_available_replicas:sum, hence the 200 series). Do you confirm?
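The arithmetic in this comment can be sketched in a few lines of Python. This is only an illustration, not code from the PR: the function name `estimate_series` and the sample cluster names are hypothetical, and it assumes (as asked earlier in the review) that a HostedCluster is identified by a `(cluster_name, exported_namespace)` label pair, so each metric contributes one series per distinct pair.

```python
from typing import Iterable, Tuple

# The two metric names discussed in this thread.
METRICS = (
    "cluster_name:hypershift_nodepools_size:sum",
    "cluster_name:hypershift_nodepools_available_replicas:sum",
)


def estimate_series(hosted_clusters: Iterable[Tuple[str, str]]) -> int:
    """Estimate the series one management cluster would send.

    Assumption: each HostedCluster maps to one distinct
    (cluster_name, exported_namespace) pair, and every metric in
    METRICS produces exactly one series per distinct pair.
    """
    distinct_pairs = set(hosted_clusters)
    return len(METRICS) * len(distinct_pairs)


# Hypothetical management cluster running 100 HostedClusters.
clusters = [(f"hc-{i}", f"clusters-{i}") for i in range(100)]
print(estimate_series(clusters))  # 2 metrics x 100 HostedClusters = 200
```

Under these assumptions the count grows linearly with the number of HostedClusters on a management cluster, which is why the per-HostedCluster aggregation bounds the cardinality.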

simonpasquier (Contributor) commented

cc @zanetworker, see #2265 (comment) (the previous mention failed).

muraee (Author) commented May 3, 2024

hence the 200 series). Do you confirm?

@simonpasquier exactly right. Sorry, I was thinking of something else.

zanetworker commented

Another question: is there a need to correlate these metrics with the telemetry metrics emitted from the guest cluster?

No need @simonpasquier

- cluster_name:hypershift_nodepools_size:sum
- cluster_name:hypershift_nodepools_available_replicas:sum

openshift-ci bot (Contributor) commented May 7, 2024

@muraee: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/versions | 1446005 | link | false | /test versions |
| ci/prow/e2e-aws-ovn-single-node | 1446005 | link | false | /test e2e-aws-ovn-single-node |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

muraee (Author) commented May 8, 2024

/retest-required


5 participants