KEP-4622: New TopologyManager Policy: max-allowable-numa-nodes #4624
cyclinder
commented
May 8, 2024
- One-line PR description: New TopologyManager Policy: max-allowable-numa-nodes
- Issue link: KEP-4622: Add a TopologyManager policy option for MaxAllowableNUMANodes #4622
- Other comments: kubelet: Add a flag "MaxAllowableNUMANodes" kubernetes#124148
We expect no non-infra related flakes in the last month as a GA graduation criteria.
-->

TBD
Do we also need an e2e test here?
For beta level it is strongly encouraged, if not actually required; I need to double-check.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: cyclinder The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@@ -45,7 +45,7 @@ cd "${ROOT}"
 RES=0
 echo "Checking spelling..."
 ERROR_LOG="${TMP_DIR}/errors.log"
-git ls-files | grep -v vendor | xargs misspell > "${ERROR_LOG}"
+git ls-files | grep -v vendor | xargs misspell -i $(grep -v '#' hack/.spelling_ignorewords) > "${ERROR_LOG}"
I don't think this change belongs there
Do we need to open another PR for it?
Possibly, but I don't get why we need this change in this PR.
This is to fix a CI failure: my name (cyclinder) did not pass the misspell check, so I made these changes. I am not sure whether I need to open a new PR for it, so I put it in this PR for now. I can open another PR if that is confirmed.
The thing is, usernames should not be spell-checked in the first place :\ I'll have a look.
OK, that's the failure: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/enhancements/4624/pull-enhancements-verify/1788125378728955904 - I strongly believe it should be a separate PR. Pending conversation, I think `kep.yaml` should not be spell-checked in the first place.
I think this expands the scope a bit. Checking `kep.yaml` is OK, but the check should not include the author name.
It's also fair to exclude usernames from spell-checking `kep.yaml`. Even in this case it should be a separate PR though.
Okay, maybe it's hard to exclude usernames from spell-checking `kep.yaml` for misspell :(
@@ -0,0 +1,2 @@
+# misspell ignore the following corrections, comma separated: fooa,boob
+cyclinder
bad rebase?
Sorry, I don't understand what you mean. Could you explain more about this?
I'm wondering why we need this change at all for this KEP.
My name (cyclinder) did not pass the misspell check, so I made these changes. I am not sure whether I need to open a new PR for it, so I put it in this PR for now. I can open another PR if that is confirmed.
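For readers following the two threads above: the ignore file's own header says misspell's `-i` option takes a comma-separated list of corrections to ignore, while the file in this PR lists one word per line. With a single entry that makes no difference, but a second entry would need joining. A minimal sketch of one way to do that, assuming a hypothetical helper named `ignore_words_csv` (not part of the PR) and the file path from the diff:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): print the non-comment,
# non-empty lines of an ignore file as one comma-separated list.
ignore_words_csv() {
  # grep -v drops lines that are blank or start with '#';
  # paste -s joins the remaining lines, using ',' as the delimiter.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | paste -sd, -
}

# Usage sketch, mirroring the command in the diff (requires misspell on PATH):
# git ls-files | grep -v vendor \
#   | xargs misspell -i "$(ignore_words_csv hack/.spelling_ignorewords)" > "${ERROR_LOG}"
```

Quoting the command substitution keeps the whole list as a single argument to `-i`. This does not address the reviewers' main point, which is that the change belongs in a separate PR.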
@@ -0,0 +1,39 @@
+title: New TopologyManager Policy which configure the value of maxAllowableNUMANodes
This whole file looks fine to me.
thanks for the review!
Hi @klueska, do you have any comments on these files?
<!--
**Note:** When your KEP is complete, all of these comment blocks should be removed.

To get started with this template:

- [ ] **Pick a hosting SIG.**
  Make sure that the problem space is something the SIG is interested in taking
  up. KEPs should not be checked in without a sponsoring SIG.
- [ ] **Create an issue in kubernetes/enhancements**
  When filing an enhancement tracking issue, please make sure to complete all
  fields in that template. One of the fields asks for a link to the KEP. You
  can leave that blank until this KEP is filed, and then go back to the
  enhancement and add the link.
- [ ] **Make a copy of this template directory.**
  Copy this template into the owning SIG's directory and name it
  `NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
  leading-zero padding) assigned to your enhancement above.
- [ ] **Fill out as much of the kep.yaml file as you can.**
  At minimum, you should fill in the "Title", "Authors", "Owning-sig",
  "Status", and date-related fields.
- [ ] **Fill out this file as best you can.**
  At minimum, you should fill in the "Summary" and "Motivation" sections.
  These should be easy if you've preflighted the idea of the KEP with the
Feel free to remove tutorial comments like the one I'm partially quoting once you have filled in the relevant section.
Thanks, updating it now.
know that this has succeeded?
-->
- Introduce a new TopologyManager Policy Option called `max-allowable-numa-nodes`.
- Improve the topology manager to remove the state explosion.
Is it a goal of this KEP? I think this should be a non-goal, because we want to enable users to configure this limit, but we do not aim to change the topology manager's internal logic to do computations more efficiently. Or do we? That would be a very significant scope increase.
I think you are right, this should be a non-goal here. Updating it now.
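As background for the "state explosion" wording in this thread, here is a rough sketch of why the number of NUMA nodes matters. It assumes, as an illustration rather than anything stated in the thread, that every non-empty subset of NUMA nodes is a candidate affinity mask, which gives 2^n - 1 masks for n nodes. The function name `numa_mask_count` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of non-empty subsets of n NUMA nodes,
# i.e. 2^n - 1, computed with a bit shift.
numa_mask_count() {
  local n="$1"
  echo $(( (1 << n) - 1 ))
}

# Show how quickly the count grows as the node count doubles.
for n in 2 4 8 16; do
  printf '%2d NUMA nodes -> %d candidate masks\n' "$n" "$(numa_mask_count "$n")"
done
```

Under that assumption, 8 nodes give 255 masks and 16 nodes give 65535, which is why raising the limit is a configuration trade-off and does not by itself make the computation cheaper.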
and make progress.
-->

- This proposal does not aim to modify the existing TopologyManager Policies. It focuses solely on introducing a new policy for spreading the max allowable numa nodes.
suggested change:
"a new policy for spreading the max allowable numa nodes" -> "a new policy option to let users configure the maximum supported number of NUMA nodes"
thanks!
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary
in general in this document: NUMA is an acronym and so should be spelled in uppercase (e.g. not "numa" nor "Numa")
bogged down.
-->

#### Story 1
We can add the story from the issue. I think I can probably find another user story; let me get back to you on this.
thanks! It's a little hard for me to find a user story.
implementation difficulties, etc.).
-->

N/A
perhaps we can abuse metrics to report the configured value? let's hear from other reviewers
/cc @klueska

[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
-->
No
Uhm, right, I think we don't have SLIs/SLOs about pod admission time in the kubelet. Or do we?
As far as I know, we don't have this, but it would be better to have other reviewers confirm.
keps/sig-node/4622-topologymanager-max-allowable-numa-nodes/README.md
Are there any tests that were run/should be run to understand performance characteristics better
and validate the declared limits?
-->
No
Not sure how "state explosion" will lead to more memory being used by the kubelet, if at all. However, this should not cause "resource exhaustion", and we can defer this to GA graduation.
…AllowableNUMANodes Signed-off-by: cyclinder <kuocyclinder@gmail.com>
I've added this to the tracking sheet for 1.31. @ffromani, please let me know when this is ready for me to review.