Handle the case where container resource and request set to minimum values #120791
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Karthik-K-N

The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/cc @Jeffwan
/cc @vinaykul
/triage accepted @Karthik-K-N Thank you for the PR! Please provide a release note and cover this use case with unit or e2e test cases, thanks.
A quick question on the current status: if the current status falls back to inconsistent values, does it introduce the endless reconcile issue commented on here: #102884 (comment)?
May I ask what you mean by "fall back to inconsistent values"?
I believe it's the user setting a 2m limit vs. the runtime setting 10m as the value.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
Does Windows have the same minimum behavior here?
We can't break existing pods by tightening validation, but since resource updates aren't currently allowed outside of this alpha feature, we can apply stricter minimum values on update (only if the value changed). I think that would be worth doing, aside from the above Windows question.
I'm also not sure about the desired end state. I feel like it would be better to have the Kubelet interpret all values below the minimum as the minimum, and report the actual used values in the pod status. So from your example, the outcome would be:
spec:
containers:
- command:
- tail
- -f
- /dev/null
image: ppc64le/ubuntu:latest
imagePullPolicy: Always
name: ubunutu
resizePolicy:
- resourceName: memory
restartPolicy: RestartContainer
- resourceName: cpu
restartPolicy: NotRequired
resources:
limits:
cpu: 1m
requests:
cpu: 1m
status:
.
.
.
containerStatuses:
- allocatedResources:
cpu: 2m
containerID: containerd://43baca32c3f20362057917d8ae22d8e6f820ae5320849a08304ec5891ef9215a
image: docker.io/ppc64le/ubuntu:latest
imageID: docker.io/ppc64le/ubuntu@sha256:de1fcbe2d7e928d0be9476e3079df8e69072dfb140be94f5d34091f09f6917dd
lastState: {}
name: ubunutu
ready: true
resources:
limits:
cpu: 10m
requests:
cpu: 2m
restartCount: 0
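A minimal sketch of that proposal, with illustrative names (none of this is code from the PR): the Kubelet would clamp sub-minimum CPU values up to what the Linux kernel can actually enforce before recording the allocation, so the status reflects the real state.

```go
// Illustrative minimums, assumed from the discussion above: the Linux CFS
// quota mechanism cannot enforce a CPU limit below 10m (1ms of quota per
// 100ms period), and the smallest cgroup cpu.shares value (2) maps to 2m.
const (
	minEnforceableCPULimitMilli   int64 = 10
	minEnforceableCPURequestMilli int64 = 2
)

// clampToEnforceableCPU is a hypothetical helper: it returns the CPU value
// (in millicores) that the runtime will actually apply for a given desired
// value, which the Kubelet could then record in the container status.
func clampToEnforceableCPU(desiredMilli, minMilli int64) int64 {
	if desiredMilli < minMilli {
		return minMilli
	}
	return desiredMilli
}
```

For the 1m request and limit in the example above, clampToEnforceableCPU(1, minEnforceableCPURequestMilli) yields 2m and clampToEnforceableCPU(1, minEnforceableCPULimitMilli) yields 10m, matching the values the runtime reports in the status.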
```diff
@@ -580,10 +580,20 @@ func (m *kubeGenericRuntimeManager) computePodResizeAction(pod *v1.Pod, containe
 		currentMemoryLimit = kubeContainerStatus.Resources.MemoryLimit.Value()
 	}
 	if kubeContainerStatus.Resources.CPULimit != nil {
-		currentCPULimit = kubeContainerStatus.Resources.CPULimit.MilliValue()
+		kubeCurrentCPULimit := kubeContainerStatus.Resources.CPULimit.MilliValue()
```
this code is a bit confusing, and could benefit from a comment & better variable names.
Suggested change:

```diff
-	kubeCurrentCPULimit := kubeContainerStatus.Resources.CPULimit.MilliValue()
+	runtimeReportedCPULimit := kubeContainerStatus.Resources.CPULimit.MilliValue()
```
Updated as suggested.
```diff
 	}
 	if kubeContainerStatus.Resources.CPURequest != nil {
-		currentCPURequest = kubeContainerStatus.Resources.CPURequest.MilliValue()
+		kubeCurrentRequest := kubeContainerStatus.Resources.CPURequest.MilliValue()
```
Suggested change:

```diff
-	kubeCurrentRequest := kubeContainerStatus.Resources.CPURequest.MilliValue()
+	runtimeRecordedCPURequest := kubeContainerStatus.Resources.CPURequest.MilliValue()
```
Updated as suggested.
```diff
@@ -580,10 +580,20 @@ func (m *kubeGenericRuntimeManager) computePodResizeAction(pod *v1.Pod, containe
 		currentMemoryLimit = kubeContainerStatus.Resources.MemoryLimit.Value()
 	}
 	if kubeContainerStatus.Resources.CPULimit != nil {
-		currentCPULimit = kubeContainerStatus.Resources.CPULimit.MilliValue()
+		kubeCurrentCPULimit := kubeContainerStatus.Resources.CPULimit.MilliValue()
+		if desiredCPULimit <= 10 && kubeCurrentCPULimit <= 10 {
```
Declare constants for this value. Same for the minimum request below.
Updated as suggested.
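A hedged sketch of what such constants might look like (the names are illustrative, not the PR's actual identifiers):

```go
const (
	// minCPULimitMilli is the smallest CPU limit (in millicores) that the
	// Linux CFS quota mechanism can enforce: 1ms of quota per 100ms period.
	minCPULimitMilli int64 = 10
	// minCPURequestMilli is the smallest CPU request (in millicores) that
	// maps to a valid cgroup cpu.shares value (2 shares).
	minCPURequestMilli int64 = 2
)
```

Comparisons like desiredCPULimit <= 10 would then read desiredCPULimit <= minCPULimitMilli.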
/assign
I haven't got a chance to check on Windows; I may need some help.
I think this is the current behavior: the runtime translates the minimum values (1m -> 2m for requests) and those values are displayed back. If we want to continue with this behavior, we can just fix the endless reconciliation.
It's slightly different, because I'm proposing the Kubelet resolves the minimum for the allocation in the status, so that the allocation represents the real desired state. But yes, I am proposing to fix the reconciliation.
If I understand correctly, what you mean is that instead of the runtime setting the minimum value, we should make the kubelet do the calculation to convert the actual values to the runtime-acceptable minimum values. If so, how does it make a difference? Currently the values in the status are fetched and displayed from the runtime. Also, as per previous discussion, we cannot block the user from setting values less than the runtime-acceptable minimum values, so the user's desired state never becomes the actual state. More info can be found here: #102884 (comment)
Either way the Kubelet needs to know what the minimum values are, whether that's to prevent the continuous reconcile or to translate the values. The reason I'd prefer the approach I suggested is that the status should reflect the real state, even if it doesn't match the requested state set by the user. I don't think the Kubelet should be translating the value reported by the runtime just so it matches the user's requested state.
I got your point. Based on the previous discussion I went with this approach; I will make the necessary changes as suggested and update.
Made changes only to avoid infinite reconciliation; also, the pod status now displays the actual resources used by the runtime when the resources are set to the minimum.
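A minimal sketch of that reconciliation fix (the helper and its name are hypothetical; the PR's actual change is the diff shown earlier): limits at or below the enforceable minimum are treated as equal, so the Kubelet stops requesting a resize the runtime can never satisfy.

```go
// cpuLimitsMatch is a hypothetical helper mirroring the comparison in
// computePodResizeAction: the runtime reports 10m for any desired limit at
// or below 10m, so such pairs must compare as equal to avoid an endless
// resize loop.
func cpuLimitsMatch(desiredMilli, runtimeReportedMilli int64) bool {
	const minCFSLimitMilli int64 = 10
	if desiredMilli <= minCFSLimitMilli && runtimeReportedMilli <= minCFSLimitMilli {
		return true
	}
	return desiredMilli == runtimeReportedMilli
}
```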
@Karthik-K-N: The following test failed:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Let me provide another perspective. Today we have o11y tools looking solely at c.Spec.Resources, because that's the only thing available to determine resource config for pods. With this feature that is no longer true, the assumption is broken, and o11y tools need to adapt to this change. While I really want to avoid contextually interpreting runtime-provided resource config when converting to values in the pod status, it may make sense to do so in this case, imho. I fired up a local cluster and tried out the below pod spec:

```yaml
apiVersion: v1
kind: Pod
metadata:
name: orca
spec:
containers:
- name: c1
image: skiibum/testpod:qos
resources:
requests:
"cpu": "1m"
"memory": "100Mi"
limits:
"cpu": "1m"
"memory": "200Mi"
- name: c2
image: skiibum/testpod:qos
resources:
requests:
"cpu": "2m"
limits:
"cpu": "2m"
- name: c3
image: skiibum/testpod:qos
resources:
requests:
"memory": "100Mi"
- name: c4
image: skiibum/testpod:qos
resources:
limits:
"memory": "100Mi"
- name: c5
image: skiibum/testpod:qos
resources:
requests:
"cpu": "3m"
limits:
"cpu": "3m"
- name: c6
image: skiibum/testpod:qos
resources:
requests:
"cpu": "5m"
limits:
"cpu": "10m"
- name: c7
image: skiibum/testpod:qos
resources:
requests:
"cpu": "8m"
limits:
"cpu": "11m" Consider the resource config reported in the status for the above spec: ...
type: PodScheduled
containerStatuses:
- allocatedResources:
cpu: 1m
memory: 100Mi
containerID: containerd://dbb27617dbcb5dbf33eac916ce4a1012029395ba395c6fd2883656362ac89ebe
image: docker.io/skiibum/testpod:qos
imageID: docker.io/skiibum/testpod@sha256:b002bd3862e83d939ae676a909390ff57f662f3e139e0dd1297ffa5b85938250
lastState: {}
name: c1
ready: true
resources:
limits:
cpu: 10m
memory: 200Mi
requests:
cpu: 2m
memory: 100Mi
restartCount: 0
started: true
state:
running:
startedAt: "2024-04-21T01:02:26Z"
- allocatedResources:
cpu: 2m
containerID: containerd://c3293b8c124f68fcb8a0319ff78e21ac6106881d109901a3060aa23e1bd28494
image: docker.io/skiibum/testpod:qos
imageID: docker.io/skiibum/testpod@sha256:b002bd3862e83d939ae676a909390ff57f662f3e139e0dd1297ffa5b85938250
lastState: {}
name: c2
ready: true
resources:
limits:
cpu: 10m
requests:
cpu: 2m
restartCount: 0
started: true
state:
running:
startedAt: "2024-04-21T01:02:27Z"
- allocatedResources:
memory: 100Mi
containerID: containerd://27c9874c7db64e9760541052792cde9c0663d4f03bfa665b7bfbbe1c0ef2d3ae
image: docker.io/skiibum/testpod:qos
imageID: docker.io/skiibum/testpod@sha256:b002bd3862e83d939ae676a909390ff57f662f3e139e0dd1297ffa5b85938250
lastState: {}
name: c3
ready: true
resources:
requests:
cpu: 2m
memory: 100Mi
restartCount: 0
started: true
state:
running:
startedAt: "2024-04-21T01:02:27Z"
- allocatedResources:
memory: 100Mi
containerID: containerd://0471105627ba4731d5ce1f975a06b2c23468e80a08b508e80a7a6b7dbb93c711
image: docker.io/skiibum/testpod:qos
imageID: docker.io/skiibum/testpod@sha256:b002bd3862e83d939ae676a909390ff57f662f3e139e0dd1297ffa5b85938250
lastState: {}
name: c4
ready: true
resources:
limits:
memory: 100Mi
requests:
cpu: 2m
memory: 100Mi
restartCount: 0
started: true
state:
running:
startedAt: "2024-04-21T01:02:27Z"
- allocatedResources:
cpu: 3m
containerID: containerd://71d4aa23a4dc61b6261be8544b729b230f58336c4af71e2dd1a632a0290a38d7
image: docker.io/skiibum/testpod:qos
imageID: docker.io/skiibum/testpod@sha256:b002bd3862e83d939ae676a909390ff57f662f3e139e0dd1297ffa5b85938250
lastState: {}
name: c5
ready: true
resources:
limits:
cpu: 10m
requests:
cpu: 3m
restartCount: 0
started: true
state:
running:
startedAt: "2024-04-21T01:02:27Z"
- allocatedResources:
cpu: 5m
containerID: containerd://c17d75fab3b40227254c5dbe6429be02c99b8ee1888b6d9e1caf52f0f3c2c9a2
image: docker.io/skiibum/testpod:qos
imageID: docker.io/skiibum/testpod@sha256:b002bd3862e83d939ae676a909390ff57f662f3e139e0dd1297ffa5b85938250
lastState: {}
name: c6
ready: true
resources:
limits:
cpu: 10m
requests:
cpu: 5m
restartCount: 0
started: true
state:
running:
startedAt: "2024-04-21T01:02:27Z"
- allocatedResources:
cpu: 8m
containerID: containerd://272cddd35ca16008285f4602f7c74c485301c1e4950a6792b47fb6d2ab291070
image: docker.io/skiibum/testpod:qos
imageID: docker.io/skiibum/testpod@sha256:b002bd3862e83d939ae676a909390ff57f662f3e139e0dd1297ffa5b85938250
lastState: {}
name: c7
ready: true
resources:
limits:
cpu: 11m
requests:
cpu: 8m
restartCount: 0
started: true
state:
running:
startedAt: "2024-04-21T01:02:28Z"
hostIP: 127.0.0.1
hostIPs:
- ip: 127.0.0.1
phase: Running
...
```

Besides the CPU requests/limits not matching what was requested, it really looks inconsistent and "off" that the user sees a CPU request in the status when none was specified in the spec, while the same is not true for memory, even though there has to be some minimum RSS allocated just to start the container process. If any o11y tools have alerts set up such as "trigger pager if pod-obj-resources != pod-spec-resources", we risk breaking them. By not interpreting the value as above, I suspect we inadvertently pass the Linux kernel CFS nuances on to the user and shift the burden of interpreting this special case to the users. Hypothetically, if tomorrow we discover Windows has some other nuance, then all consuming o11y tools would need to do something like "switch node-os { case linux: interpret_linux(); case windows: interpret_windows(); … }". Any other time, I'd vote to display unfiltered observed values in the status. But given the ambiguity of this situation, I'm leaning towards a solution that makes the end user's life easier.
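A sketch of the contextual interpretation being leaned towards here (the helper is hypothetical, not code from this PR): when both the spec value and the runtime-reported value sit at or below the platform minimum, surface the spec value in the status, so that spec-vs-status comparisons in o11y tooling don't fire on a kernel nuance.

```go
// statusCPUMilli is a hypothetical helper: it folds a runtime-reported CPU
// value back to the spec value when both are at or below the platform
// minimum, since the runtime merely clamped a value the kernel cannot
// enforce. Otherwise the observed value is reported unfiltered.
func statusCPUMilli(specMilli, runtimeMilli, minMilli int64) int64 {
	if specMilli <= minMilli && runtimeMilli <= minMilli {
		return specMilli
	}
	return runtimeMilli
}
```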
What type of PR is this?
/kind bug
What this PR does / why we need it:
This PR contains the changes to handle the case in which the container resource request and limit are set to minimum values.
Which issue(s) this PR fixes:
Fixes #114123
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
Alpha feature code issue in KEP: kubernetes/enhancements#1287
Previous code review discussion: #102884 (comment)
pod.yaml
Current behavior
With this PR changes