In-place Pod Vertical Scaling feature #102884

Merged

vinaykul
Contributor

@vinaykul vinaykul commented Jun 15, 2021

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This PR brings the following changes, which together implement most of the In-place Pod Vertical Scaling feature:

  1. API change for In-place Pod Vertical Scaling feature
  2. Implementation of CRI API changes to support In-Place Pod Vertical Scaling.
  3. Core implementation that enables In-place vertical scaling for pods, comprehensively tested with docker runtime.
  4. Comprehensive E2E tests to validate In-place pod vertical scaling feature.

Which issue(s) this PR fixes: #9043 #110490

xref kubernetes/enhancements#1287

Special notes for your reviewer:

API changes: See: #111946

Scheduler changes: See commits 231849a and 7db339d

Kubelet implementation: See changes in pkg/kubelet

E2E test: test/e2e/node/pod_resize.go

Does this PR introduce a user-facing change? Yes

 In-place resize feature for Kubernetes Pods
  - Changed the Pod API so that the `resources` defined for containers are mutable for `cpu` and `memory` resource types.
  - Added `resizePolicy` for containers in a pod to allow users control over how their containers are resized.
  - Added `allocatedResources` field to container status in pod status that describes the node resources allocated to a pod.
  - Added `resources` field to container status that reports actual resources applied to running containers.
  - Added `resize` field to pod status that describes the state of a requested pod resize.

  For details, see KEPs below. ([#102884](https://github.com/kubernetes/kubernetes/pull/102884), [@vinaykul](https://github.com/vinaykul)) [SIG API Machinery, Apps, Instrumentation, Node, Scheduling and Testing]
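
As a rough illustration of the fields listed above, the sketch below prints a Pod manifest that sets `resizePolicy` alongside `resources`. The `pod_manifest` helper is hypothetical, and the `policy: RestartNotRequired` spelling is copied from the `kubectl get` output in this PR's description; the field names in a released API may differ, so check the KEP before relying on them.

```shell
#!/bin/sh
# pod_manifest NAME CPU MEMORY (hypothetical helper, not part of kubectl)
# Prints a single-container Pod whose requests equal its limits and whose
# resizePolicy asks for resizing cpu and memory without a restart.
# Field spellings are assumptions taken from the output shown in this PR.
pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  containers:
  - name: stress
    image: skiibum/ubuntu-stress:18.10
    resizePolicy:
    - resourceName: cpu
      policy: RestartNotRequired
    - resourceName: memory
      policy: RestartNotRequired
    resources:
      requests:
        cpu: "$2"
        memory: "$3"
      limits:
        cpu: "$2"
        memory: "$3"
EOF
}

# Print the manifest for the pod used in the description.
pod_manifest 2pod 500m 500Mi
```

The output could be piped to `kubectl create -f -` on a cluster that has the feature enabled.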

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1287-in-place-update-pod-resources
- [Usage]: via kubectl or the API, e.g. `kubectl patch pod bar --patch '{"spec":{"containers":[{"name":"ale", "resources":{"requests":{"memory":"500Mi"}, "limits":{"memory":"500Mi"}}}]}}'`
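
The patch in the usage line above sets a request and its matching limit to the same value. A small helper can build that JSON; `resize_patch` is a hypothetical name, and this sketch assumes the container name and values contain no characters that need JSON escaping.

```shell
#!/bin/sh
# resize_patch CONTAINER RESOURCE VALUE (hypothetical helper)
# Prints a patch that sets both the request and the limit of RESOURCE
# (cpu or memory) to VALUE for the named container.
# Assumption: arguments contain no quotes or backslashes.
resize_patch() {
  printf '{"spec":{"containers":[{"name":"%s","resources":{"requests":{"%s":"%s"},"limits":{"%s":"%s"}}}]}}\n' \
    "$1" "$2" "$3" "$2" "$3"
}

# Show the patch that corresponds to the usage example.
resize_patch ale memory 500Mi
```

On a live cluster the result would be passed along as `kubectl patch pod bar --patch "$(resize_patch ale memory 500Mi)"`.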

Jun 26th:
PodStatus.Resize has now been fully implemented. @thockin Please see below. I hope this acts as a simple signal to the API user (VPA) about what is going on with the resize, so that they can take alternative action in the Deferred / Infeasible cases, as their policy allows.

root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# ./cluster/kubectl.sh describe no 127.0.0.1
Name:               127.0.0.1
Roles:              <none>
...
...
Addresses:
  InternalIP:  127.0.0.1
  Hostname:    127.0.0.1
Capacity:
  cpu:                16
  ephemeral-storage:  927125032Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32928300Ki
  pods:               110
Allocatable:
  cpu:                4
  ephemeral-storage:  854438428077
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             3465772Ki
  pods:               110
System Info:
...
Non-terminated Pods:          (1 in total)
  Namespace                   Name                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                        ------------  ----------  ---------------  -------------  ---
  kube-system                 coredns-66cf7947cf-zvlxf    100m (2%)     0 (0%)      70Mi (2%)        170Mi (5%)     11m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (2%)  0 (0%)
  memory             70Mi (2%)  170Mi (5%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-1Gi      0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:
...
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# cat ~/YML/2pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: 2pod
spec:
  containers:
  - name: stress
    image: skiibum/ubuntu-stress:18.10
    resources:
      limits:
        cpu: "500m"
        memory: "500Mi"
      requests:
        cpu: "500m"
        memory: "500Mi"
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# ./cluster/kubectl.sh create -f ~/YML/2pod.yaml 
pod/2pod created
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# ./cluster/kubectl.sh get po 2pod -oyaml 
apiVersion: v1
kind: Pod
metadata:
  name: 2pod
  namespace: default
spec:
  containers:
  - image: skiibum/ubuntu-stress:18.10
    name: stress
    resizePolicy:
    - policy: RestartNotRequired
      resourceName: cpu
    - policy: RestartNotRequired
      resourceName: memory
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
...
...
status:
  conditions:
...
  containerStatuses:
  - containerID: docker://015b2d8605c732329129a8d61894ef5438b5a8ed09da0b5e56dad82d3b57a789
    image: skiibum/ubuntu-stress:18.10
    name: stress
    ready: true
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
    resourcesAllocated:
      cpu: 500m
      memory: 500Mi
    restartCount: 0
    started: true
...
  qosClass: Guaranteed
  startTime: "2021-06-27T02:06:56Z"
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# ./cluster/kubectl.sh patch pod 2pod --patch '{"spec":{"containers":[{"name":"stress", "resources":{"requests":{"cpu":"650m"}, "limits":{"cpu":"650m"}}}]}}'
pod/2pod patched
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# ./cluster/kubectl.sh get po 2pod -oyaml 
apiVersion: v1
kind: Pod
metadata:
  name: 2pod
  namespace: default
spec:
  containers:
  - image: skiibum/ubuntu-stress:18.10
    name: stress
    resizePolicy:
    - policy: RestartNotRequired
      resourceName: cpu
    - policy: RestartNotRequired
      resourceName: memory
    resources:
      limits:
        cpu: 650m
        memory: 500Mi
      requests:
        cpu: 650m
        memory: 500Mi
...
...
status:
  conditions:
...
  containerStatuses:
  - containerID: docker://015b2d8605c732329129a8d61894ef5438b5a8ed09da0b5e56dad82d3b57a789
    image: skiibum/ubuntu-stress:18.10
    name: stress
    ready: true
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
    resourcesAllocated:
      cpu: 650m
      memory: 500Mi
    restartCount: 0
    started: true
...
  qosClass: Guaranteed
  resize: InProgress
  startTime: "2021-06-27T02:06:56Z"
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# ./cluster/kubectl.sh patch pod 2pod --patch '{"spec":{"containers":[{"name":"stress", "resources":{"requests":{"cpu":"3950m"}, "limits":{"cpu":"3950m"}}}]}}'
pod/2pod patched
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# ./cluster/kubectl.sh get po 2pod -oyaml 
apiVersion: v1
kind: Pod
metadata:
  name: 2pod
  namespace: default
spec:
  containers:
  - image: skiibum/ubuntu-stress:18.10
    name: stress
    resizePolicy:
    - policy: RestartNotRequired
      resourceName: cpu
    - policy: RestartNotRequired
      resourceName: memory
    resources:
      limits:
        cpu: 3950m
        memory: 500Mi
      requests:
        cpu: 3950m
        memory: 500Mi
...
...
status:
  conditions:
...
  containerStatuses:
  - containerID: docker://015b2d8605c732329129a8d61894ef5438b5a8ed09da0b5e56dad82d3b57a789
    image: skiibum/ubuntu-stress:18.10
    name: stress
    ready: true
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
    resourcesAllocated:
      cpu: 650m
      memory: 500Mi
    restartCount: 0
    started: true
...
  qosClass: Guaranteed
  resize: Deferred
  startTime: "2021-06-27T02:06:56Z"
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# ./cluster/kubectl.sh patch pod 2pod --patch '{"spec":{"containers":[{"name":"stress", "resources":{"requests":{"cpu":"4650m"}, "limits":{"cpu":"4650m"}}}]}}'
pod/2pod patched
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# ./cluster/kubectl.sh get po 2pod -oyaml 
apiVersion: v1
kind: Pod
metadata:
  name: 2pod
  namespace: default
spec:
  containers:
  - image: skiibum/ubuntu-stress:18.10
    name: stress
    resizePolicy:
    - policy: RestartNotRequired
      resourceName: cpu
    - policy: RestartNotRequired
      resourceName: memory
    resources:
      limits:
        cpu: 4650m
        memory: 500Mi
      requests:
        cpu: 4650m
        memory: 500Mi
...
...
status:
  conditions:
...
  containerStatuses:
  - containerID: docker://015b2d8605c732329129a8d61894ef5438b5a8ed09da0b5e56dad82d3b57a789
    image: skiibum/ubuntu-stress:18.10
...
    name: stress
    ready: true
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
    resourcesAllocated:
      cpu: 650m
      memory: 500Mi
    restartCount: 0
...
  qosClass: Guaranteed
  resize: Infeasible
  startTime: "2021-06-27T02:06:56Z"
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core# 
root@fw0000359:~/go/src/k8s.io/kubernetes-rfpvs-core#
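
The three outcomes in the transcript line up with simple arithmetic on the node's allocatable CPU (4 cores, with 100m already requested by coredns): 650m fits, 3950m exceeds what is currently free, and 4650m exceeds what the node could ever offer. The sketch below models only that arithmetic. `to_millicores` and `classify_resize` are hypothetical helpers, only whole-core and `m`-suffixed quantities are handled, and the kubelet's real admission logic considers more than this.

```shell
#!/bin/sh
# to_millicores QUANTITY: converts "500m" to 500 and "4" to 4000.
# Assumption: fractional values such as "0.5" are not supported here.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo "$(( $1 * 1000 ))" ;;
  esac
}

# classify_resize ALLOCATABLE OTHER_REQUESTS NEW_REQUEST
# Simplified model of the resize status seen in the transcript:
#   Infeasible  the request is larger than the node's allocatable CPU
#   Deferred    it would fit on the node, but not next to the other pods
#   InProgress  it fits now, so the resize can be acted on
classify_resize() {
  alloc=$(to_millicores "$1")
  others=$(to_millicores "$2")
  want=$(to_millicores "$3")
  if [ "$want" -gt "$alloc" ]; then
    echo Infeasible
  elif [ "$(( want + others ))" -gt "$alloc" ]; then
    echo Deferred
  else
    echo InProgress
  fi
}

# Replay the three patches from the transcript.
for cpu in 650m 3950m 4650m; do
  echo "$cpu: $(classify_resize 4 100m "$cpu")"
done
```

A caller such as VPA could branch on the same three values, for example by retrying later on Deferred and falling back to evicting and recreating the pod on Infeasible.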

@k8s-ci-robot added the release-note, kind/feature, kind/api-change, size/XXL, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-ok-to-test, and needs-priority labels on Jun 15, 2021
@k8s-ci-robot
Contributor

Hi @vinaykul. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the area/kubelet, sig/api-machinery, sig/apps, and sig/node labels and removed the do-not-merge/needs-sig label on Jun 15, 2021
@vinaykul
Contributor Author

/hold

@k8s-ci-robot added the do-not-merge/hold label on Jun 15, 2021
@vinaykul
Contributor Author

vinaykul commented Jun 15, 2021

/assign @vinaykul

@fedebongio
Contributor

/remove-sig api-machinery

@thockin
Member

thockin commented Mar 3, 2023

What's the over/under on how long until this gets reverted?

in alpha clusters with all alpha feature gates enabled, kubelet is panicking

I would not have guessed 3 days :)

@vinaykul
Contributor Author

vinaykul commented Mar 4, 2023

> What's the over/under on how long until this gets reverted?
>
> in alpha clusters with all alpha feature gates enabled, kubelet is panicking
>
> I would not have guessed 3 days :)

@liggitt I'll look at the node CI again to see if I missed one somewhere.

@thockin So close! 😀

I'll fix this by checking all NP accesses for now. But now that I think about it, I'm wondering if I can toss out all this node checkpointing code in favor of relying on ResourcesAllocated and Resize values being persisted in status. (A bigger change, so maybe not right now, this close to code freeze.)

@vinaykul
Contributor Author

vinaykul commented Mar 4, 2023

Potential fix. PTAL: #116271

@anoop2811

anoop2811 commented Mar 5, 2023

Looking forward to this epic being available. This can be a lifesaver in these times of cost cutting. I hope companies using k8s apply it to their not-so-modern apps to cut resource costs. The next most exciting one would be multi-dimensional scaling :)

@sftim
Contributor

sftim commented Mar 14, 2023

For the changelog entry, we might prefer to describe the changes in terms of API fields.

The API doesn't have fields named PodSpec or Resources or ResizePolicy; those are instead spec, resources and resizePolicy. These capitalizations are what end users typically see.

I also like to use Markdown in the changelog. Something like (not tech reviewed for accuracy):

- Changed the Pod API so that the `resources` defined for a container are mutable for `cpu` and `memory` resources
- Added a `resizePolicy` for containers within a Pod
- Added an `allocatedResources` field within Pod status (reported per Pod)
- Added a `resources` field within Pod status for reporting actual resource allocations
- Extended `status` within the Pod API to report actual state for container resize operations
- Added Windows support for [CRI](https://k8s.io/docs/concepts/architecture/cri/)
  `UpdateContainerResources` operations

@lowang-bh
Member

Niubility, and congratulations!

@gdace829

🐮

Labels
approved, area/apiserver, area/code-generation, area/e2e-test-framework, area/kubelet, area/test, cncf-cla: yes, kind/api-change, kind/feature, lgtm, ok-to-test, priority/important-soon, release-note, sig/api-machinery, sig/apps, sig/instrumentation, sig/node, sig/scheduling, sig/testing, size/XXL, triage/accepted, wg/structured-logging, ¯\_(ツ)_/¯
Projects
Status: API review completed, 1.25
Archived in project
Development

Successfully merging this pull request may close these issues.

In-place rolling updates