Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[kubelet]: fixed container restart due to pod spec field changes #124220

Merged
merged 2 commits into from
May 22, 2024

Conversation

HirazawaUi
Copy link
Contributor

@HirazawaUi HirazawaUi commented Apr 7, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

The kubelet should hash on fields that it consider as mutable for the life of the container.

This can prevent pod restarts caused by modification of the pod spec default value in the following two situations:

  1. If an alpha field that is not defaulted graduates to adding a default in beta, an API server upgrade from alpha->beta will modify this data.
  2. If an field is enabled through feature gate, if we disable the gate or enable the gate, the data will also be modified.

According to kubernetes/website#12326 (comment), we know that pods will be drained when upgrading kubelet, so we can change this mechanism on a minor version boundary.

Which issue(s) this PR fixes:

Fixes #122028 #63814

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Kubelet will not restart the container when fields other than image in the pod spec change.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. labels Apr 7, 2024
@k8s-ci-robot
Copy link
Contributor

Please note that we're already in Test Freeze for the release-1.30 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.30.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Sun Apr 7 13:42:56 UTC 2024.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 7, 2024
@HirazawaUi HirazawaUi changed the title [kubelet]: fixed container restart due to field changes [kubelet]: fixed container restart due to pod spec field changes Apr 7, 2024
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 7, 2024
@HirazawaUi
Copy link
Contributor Author

@liggitt Could you please tell me if I'm on the right track?

@bart0sh bart0sh added this to Triage in SIG Node PR Triage Apr 8, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 10, 2024
@ndixita
Copy link
Contributor

ndixita commented Apr 10, 2024

/cc @tallclair @ndixita

@kannon92
Copy link
Contributor

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Apr 10, 2024
// Note: this list must be updated if ever kubelet wants to allow mutations to other fields.
func pickFieldToHash(container *v1.Container) map[string]string {
retval := map[string]string{
"name": container.Name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not include fields like ID, or ImageID? These should also be immutable. Are you finding the smallest set of fields that don't require a restart?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is picking the smallest set of mutable fields in the REST API v1.Container that do require a restart.

ID and ImageID are not in the v1.Container type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not include fields like ID, or ImageID?

We already had some discussion above about not include the pod ID.
ref: #124220 (comment) #124220 (comment)

Are you finding the smallest set of fields that don't require a restart?

Yes, we would like your help in reviewing and verifying whether other fields in v1.Container need to be covered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, thanks!

@haircommander
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 20, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 6be440dc41e15951e9dd8427c3a5f8ff908ab020

@vinaykul
Copy link
Contributor

/lgtm

Thanks for the review @haircommander
Imho, the contents in the hash are a contract between kubelet and runtime that help the kubelet identify when it must restart the container. As @liggitt mentioned, this change includes the minimum set of fields in the Container struct that, when changed, indicate that container needs to be restarted. I just wanted to make sure we didn't miss something buried in the container struct somewhere.

/lgtm

@HirazawaUi
Copy link
Contributor Author

HirazawaUi commented May 20, 2024

/cc @AkihiroSuda @mrunalp @dchen1107 @SergeyKanzhelev for approval

And do we need to discuss adding gates to control risk proposed by @thockin?

@k8s-ci-robot
Copy link
Contributor

@HirazawaUi: GitHub didn't allow me to request PR reviews from the following users: for, approve.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @AkihiroSuda @mrunalp @dchen1107 @SergeyKanzhelev for approve

And do we need to discuss adding gates to control risk proposed by @thockin?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mrunalp
Copy link
Contributor

mrunalp commented May 21, 2024

I am making a pass for node approval.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 22, 2024
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 22, 2024
Copy link
Contributor

@vinaykul vinaykul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 22, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 88b44d0862a0c27bc7550a10c1732a65ee428895

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HirazawaUi, mrunalp, vinaykul

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
SIG Node PR Triage
Needs Reviewer
Development

Successfully merging this pull request may close these issues.

Enabling InPlacePodVerticalScaling feature gate causes restart of containers within the cluster