Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: maxUnavailable for StatefulSet #82162

Merged
merged 2 commits into from Mar 29, 2022
Merged

Conversation

krmayankk
Copy link

@krmayankk krmayankk commented Aug 30, 2019

Implements https://github.com/kubernetes/enhancements/pull/1010/files

MaxUnavailable for StatefulSets, allows faster RollingUpdate by taking down more than 1 pod at a time. The number of pods you want to take down during a RollingUpdate is configurable using maxUnavailable parameter.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/apps Categorizes an issue or PR as relevant to SIG Apps. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Aug 30, 2019
@fejta-bot
Copy link

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@krmayankk
Copy link
Author

/assign @lavalamp

// The maximum number of pods that can be unavailable during the update.
// Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%).
// Absolute number is calculated from percentage by rounding down.
// Defaults to 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May it be zero?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment, it cannot be zero @lavalamp

@lavalamp
Copy link
Member

lavalamp commented Sep 3, 2019

Needs validation code.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 16, 2019
@krmayankk
Copy link
Author

Needs validation code.

added validation code

spec.UpdateStrategy.RollingUpdate.MaxUnavailable != nil &&
spec.UpdateStrategy.RollingUpdate.MaxUnavailable.IntValue() > 1 {
allErrs = append(allErrs, field.Invalid(fldPath.Child("updateStrategy").Child("rollingUpdate").Child("maxUnavailable"), spec.UpdateStrategy.RollingUpdate.MaxUnavailable,
fmt.Sprintf("with value greater than 1 only allowed for podManagementPolicy '%s'", apps.ParallelPodManagement)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also prevent using a percentage in this case, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, any reason not to prevent setting it at all with ordered policy?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was very early API days when we were debating this for whether to implement for OrderedReady or not. Finally we decided to add for both, unit tests cover both scenarios. PTAL

@lavalamp
Copy link
Member

API looks OK now. Feature gate?

I guess all of the implementation goes in the controller?

@krmayankk
Copy link
Author

@lavalamp yes all of the implementation goes into the controller. Can we merge the api with feature gate first and then do the implementation since its alpha and always off ?(just makes it easier to do the changes)

@lavalamp
Copy link
Member

lavalamp commented Sep 26, 2019 via email

func TestSetDefaultStatefulSet(t *testing.T) {
defaultLabels := map[string]string{"foo": "bar"}
defaultMaxUnavailable := intstr.FromInt(1)
notTheDefaultMaxUnavailable := intstr.FromInt(3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same -- please remove

Type: apps.RollingUpdateStatefulSetStrategyType,
RollingUpdate: &apps.RollingUpdateStatefulSetStrategy{
Partition: 2,
MaxUnavailable: func() *intstr.IntOrString {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace with a call to the function you defined.

updateMin := 0
maxUnavailable := 1
if set.Spec.UpdateStrategy.RollingUpdate != nil {
updateMin = int(*set.Spec.UpdateStrategy.RollingUpdate.Partition)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partition could be nil?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I don't understand how this is supposed to work but it seems to me that this doesn't work if this is a non-zero value?

Copy link
Author

@krmayankk krmayankk Mar 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is all existing old code
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/statefulset/stateful_set_control.go#L578-L583

  • partition is defaulted to 0 always and its not under any feature flag, and hence that would be reason its not nil
  • if partition is non zero, the pods are deleted starting from highest ordinal to the ordinal equal to the partition number, all pods less than partition are left untouched
  • eg. maxUnavailable =2, total replicas 6 partition=3, so the pods deleted are 5,4(because maxUnavailable) first and eventually 3 also. 2, 1 and 0 remain untouched

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case you need to decide if maxUnavailable is per-partition or total, state that in the documentation, and probably fix the logic below.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxUnavailable is total , we dont want to proceed with any update, if the total count of unavailable pods within 0 to replicas is greater than or equal to maxUnavailable

// to their ordinal for termination. Delete those pods and count the successful deletions. Update the status with the correct
// number of deletions.
unavailablePods := 0
for target := len(replicas) - 1; target >= updateMin; target-- {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If updateMin is not zero, this could miss counting some unhealthy pods?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the invariant has been established till this point(https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/statefulset/stateful_set_control.go#L497-L501), that all pods are running and ready before we attempt any deletion. See https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/statefulset/stateful_set_control.go#L444 and https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/statefulset/stateful_set_control.go#L452
and https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/statefulset/stateful_set_control.go#L497-L501 so technically we could leave this check as target >=updateMin since we from nowards are only deleting pods from replicas to updateMin(partition) in the next loop. For extra security, i will make it 0. @lavalamp

// Value can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%).
// Absolute number is calculated from percentage by rounding up. This can not be 0.
// Defaults to 1. This field is alpha-level and is only honored by servers that enable the
// MaxUnavailableStatefulSet feature.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should explain how this field interacts with the Partition field.

...The partition field comment is also not helpful and could be expanded if you know what it is supposed to mean.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does not . maxUnavailable only determines at rolling update time, how many can we make go down, and partition limits the ordinals that can go down to the range [ replicas: partition]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will fix these comments, partition one is completely useless, i just noticed :-)

@@ -11487,6 +11491,555 @@
],
"type": "object"
},
"io.k8s.api.flowcontrol.v1alpha1.FlowDistinguisherMethod": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must need a rebase or something, your PR is not adding this field.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

weird i am not sure what is causing this to be generated. i removed this file and ran code generation again

@@ -0,0 +1,4093 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your PR should not be adding this file.

}

if tc == apps.OrderedReadyPodManagement {
// only one pod gets created at a time due to OrderedReady
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "loop with huge nested if" is not a good way to do a table driven test BTW. Better to extract logic to e.g. functions which are in the test cases and can just be run. I'm not sure if I'm asking for a change or not because I'm still trying to understand the test.

Copy link
Author

@krmayankk krmayankk Mar 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my original PR, these were two different tests , one for OrderedReadyPodManagement and one for ParallelPodManagement. Since there was lot of duplication between two, i changed it to single test

@krmayankk krmayankk force-pushed the maxun branch 2 times, most recently from b51a150 to e957da8 Compare March 26, 2022 04:00
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2022
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 27, 2022
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Mar 28, 2022

@krmayankk: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-gce-device-plugin-gpu 739289444d5fa5f92d111759083d1479e388117f link /test pull-kubernetes-e2e-gce-device-plugin-gpu
pull-kubernetes-bazel-test 32e82c5ca9c79f3d87bfea1e158f667a48d68489 link /test pull-kubernetes-bazel-test

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

// Defaults to 1. This field is alpha-level and is only honored by servers that enable the
// MaxUnavailableStatefulSet feature. The field applies to all pods in the range 0 to
// Replicas-1. That means if there is any unavailable pod in the range 0 to Replicas-1, it
// will be counted towards MaxUnavailable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both these comments are huge improvements, thanks.

@lavalamp
Copy link
Member

/approve

For API.

I can't vouch for the controller tests, you'll need to get an lgtm from someone who can.

With the code split I'm no longer worried about breaking regular non-alpha users.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kow3ns, krmayankk, lavalamp, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 28, 2022
@ravisantoshgudimetla
Copy link
Contributor

ravisantoshgudimetla commented Mar 28, 2022

/lgtm

from controller standpoint. We can improve the tests in a separate PR later.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 28, 2022
@k8s-ci-robot k8s-ci-robot merged commit f85ff4b into kubernetes:master Mar 29, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Mar 29, 2022
@krmayankk krmayankk deleted the maxun branch March 30, 2022 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
Status: API review completed, 1.20
Development

Successfully merging this pull request may close these issues.

None yet