Fix TerminationGracePeriodSeconds is negative (part 1) #98866

@wzshiming wzshiming commented Feb 8, 2021

What type of PR is this?

/kind bug
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #98506
xref #98507
xref #103476

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

ACTION REQUIRED: TerminationGracePeriodSeconds on pod specs and container probes should not be negative.
Negative values of TerminationGracePeriodSeconds will be treated as the value `1s` on the delete path.
Immutable field validation will be relaxed in order to update negative values. 
In a future release, negative values will not be permitted.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 8, 2021
@k8s-ci-robot k8s-ci-robot added area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 8, 2021
@wzshiming
Member Author

/retest

@fejta-bot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@ehashman ehashman added this to Triage in SIG Node PR Triage Feb 8, 2021
@ehashman
Member

ehashman commented Feb 8, 2021

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 8, 2021
@ehashman ehashman moved this from Triage to Needs Reviewer in SIG Node PR Triage Feb 8, 2021
@liggitt
Member

liggitt commented Feb 9, 2021

the BeforeDelete implementation currently recalculates and updates DeletionGracePeriodSeconds:

// a resource was previously left in a state that was non-recoverable. We
// check if the existing stored resource has a grace period as 0 and if so
// attempt to delete immediately in order to recover from this scenario.
if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
	return false, false, nil
}
// only a shorter grace period may be provided by a user
if options.GracePeriodSeconds != nil {
	period := int64(*options.GracePeriodSeconds)
	if period >= *objectMeta.GetDeletionGracePeriodSeconds() {
		return false, true, nil
	}
	newDeletionTimestamp := metav1.NewTime(
		objectMeta.GetDeletionTimestamp().Add(-time.Second * time.Duration(*objectMeta.GetDeletionGracePeriodSeconds())).
			Add(time.Second * time.Duration(*options.GracePeriodSeconds)))
	objectMeta.SetDeletionTimestamp(&newDeletionTimestamp)
	objectMeta.SetDeletionGracePeriodSeconds(&period)
	return true, false, nil
}

There are two things that need changing here (with unit tests that exercise each change):

  • if the existing grace period is already less than zero we should delete immediately
  • if the new period is less than zero we should clip to zero
diff --git a/staging/src/k8s.io/apiserver/pkg/registry/rest/delete.go b/staging/src/k8s.io/apiserver/pkg/registry/rest/delete.go
index 3e7ca85b761..7f90e5e18be 100644
--- a/staging/src/k8s.io/apiserver/pkg/registry/rest/delete.go
+++ b/staging/src/k8s.io/apiserver/pkg/registry/rest/delete.go
@@ -103,9 +103,9 @@ func BeforeDelete(strategy RESTDeleteStrategy, ctx context.Context, obj runtime.
 		// 2. Delete the object from storage.
 		// If the update succeeds, but the delete fails (network error, internal storage error, etc.),
 		// a resource was previously left in a state that was non-recoverable.  We
-		// check if the existing stored resource has a grace period as 0 and if so
+		// check if the existing stored resource has a grace period <= 0 and if so
 		// attempt to delete immediately in order to recover from this scenario.
-		if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
+		if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() <= 0 {
 			return false, false, nil
 		}
 		// only a shorter grace period may be provided by a user
@@ -113,10 +113,13 @@ func BeforeDelete(strategy RESTDeleteStrategy, ctx context.Context, obj runtime.
 			period := int64(*options.GracePeriodSeconds)
 			if period >= *objectMeta.GetDeletionGracePeriodSeconds() {
 				return false, true, nil
+			} else if period < 0 {
+				// clip to zero
+				period = 0
 			}
 			newDeletionTimestamp := metav1.NewTime(
 				objectMeta.GetDeletionTimestamp().Add(-time.Second * time.Duration(*objectMeta.GetDeletionGracePeriodSeconds())).
-					Add(time.Second * time.Duration(*options.GracePeriodSeconds)))
+					Add(time.Second * time.Duration(period)))
 			objectMeta.SetDeletionTimestamp(&newDeletionTimestamp)
 			objectMeta.SetDeletionGracePeriodSeconds(&period)
 			return true, false, nil

@liggitt liggitt self-assigned this Feb 9, 2021
@liggitt liggitt removed the request for review from juanvallejo February 9, 2021 14:13
@k8s-ci-robot k8s-ci-robot added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jun 23, 2021

// Negative values will be treated as the value `1s` on the delete path.
if gracePeriodSeconds := options.GracePeriodSeconds; gracePeriodSeconds != nil && *gracePeriodSeconds < 0 {
	options.GracePeriodSeconds = utilpointer.Int64(1)
Member

adding another options mutation looks like it will exacerbate the issue being fixed in #100101

@deads2k, is that PR close? is there something this PR should do differently to avoid that issue?
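
As a generic illustration of the concern about mutating the options, the sketch below computes the effective grace period from a copy instead of writing back into the caller's struct. `deleteOptions` and `effectiveGracePeriod` are hypothetical names invented for this example; this is not what #100101 or this PR does, only one way the "treat negative as 1s" rule could be applied without a mutation.

```go
package main

import "fmt"

// deleteOptions is a hypothetical stand-in for the real delete options type.
type deleteOptions struct {
	GracePeriodSeconds *int64
}

// effectiveGracePeriod returns the grace period to use on the delete path
// without modifying opts: negative values are treated as 1 second.
func effectiveGracePeriod(opts *deleteOptions) *int64 {
	if opts == nil || opts.GracePeriodSeconds == nil {
		return nil
	}
	period := *opts.GracePeriodSeconds // copy, so the caller's value is untouched
	if period < 0 {
		period = 1
	}
	return &period
}

func main() {
	requested := int64(-7)
	opts := &deleteOptions{GracePeriodSeconds: &requested}
	// Print the effective value next to the caller's original value.
	fmt.Println(*effectiveGracePeriod(opts), *opts.GracePeriodSeconds)
}
```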

@@ -3873,9 +3873,35 @@ func ValidateContainerUpdates(newContainers, oldContainers []core.Container, fld
allErrs = append(allErrs, field.Invalid(fldPath.Index(i).Child("image"), ctr.Image, "must not have leading or trailing whitespace"))
}
}

// validate updated container probe
for i, ctr := range newContainers {
Member

if probe.TerminationGracePeriodSeconds is still alpha, can we add the validation to require non-negative values before promotion to beta, rather than making the update validation more complicated/expensive?

Member Author

Sure. Updated, and I sent a new patch to follow up on this: #103245

@@ -4039,13 +4070,26 @@ func ValidatePodUpdate(newPod, oldPod *core.Pod, opts PodValidationOptions) fiel
// tolerations are checked before the deep copy, so munge those too
mungedPodSpec.Tolerations = oldPod.Spec.Tolerations // +k8s:verify-mutation:reason=clone

// Relax validation of immutable fields to allow it to be set to 1 if it was previously negative.
Member

this lgtm
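
The relaxation described in the code comment under review can be pictured with a small stand-alone check. `allowGracePeriodUpdate` is a hypothetical helper, not the code in `ValidatePodUpdate`; it only encodes the rule stated there, namely that an otherwise immutable value may change to 1 if it was previously negative.

```go
package main

import "fmt"

// allowGracePeriodUpdate is a hypothetical helper, not the function used by
// ValidatePodUpdate. It reports whether an update to an otherwise immutable
// grace period field should be accepted.
func allowGracePeriodUpdate(oldVal, newVal *int64) bool {
	// If either side is unset, accept only when both are unset.
	if oldVal == nil || newVal == nil {
		return oldVal == nil && newVal == nil
	}
	// Unchanged values are always fine.
	if *oldVal == *newVal {
		return true
	}
	// Relaxation: a previously negative value may be repaired to exactly 1.
	return *oldVal < 0 && *newVal == 1
}

func main() {
	neg, one, thirty := int64(-1), int64(1), int64(30)
	fmt.Println(allowGracePeriodUpdate(&neg, &one))    // negative repaired to 1
	fmt.Println(allowGracePeriodUpdate(&thirty, &one)) // ordinary change to an immutable field
}
```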

@liggitt liggitt moved this from In progress to Changes requested in API Reviews Jun 26, 2021
@wzshiming wzshiming force-pushed the fix/termination_grace_period_seconds_is_negative branch from 4a1708d to 963ae22 Compare June 28, 2021 03:45
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 28, 2021
@wzshiming wzshiming requested a review from liggitt June 28, 2021 03:47
@wzshiming wzshiming force-pushed the fix/termination_grace_period_seconds_is_negative branch from 963ae22 to a8d4cfa Compare June 28, 2021 03:50
@liggitt
Member

liggitt commented Jun 28, 2021

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 28, 2021
@liggitt liggitt moved this from Changes requested to API review completed, 1.22 in API Reviews Jun 28, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, wzshiming

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 28, 2021
@k8s-ci-robot k8s-ci-robot merged commit 5e06f17 into kubernetes:master Jun 28, 2021
SIG Node PR Triage automation moved this from Needs Reviewer to Done Jun 28, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jun 28, 2021
@wzshiming wzshiming deleted the fix/termination_grace_period_seconds_is_negative branch June 28, 2021 15:13
@wzshiming
Member Author

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 9, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: API review completed, 1.22
Development

Successfully merging this pull request may close these issues.

Force deleting pods not working after deletionGracePeriodSeconds set to a negative value