
Add DaemonSet support in PDB #98307

Closed
wants to merge 3 commits

Conversation


@shvgn shvgn commented Jan 22, 2021

What type of PR is this?

/kind fix
/sig apps

What this PR does / why we need it:

Supports DaemonSets in the disruption controller by adding a /scale subresource to the DaemonSet API. This allows controlling the eviction rate of DaemonSet pods.

Which issue(s) this PR fixes:

#108124

Does this PR introduce a user-facing change?:

Added /scale subresource to the DaemonSet API to support PodDisruptionBudget for DaemonSet pods.
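
For illustration only, a minimal sketch of a PodDisruptionBudget that would cover DaemonSet pods once this is supported, built with the Go policy/v1 types; the namespace, names, and labels here are hypothetical and not part of this PR:

package main

import (
	"fmt"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Hypothetical PDB selecting the pods of a "node-exporter" DaemonSet.
	maxUnavailable := intstr.FromInt(1)
	pdb := policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "node-exporter-pdb", Namespace: "monitoring"},
		Spec: policyv1.PodDisruptionBudgetSpec{
			// At most one DaemonSet pod may be disrupted at a time.
			MaxUnavailable: &maxUnavailable,
			Selector:       &metav1.LabelSelector{MatchLabels: map[string]string{"app": "node-exporter"}},
		},
	}
	fmt.Printf("%+v\n", pdb)
}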

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. sig/apps Categorizes an issue or PR as relevant to SIG Apps. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 22, 2021
@k8s-ci-robot
Contributor

Welcome @shvgn!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 22, 2021
@k8s-ci-robot
Contributor

Hi @shvgn. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 22, 2021
@leilajal
Contributor

/remove-sig api-machinery

@k8s-ci-robot k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jan 26, 2021
@smarterclayton
Contributor

Is there a KEP connected to this?

@@ -174,14 +182,15 @@ func NewDisruptionController(
 // resources directly and only fall back to the scale subresource when needed.
 func (dc *DisruptionController) finders() []podControllerFinder {
 	return []podControllerFinder{dc.getPodReplicationController, dc.getPodDeployment, dc.getPodReplicaSet,
-		dc.getPodStatefulSet, dc.getScaleController}
+		dc.getPodStatefulSet, dc.getPodDaemonSet, dc.getScaleController}
Contributor

I admit I'm surprised we have added this method in general. Where is the fallback logic to use scale for this case? getExpectedScale() doesn't implement a generic fallback, so that means this logic is not extensible and that's not great. If there is a generic fallback, that would imply this is an optimization only (in which case, I would have expected the bug report to be "controller lookups for daemonsets are slow and this improves performance", which doesn't match what you are reporting).

Contributor

Is the generic lookup dc.getScaleController? Is Scale not implemented for DaemonSets?


Contributor

Oh.

Oh....

Contributor

Hrm. That's really ugly. It should support read scale but not write scale, or reject write scale.
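
For context, a rough sketch of the "support read scale but reject write scale" idea: a read-only Scale is derived from the DaemonSet status, while writes are rejected. This is a simplified illustration, not this PR's code, and it deliberately skips the real apiserver registry plumbing:

package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	autoscalingv1 "k8s.io/api/autoscaling/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// scaleFromDaemonSet maps a DaemonSet onto the autoscaling/v1 Scale shape,
// treating status.desiredNumberScheduled as the "replica" count.
func scaleFromDaemonSet(ds *appsv1.DaemonSet) (*autoscalingv1.Scale, error) {
	selector, err := metav1.LabelSelectorAsSelector(ds.Spec.Selector)
	if err != nil {
		return nil, err
	}
	return &autoscalingv1.Scale{
		ObjectMeta: metav1.ObjectMeta{Name: ds.Name, Namespace: ds.Namespace, UID: ds.UID},
		Spec:       autoscalingv1.ScaleSpec{Replicas: ds.Status.DesiredNumberScheduled},
		Status: autoscalingv1.ScaleStatus{
			Replicas: ds.Status.CurrentNumberScheduled,
			Selector: selector.String(),
		},
	}, nil
}

// updateScale rejects writes: a DaemonSet's size is derived from node
// scheduling, so there is nothing meaningful to set.
func updateScale(ds *appsv1.DaemonSet, desired int32) error {
	return apierrors.NewMethodNotSupported(
		schema.GroupResource{Group: "apps", Resource: "daemonsets/scale"}, "update")
}

func main() {
	ds := &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "node-exporter", Namespace: "monitoring"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "node-exporter"}},
		},
		Status: appsv1.DaemonSetStatus{DesiredNumberScheduled: 5, CurrentNumberScheduled: 5},
	}
	scale, _ := scaleFromDaemonSet(ds)
	fmt.Println("read scale:", scale.Spec.Replicas, "selector:", scale.Status.Selector)
	fmt.Println("write scale:", updateScale(ds, 3))
}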

@mortent
Member

mortent commented Feb 4, 2021

/ok-to-test
How will this interact with the eviction API, in particular the kubectl drain command that has a --ignore-daemonsets flag?

This should probably have a KEP. kubernetes/enhancements#963 is the KEP for adding support for the scale subresource in PDBs

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 4, 2021
@shvgn
Author

shvgn commented Feb 8, 2021

@mortent @smarterclayton Thank you for your feedback. There is no KEP for this as of now; I'll add one. And I'll look into implementing the scale subresource for DaemonSets.

How will this interact with the eviction api, in particular the kubectl drain command that has a --ignore-daemonsets flag?

The PDB's status.disruptionsAllowed is calculated correctly, which leads to the desired eviction behavior.

The --ignore-daemonsets flag on draining leaves DaemonSet pods untouched, and the PDB's status.disruptionsAllowed does not change. From the perspective of the DaemonSet pods and their PDB, nothing changes.

@michaelgugino
Contributor

I have a drain patch for draining daemonsets that went rotten here: #88345

I can revive it if necessary, if we decide to support this DaemonSet option in PDBs. Right now, drain is really the only client (I might be making this up) that supports the eviction API, so we'd want to get my change or a similar one in.

@liggitt liggitt added this to Triage in PodDisruptionBudget Apr 1, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 6, 2021
@shvgn shvgn marked this pull request as draft April 11, 2021 09:28
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 11, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 10, 2021
@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Jul 19, 2021
@atiratree
Member

@mhmxs regarding your testing in #98307 (comment), I think you might be running into an #116552 issue, which I am trying to fix in #116554

@atiratree
Member

@shvgn are you planning to continue with this PR?

@nabokihms
Member

@atiratree hello there! I will continue communication here since @shvgn is unavailable (busy with other important tasks).

We at Flant have used this patch to prevent unexpected evictions, and we are still willing to contribute. I will try to check the code and answer all your questions.

@mhmxs
Contributor

mhmxs commented Mar 14, 2023

@mhmxs regarding your testing in #98307 (comment), I think you might be running into an #116552 issue, which I am trying to fix in #116554

Thanks, I will retest after the rebase.

@ibihim
Contributor

ibihim commented Apr 24, 2023

/remove-sig auth

@k8s-ci-robot k8s-ci-robot removed the sig/auth Categorizes an issue or PR as relevant to SIG Auth. label Apr 24, 2023
@liggitt liggitt removed their assignment May 13, 2023
@atiratree
Member

@nabokihms @shvgn do you have the capacity to look into this for 1.28?

@shvgn
Author

shvgn commented Jun 26, 2023

@atiratree Unfortunately, I won't be able to work on this PR in the upcoming weeks.

@neolit123
Member

/remove-area kubeadm
/remove-sig cluster-lifecycle

@k8s-ci-robot k8s-ci-robot removed area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. labels Jul 17, 2023
@rexagod
Member

rexagod commented Aug 4, 2023

/remove-sig instrumentation

@k8s-ci-robot k8s-ci-robot removed the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Aug 4, 2023
@sftim
Contributor

sftim commented Aug 16, 2023

I don't understand what a scale subresource means for a DaemonSet. What I think it'd mean is that setting scale to 2 means I want two copies on each selected node.

@shvgn, you'd really need a KEP to make this proposal land.

@nabokihms
Member

@sftim, there was a KEP, but after that we agreed that this is more a bug than a feature, so a KEP is not required. I personally agree with you that a KEP would make things way more transparent for all parties.

Can we discuss it once again?
@mortent @smarterclayton wdyt?

@liggitt
Member

liggitt commented Aug 17, 2023

@sftim, there was a KEP, but after that we agreed that this is more a bug than a feature, so a KEP is not required.

I don't see how the absence of a scale subresource for DaemonSet is a bug... this definitely seems like a feature, and one which (in my opinion) needs more evidence / understanding / design before being accepted or implemented.

@nabokihms
Member

@liggitt the initial attempt to write a KEP was kubernetes/enhancements#3089, but it was closed per a comment there. Should a new KEP for a scale subresource be introduced? Should we discuss the offered solution once more (and probably reconsider it)?

@liggitt
Member

liggitt commented Aug 17, 2023

If we want to narrowly make PDB work specifically with daemonset pods and consider it not interoperating with daemonset pods a bug, that's one thing.

If we want to fix that issue by adding a scale subresource to DaemonSet, I think that needs a KEP, since it has implications way beyond PDB.

@nabokihms
Member

As @sftim mentioned, the whole idea seems odd: how can we scale pods for a controller that is supposed to run pods on every node? However, ignoring PDBs is another odd thing, because PDB is not about scaling (despite being connected to the scale subresource in other controllers' implementations).

My intention is only to fix PDB, not to invent a whole new concept of DaemonSet scaling.

@liggitt
Member

liggitt commented Aug 17, 2023

so maybe that means teaching the disruption controller about daemonsets specifically in

// The workload resources do implement the scale subresource, so it would
// be possible to only check the scale subresource here. But since there is no
// way to take advantage of listers with scale subresources, we use the workload
// resources directly and only fall back to the scale subresource when needed.
func (dc *DisruptionController) finders() []podControllerFinder {
	return []podControllerFinder{dc.getPodReplicationController, dc.getPodDeployment, dc.getPodReplicaSet,
		dc.getPodStatefulSet, dc.getScaleController}
}

var (
	controllerKindRS  = v1beta1.SchemeGroupVersion.WithKind("ReplicaSet")
	controllerKindSS  = apps.SchemeGroupVersion.WithKind("StatefulSet")
	controllerKindRC  = v1.SchemeGroupVersion.WithKind("ReplicationController")
	controllerKindDep = v1beta1.SchemeGroupVersion.WithKind("Deployment")
)
(like it already knows about replicasets, etc, specifically)
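
A rough sketch of what such a DaemonSet-specific finder could look like, modeled on the pattern above. The controller's real finder signature and internal types are not shown in this thread, so this is written as a standalone helper under those assumptions:

package disruptionsketch

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	appslisters "k8s.io/client-go/listers/apps/v1"
)

// controllerKindDS mirrors the existing controllerKindRS/SS/RC/Dep variables.
var controllerKindDS = appsv1.SchemeGroupVersion.WithKind("DaemonSet")

// getPodDaemonSet resolves a pod's controllerRef to a DaemonSet and returns
// its UID together with the "scale" the disruption controller would use,
// here taken from status.desiredNumberScheduled.
func getPodDaemonSet(lister appslisters.DaemonSetLister, controllerRef *metav1.OwnerReference,
	namespace string) (types.UID, int32, error) {
	// Only handle owners whose kind is DaemonSet (the real code would also
	// check the API group of controllerRef.APIVersion).
	if controllerRef.Kind != controllerKindDS.Kind {
		return "", 0, nil
	}
	ds, err := lister.DaemonSets(namespace).Get(controllerRef.Name)
	if err != nil {
		return "", 0, err
	}
	// The controllerRef's UID must match, otherwise the pod belongs to an
	// older DaemonSet with the same name that has since been deleted.
	if ds.UID != controllerRef.UID {
		return "", 0, fmt.Errorf("daemonset %s/%s UID does not match controllerRef", namespace, controllerRef.Name)
	}
	return ds.UID, ds.Status.DesiredNumberScheduled, nil
}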

@atiratree
Member

As @sftim mentioned, the whole idea seems odd: how can we scale pods for a controller that is supposed to run pods on every node? However, ignoring PDBs is another odd thing, because PDB is not about scaling (despite being connected to the scale subresource in other controllers' implementations).

I just want to emphasise that this is about adding a read-only scale subresource, so we would not have to solve scaling of the DaemonSet in any sense.

But since adding only part of the scale API could have implications for other users, adding the DS finder seems like a simpler/easier thing to do in the short term.

@mhmxs
Contributor

mhmxs commented Aug 22, 2023

the whole idea seems odd

@nabokihms our use case is a bit different. We need this feature to prevent nodes from being deleted. If the PDB's maxUnavailable is 1 and any of the DaemonSet pods is not ready (they are essentially watchdogs), we can block an upgrade or many other node-related operations, because the Kubernetes control plane is not able to delete any of the DaemonSet pods.

Currently, this is not available, and we need an extra controller to update minAvailable to #nodes-1 every time the node count changes.
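
For reference, the workaround described above amounts to something like the following sketch, which recomputes minAvailable as nodeCount-1 and updates the PDB whenever it drifts; the namespace and PDB name are hypothetical:

package workaroundsketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/client-go/kubernetes"
)

// syncPDBMinAvailable keeps the watchdog PDB's minAvailable at
// (number of nodes - 1), so at most one DaemonSet pod can be disrupted
// at a time. It would be triggered by a node informer in practice.
func syncPDBMinAvailable(ctx context.Context, client kubernetes.Interface) error {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	minAvailable := intstr.FromInt(len(nodes.Items) - 1)

	pdb, err := client.PolicyV1().PodDisruptionBudgets("monitoring").Get(ctx, "watchdog-pdb", metav1.GetOptions{})
	if err != nil {
		return err
	}
	if pdb.Spec.MinAvailable != nil && *pdb.Spec.MinAvailable == minAvailable {
		return nil // already up to date
	}
	pdb.Spec.MinAvailable = &minAvailable
	pdb.Spec.MaxUnavailable = nil // minAvailable and maxUnavailable are mutually exclusive
	_, err = client.PolicyV1().PodDisruptionBudgets("monitoring").Update(ctx, pdb, metav1.UpdateOptions{})
	if err == nil {
		fmt.Printf("pdb minAvailable set to %d\n", len(nodes.Items)-1)
	}
	return err
}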

@dims dims removed the area/dependency Issues or PRs related to dependency changes label Jan 4, 2024
@knelasevero
Contributor

knelasevero commented Jan 12, 2024

Hey, @shvgn do you plan to get back to this PR? If not, I can take over and go with the "teach the disruption controller about DaemonSets specifically" approach. (I'm working on it.)

@dims dims added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 5, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Reopen this PR with /reopen
  • Mark this PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

SIG Node PR Triage automation moved this from WIP to Done Apr 4, 2024
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closed this PR.

In response to this:


/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
