[JENKINS-67679] add maintenance mode to pause provisioning nodes #1134

fengxx · 2022-02-15T07:58:49Z

fix JENKINS-67679 add maintenance mode to pause provisioning nodes

Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
Ensure that the pull request title represents the desired changelog entry
Please describe what you did
Link to relevant issues in GitHub or Jira
Link to relevant pull requests, esp. upstream and downstream changes
Ensure you have provided tests - that demonstrates feature works or fixes the issue

Vlatombe

Some minor style items to fix but the feature and logic looks fine.

src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloud.java

src/test/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesCloudTest.java

jglick · 2022-02-22T21:24:17Z

Is this really necessary? Can you not get a similar effect by putting some sort of taint on the cluster’s nodes, or making an admission webhook reject new pods? Or would builds fail with an error after timing out trying to schedule a pod?

If this really needs to be done on the Jenkins side, maybe it would be better for core to have the ability to take a Cloud offline.

Vlatombe · 2022-02-23T08:44:32Z

If this really needs to be done on the Jenkins side, maybe it would be better for core to have the ability to take a Cloud offline.

This looks like a good idea actually.

dduportal · 2022-02-24T07:28:30Z

Is this really necessary? Can you not get a similar effect by putting some sort of taint on the cluster’s nodes, or making an admission webhook reject new pods? Or would builds fail with an error after timing out trying to schedule a pod?

The permissions that a Jenkins admin are given are not always the same as the set of permissions of the Kubernetes cluster (even not permitted to access the cluster API). This PR is really legit in term of usage.

If this really needs to be done on the Jenkins side, maybe it would be better for core to have the ability to take a Cloud offline.

That is really an excellent idea: this change would benefit way more users. Where should it be looked (a particular plugin? or the Jenkins core)?

fengxx · 2022-02-24T13:27:04Z

Is this really necessary? Can you not get a similar effect by putting some sort of taint on the cluster’s nodes, or making an admission webhook reject new pods? Or would builds fail with an error after timing out trying to schedule a pod?

If this really needs to be done on the Jenkins side, maybe it would be better for core to have the ability to take a Cloud offline.

maintenance usually includes cordon and drain, cordon will put a taint on node, drain will evict pods to other nodes (drain is necessary for kernel upgrade, docker/containerd uprade etc)

kubectl drain node-name--delete-local-data --ignore-daemonsets

Jenkins job pod will be deleted from the node and job will fail. Even if we enable retry on job level, plugin may launch a pod on other nodes which will also be drained. You may ask why don't put all node on cordon, that is overkill since kubernetes service are not affected by cordon and drain, kubernetes has logic to maintain PodDisruptionBudget in drain operation. Jenkins job is the only victim, we can not freeze whole cluster for jenkins if the cluster is shared.

I tried to workaround the issue by using a system groovy job to put k8s plugin on hold

import jenkins.model.Jenkins
import org.csanchez.jenkins.plugins.kubernetes.KubernetesCloud;

def instance=Jenkins.get()
// we only care first item since we know only one defined
def clouds=instance.clouds.getAll(KubernetesCloud.class).findAll{it -> filter-clause }
clouds.each{ cloud ->
  cloud.containerCap=-1
}

But if jenkins itself is running in k8s cluster and gets restarted, the containerCap will lost due to check (it is a valid check)

    public void setContainerCap(Integer containerCap) {
        this.containerCap = (containerCap != null && containerCap > 0) ? containerCap : null;
    }
    public String getContainerCapStr() {
        // null, serialized Integer.MAX_VALUE, or 0 means no limit
        return (containerCap == null || containerCap == Integer.MAX_VALUE || containerCap == 0) ? "" : String.valueOf(containerCap);
    }

so I think it is better to add a self-explaining flag

jglick · 2022-02-24T16:19:23Z

Jenkins job pod will be deleted from the node and job will fail.

FYI: #1083

jonesbusy · 2022-10-13T06:58:20Z

Hi,

I'm looking a similar feature to put a cloud in maintenance mode.

During my search I found the agent-maintenance-plugin but it cover partially my use case of permanent jenkins agents.

Regards,

Vlatombe · 2023-01-19T08:59:42Z

Taking another look at this request: I think the agent-maintenance-plugin could be extended to support clouds. There is an existing CloudProvisioningListener extension point allowing to prevent cloud provisioning on a specific cloud and label (hudson.slaves.CloudProvisioningListener#canProvision(hudson.slaves.Cloud, hudson.slaves.Cloud.CloudState, int)). This would cover the use case for any kind of cloud, as well as providing a central place to configure such maintenance windows.

Vlatombe changed the title ~~fix JENKINS-67679 add maintenance mode to pause provisioning nodes~~ [JENKINS-67679] add maintenance mode to pause provisioning nodes Feb 21, 2022

Vlatombe added the enhancement Improvements label Feb 21, 2022

Vlatombe reviewed Feb 21, 2022

View reviewed changes

fix JENKINS-67679 add maintenanceMode to pause provisioning nodes

5543680

fengxx force-pushed the feature/JENKINS-67679 branch from 57b41c3 to 5543680 Compare February 22, 2022 08:23

Vlatombe requested a review from jglick February 22, 2022 09:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[JENKINS-67679] add maintenance mode to pause provisioning nodes #1134

[JENKINS-67679] add maintenance mode to pause provisioning nodes #1134

fengxx commented Feb 15, 2022

Vlatombe left a comment

jglick commented Feb 22, 2022

Vlatombe commented Feb 23, 2022

dduportal commented Feb 24, 2022

fengxx commented Feb 24, 2022 •

edited

jglick commented Feb 24, 2022

jonesbusy commented Oct 13, 2022

Vlatombe commented Jan 19, 2023

[JENKINS-67679] add maintenance mode to pause provisioning nodes #1134

Are you sure you want to change the base?

[JENKINS-67679] add maintenance mode to pause provisioning nodes #1134

Conversation

fengxx commented Feb 15, 2022

Vlatombe left a comment

Choose a reason for hiding this comment

jglick commented Feb 22, 2022

Vlatombe commented Feb 23, 2022

dduportal commented Feb 24, 2022

fengxx commented Feb 24, 2022 • edited

jglick commented Feb 24, 2022

jonesbusy commented Oct 13, 2022

Vlatombe commented Jan 19, 2023

fengxx commented Feb 24, 2022 •

edited