Scheduler will run into race conditions on large scale clusters #106361

ahg-g · 2021-11-11T18:09:08Z

What happened?

The scheduler has a 30s timeout for the bind operation to succeed; if we don't get a response within 30s, the in-memory assignment of pod to node in the scheduler cache expires.

A race condition will happen in the following case:

1. pod1 is assigned to a node, the scheduler cache is updated with the assignment, and the bind operation is issued to the apiserver.
2. If the apiserver is under heavy load and bind takes more than 30s, the scheduler expires the cached pod-to-node assignment.
3. Bind eventually succeeds, but because the apiserver is under heavy load, the pod update carrying the node name takes a long time to propagate to the scheduler.
4. Because the cache entry expired before the pod update arrived, the scheduler is not aware that the assignment actually happened. It can therefore assign a second pod to the same node, even though that pod would not fit if the scheduler knew that the first pod had been bound there.

On the scheduler side, we need to increase the 30s timeout for large clusters, and ideally make it adapt to cluster state.

/sig scheduling

What did you expect to happen?

No race conditions.

How can we reproduce it (as minimally and precisely as possible)?

Create a large scale cluster.

k8s-ci-robot · 2021-11-11T18:09:15Z

@ahg-g: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

alculquicondor · 2021-11-11T18:48:19Z

IIRC, the 30s timeout only starts running after we receive a 200 response for the binding. But yes, if it takes longer than 30s to receive the update, we would invalidate the cache.

I don't know if there is a particular reason to invalidate the cache at all. I suppose it was meant to guard against missing Pod deletion events. We need to improve that regardless.

I would be in favor of dropping the timeout altogether.

ahg-g · 2021-11-11T19:20:43Z

ok, so the deadline is effectively for receiving the update.

Perhaps making it longer, like 15min, is a good first step before completely removing it, just so we can easily revert in case something else goes wrong.

ahg-g · 2021-11-14T22:24:13Z

I sent #106412 to increase the timeout to 15min. We should backport this.

wojtek-t · 2021-11-30T09:50:25Z

> I don't know if there is a particular reason to invalidate the cache at all. I suppose it was meant to guard against missing Pod deletion events. We need to improve that regardless.

We should double-check the history, but I think that in the past we were:

- sending the bind request asynchronously
- updating the cache before sending the request

and the timeout was then protecting us from the bind request not succeeding.

So I agree with Aldo that we should double-check whether this timeout is still needed at all.

k8s-triage-robot · 2022-02-28T10:18:56Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2022-03-30T10:46:47Z

/lifecycle rotten

kerthcet · 2022-03-31T02:59:31Z

/remove-lifecycle rotten

k8s-triage-robot · 2022-06-29T03:00:03Z

/lifecycle stale

alculquicondor · 2022-06-29T16:07:36Z

We are probably good to remove this timeout altogether at this point.

/remove-lifecycle stale

alculquicondor · 2022-06-29T16:07:52Z

/good-first-issue

fabianberisha · 2022-06-29T16:20:10Z

I see this issue was labeled as "good first issue" and I would like to start working on it! @alculquicondor

alculquicondor · 2022-08-26T14:34:36Z

one more release before we can remove this code.

anson627 · 2022-10-31T19:36:17Z

Do you have a plan to cherry-pick this to previous versions, e.g. 1.23 and 1.24?

alculquicondor · 2022-10-31T19:50:01Z

did you mean 1.13 and 1.14?

anson627 · 2022-10-31T20:12:29Z

oh, sorry my bad, I meant for 1.23 and 1.24

alculquicondor · 2022-10-31T20:20:13Z

It is fixed in 1.23 (#106412), so it is fixed in 1.24 too.

anson627 · 2022-10-31T21:07:46Z

My understanding is that the previous fix, which increased the duration, only reduces the chance of the issue and does not completely avoid it. That's why the other two fixes, #110925 and #110954, are needed, right?

emiljanogj · 2022-12-08T01:11:25Z

Hi, is this issue still available?

alculquicondor · 2022-12-08T14:30:08Z

@anson627 yes, but since the likelihood that an issue happens is very low, I don't think it's worth the risk of cherry-picking at this point.

alculquicondor · 2022-12-08T14:32:12Z

According to #110925 (comment), we can remove the TTL code entirely in 1.27

I'll give the chance to @kapiljain1989 to confirm if they can still do this.

In the meantime:
/unassign @fabianberisha
/unassign @zabrox
/remove-help

k8s-triage-robot · 2023-03-08T14:48:22Z

/lifecycle stale

alculquicondor · 2023-03-08T15:33:47Z

/remove-lifecycle stale
/unassign @kapiljain1989
as we haven't heard from them.

k8s-triage-robot · 2023-06-06T16:14:52Z

/lifecycle stale

k8s-triage-robot · 2023-07-06T16:33:45Z

/lifecycle rotten

wackxu · 2023-07-20T12:50:16Z

I want to help fix this
/assign

alculquicondor · 2023-07-20T14:59:14Z

Thanks. What is left is to remove the timeout logic altogether.

k8s-triage-robot · 2024-01-20T04:11:46Z

/close not-planned

k8s-ci-robot · 2024-01-20T04:11:52Z

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

ahg-g added the kind/bug label Nov 11, 2021

k8s-ci-robot added the sig/scheduling label Nov 11, 2021

k8s-ci-robot added the needs-triage label Nov 11, 2021

ahg-g mentioned this issue Nov 14, 2021: Increase the duration to expire an assumed pod #106412 (Merged)

wojtek-t mentioned this issue Nov 30, 2021: Add watchcache metrics to tracking its progress #106737 (Merged)

k8s-ci-robot added the lifecycle/stale label Feb 28, 2022

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 30, 2022

k8s-ci-robot removed the lifecycle/rotten label Mar 31, 2022

k8s-ci-robot added the lifecycle/stale label Jun 29, 2022

k8s-ci-robot unassigned fabianberisha and zabrox Dec 8, 2022

k8s-ci-robot removed the help wanted and good first issue labels Dec 8, 2022

k8s-ci-robot added the lifecycle/stale label Mar 8, 2023

k8s-ci-robot unassigned kapiljain1989 Mar 8, 2023

k8s-ci-robot removed the lifecycle/stale label Mar 8, 2023

k8s-ci-robot added the lifecycle/stale label Jun 6, 2023

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jul 6, 2023

k8s-ci-robot assigned wackxu Jul 20, 2023

wackxu mentioned this issue Jul 21, 2023: remove the scheduler cache timeout logic #119498 (Closed)

k8s-ci-robot closed this as not planned Jan 20, 2024
