
Topology aware scheduler plugin in kube-scheduler #2044

Closed
swatisehgal opened this issue Oct 1, 2020 · 28 comments
Labels
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
tracked/no: Denotes an enhancement issue is NOT actively being tracked by the Release Team

Comments

@swatisehgal
Contributor

swatisehgal commented Oct 1, 2020

Enhancement Description

  • One-line enhancement description (can be used as a release note):
    Scheduler plugin that runs a simplified version of topology manager logic in kube-scheduler to enable topology aware pod placement (an illustrative sketch of the core check follows this list).
  • Kubernetes Enhancement Proposal: Simplified version of topology manager in kube-scheduler #1858
  • Discussion Link:
    SIG Scheduling Meeting 20200702: Recording, Slides
    SIG Node Meeting 20200811: Recording, Slides
  • Primary contacts (assignee):
    Alexey Perevalov (@AlexeyPerevalov)
  • Responsible SIGs: sig-node and sig-scheduling
  • Enhancement target (which target equals to which milestone):
    • Alpha release target (1.21)
    • Beta release target (x.y)
    • Stable release target (x.y)
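
To make the enhancement concrete, below is a minimal, illustrative Go sketch of the kind of check a single-numa-node style filter would perform. The types and names are hypothetical and are not taken from the kube-scheduler or scheduler-plugins code; this is a sketch of the idea, not the implementation.

```go
// Illustrative sketch only: hypothetical types, not the actual plugin code.
// The simplified topology-manager idea is that a node passes the Filter phase
// only if at least one NUMA zone can satisfy the pod's entire resource request
// (the single-numa-node case).
package topologysketch

// NUMAZone is a hypothetical, flattened view of per-zone allocatable resources,
// as a node agent might report them through a CR.
type NUMAZone struct {
	Name        string
	Allocatable map[string]int64 // resource name -> allocatable amount
}

// fitsSingleNUMAZone reports whether any single zone can hold the whole request.
func fitsSingleNUMAZone(zones []NUMAZone, request map[string]int64) bool {
	for _, zone := range zones {
		fits := true
		for res, amount := range request {
			if zone.Allocatable[res] < amount {
				fits = false
				break
			}
		}
		if fits {
			return true
		}
	}
	return false
}
```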
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Oct 1, 2020
@swatisehgal
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 1, 2020
@swatisehgal
Contributor Author

/sig scheduling

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Oct 1, 2020
@kikisdeliveryservice
Member

@swatisehgal each KEP needs a separate issue.

@swatisehgal swatisehgal changed the title Topology Aware Scheduling in kubernetes Topology aware scheduler plugin in kube-scheduler Oct 2, 2020
@swatisehgal
Contributor Author

@swatisehgal each KEP needs a separate issue.

@kikisdeliveryservice Updated this issue and created another issue #2051

@kikisdeliveryservice
Member

Great! So, from an enhancements requirements perspective, the underlying PR is good and just needs one change to the directory structure.

@kikisdeliveryservice kikisdeliveryservice added the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label Oct 2, 2020
@kikisdeliveryservice kikisdeliveryservice added this to the v1.20 milestone Oct 2, 2020
@kikisdeliveryservice
Member

Marking this as 1.20 for now, since that's what the underlying enhancement has. If that's in error, just LMK!

@swatisehgal
Contributor Author

swatisehgal commented Oct 5, 2020

@kikisdeliveryservice As per conversation with Derek last week, we are not able to target this for 1.20. So 1.21 is correct.

@kikisdeliveryservice kikisdeliveryservice added tracked/no Denotes an enhancement issue is NOT actively being tracked by the Release Team and removed tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team labels Oct 5, 2020
@kikisdeliveryservice kikisdeliveryservice removed this from the v1.20 milestone Oct 5, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2021
@swatisehgal
Contributor Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 5, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 5, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 5, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@swatisehgal
Contributor Author

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@swatisehgal: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Jun 10, 2021
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 10, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 8, 2021
@catblade

@swatisehgal What can we do to help on this? Was the target 1.21?

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 24, 2021
@catblade

@swatisehgal ping

@swatisehgal
Contributor Author

swatisehgal commented Oct 26, 2021

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 26, 2021
@swatisehgal
Contributor Author

swatisehgal commented Oct 26, 2021

@catblade We have been focusing on enabling the out-of-tree solution for Topology Aware Scheduling. The two main components are:

  1. Topology-aware scheduler plugin
    The Node Resource Topology plugin is now part of the https://github.com/kubernetes-sigs/scheduler-plugins repo.

  2. nfd-topology-updater in Node Feature Discovery
    The PR introducing the NFD Topology Updater, which exposes hardware resource topology info through CRs (kubernetes-sigs/node-feature-discovery#525), was merged recently, but there is a lot of work still to be done. Please refer to the comment here for information on work that is either in progress or still needed in NFD.

Some of the items we can use help with are:

  1. Watcher implementation (K8s side and/or NFD topology-updater side). Some context on this is below, and a hedged polling sketch follows this list:
  • We have enhanced the Pod Resources API with “List” and “GetAllocatableResources” endpoints in order to account for allocated resources, but the API needs further improvement because both endpoints require the monitoring application to poll the kubelet. If the monitoring application's loop is too slow, the scheduler is likely to act on stale information; on the other hand, if it polls very frequently, it adds extra load to the kubelet (and to the system in general). To overcome this limitation, we need to add a Watch endpoint that reports a stream of events to the monitoring application, both when resource allocation changes (pods are created or deleted) and when resource availability changes (device plugins are added or removed). Here is our initial POC on this. We are looking for this to be part of the Kubernetes 1.24 release. Please refer to the link here on Kubernetes release timelines.

  • Alternatively, we could obtain notification events from the CRI runtime. Please refer to the discussion about this here.

  • The monitoring application, which in our case is the NFD Topology Updater, would also need modifications to be able to update the CRs on every pod creation/deletion event, as opposed to the current timer-based approach.

  2. Topology-aware scheduling testing at scale. This work ties into the value proposition of Topology Aware Scheduling, and we are looking to gain insight into how this solution performs in a large-scale cluster.
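
To illustrate the polling limitation described in the first item above, here is a minimal Go sketch of a monitoring agent that polls the kubelet Pod Resources API (the existing List and GetAllocatableResources endpoints). The socket path and polling interval are assumptions, GetAllocatableResources may require the corresponding kubelet feature gate on older releases, and the proposed Watch endpoint would replace this loop with a server-side event stream; this is a sketch, not production code.

```go
// Minimal sketch, assuming the default kubelet pod-resources socket path and a
// fixed 10s polling interval; a Watch endpoint would replace this polling loop.
package main

import (
	"context"
	"fmt"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	podresourcesapi "k8s.io/kubelet/pkg/apis/podresources/v1"
)

const socketPath = "/var/lib/kubelet/pod-resources/kubelet.sock"

func main() {
	// Dial the kubelet's pod-resources gRPC endpoint over its unix socket.
	conn, err := grpc.Dial(socketPath,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", addr)
		}))
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	client := podresourcesapi.NewPodResourcesListerClient(conn)

	for {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)

		// Per-node allocatable resources (may require the
		// KubeletPodResourcesGetAllocatable feature gate on older kubelets).
		alloc, err := client.GetAllocatableResources(ctx, &podresourcesapi.AllocatableResourcesRequest{})
		if err == nil {
			fmt.Printf("allocatable device sets: %d\n", len(alloc.GetDevices()))
		}

		// Per-pod allocations; anything created or deleted between polls is only
		// seen on the next iteration, which is the staleness problem noted above.
		list, err := client.List(ctx, &podresourcesapi.ListPodResourcesRequest{})
		if err == nil {
			fmt.Printf("pods with allocated resources: %d\n", len(list.GetPodResources()))
		}
		cancel()

		time.Sleep(10 * time.Second) // the stale-data window grows with this interval
	}
}
```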

@catblade

@swatisehgal I must be missing some links regarding 1) and the first bullet of the items we can help with.

@catblade

catblade commented Nov 3, 2021

@swatisehgal re-ping for links above.

@swatisehgal
Contributor Author

@catblade Updated the comment above with the relevant links. Please refer to some additional links below:

Issue: #2043
Initial Enhancement proposal: #1884

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 6, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 8, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
