
Queueing pods belonging to the same podGroup as a unit #661

Open
kerthcet wants to merge 2 commits into master

Conversation

kerthcet
Contributor

@kerthcet kerthcet commented Nov 9, 2023

What type of PR is this?

/kind document

What this PR does / why we need it:

Which issue(s) this PR fixes:

xref: #658

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

Signed-off-by: kerthcet <kerthcet@gmail.com>
@k8s-ci-robot
Contributor

@kerthcet: The label(s) kind/document cannot be applied, because the repository doesn't have them.

In response to this:

What type of PR is this?

/kind document

What this PR does / why we need it:

Which issue(s) this PR fixes:

xref: #658

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 9, 2023

netlify bot commented Nov 9, 2023

Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.

🔨 Latest commit: 37ae422
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-scheduler-plugins/deploys/654c95ce57c1db000750864b

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kerthcet
Once this PR has been reviewed and has the lgtm label, please assign denkensk for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 9, 2023
@kerthcet
Contributor Author

kerthcet commented Nov 9, 2023

/kind documentation

@k8s-ci-robot k8s-ci-robot added the kind/documentation Categorizes issue or PR as related to documentation. label Nov 9, 2023
@kerthcet
Contributor Author

kerthcet commented Nov 9, 2023

An initial thought: the first commit is just a formatting change; see the second one, 37ae422, for the main change.

Signed-off-by: kerthcet <kerthcet@gmail.com>
@kerthcet
Contributor Author

@Huang-Wei can you help take a look?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 9, 2024
@tenzen-y
Member

tenzen-y commented Apr 5, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 5, 2024

// ScheduleTimeoutSeconds defines the maximal time of members/tasks to wait before run the pod group;
ScheduleTimeoutSeconds *int32 `json:"scheduleTimeoutSeconds,omitempty"`
// MinMember defines the minimal number of members/tasks to run the pod group;
Contributor

can you change the indent back to either tabs or 4 spaces?

Comment on lines +144 to +146
For Pods belong to a same podGroup should be popped out of the scheduling queue one by one,
which requires an efficient and wise algorithm. This is quite different with plain Pods,
so we need to reimplement the QueueSort plugin.
Contributor

For Pods within the same PodGroup, they should be dequeued one by one from the scheduling queue, necessitating an efficient and strategic algorithm. This requirement differs significantly from plain Pods, thus prompting the need for a reimplementation of the QueueSort plugin.
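
For illustration only, here is a minimal standalone sketch of such a comparison, using hypothetical stand-in types rather than the scheduler framework's real QueueSort interface: pods are ordered primarily by their podGroup's queued timestamp (plain pods fall back to their own creation time), so members of the same podGroup come out of the queue adjacent to one another.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// queuedPod is a hypothetical stand-in for the scheduler's queued pod info;
// the real plugin would work with the framework's QueuedPodInfo instead.
type queuedPod struct {
	Name          string
	PodGroup      string    // "<namespace>/<podGroup name>", empty for plain pods
	GroupQueuedAt time.Time // queued timestamp of the pod's podGroup, if any
	CreatedAt     time.Time
}

// less orders pods so that members of the same podGroup stay adjacent:
// compare by the group's queued timestamp (plain pods fall back to their own
// creation time), then keep pods of the same group in creation-time order.
// Note that it only reads cached state and never writes the timestamp.
func less(a, b queuedPod) bool {
	ta, tb := a.GroupQueuedAt, b.GroupQueuedAt
	if a.PodGroup == "" {
		ta = a.CreatedAt
	}
	if b.PodGroup == "" {
		tb = b.CreatedAt
	}
	if !ta.Equal(tb) {
		return ta.Before(tb)
	}
	if a.PodGroup != b.PodGroup {
		return a.PodGroup < b.PodGroup
	}
	return a.CreatedAt.Before(b.CreatedAt)
}

func main() {
	now := time.Now()
	pods := []queuedPod{
		{Name: "solo", CreatedAt: now.Add(-time.Minute)},
		{Name: "pg-a-1", PodGroup: "ns/pg-a", GroupQueuedAt: now.Add(-2 * time.Minute), CreatedAt: now},
		{Name: "pg-a-0", PodGroup: "ns/pg-a", GroupQueuedAt: now.Add(-2 * time.Minute), CreatedAt: now.Add(-10 * time.Second)},
	}
	sort.Slice(pods, func(i, j int) bool { return less(pods[i], pods[j]) })
	for _, p := range pods {
		fmt.Println(p.Name) // pg-a-0, pg-a-1, solo
	}
}
```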

Basically, we need to watch for all the podGroups and maintain a cache for them
in coscheduling's core package:

- when podGroup created, we'll create a `queuedPodGroup` in the cache
Contributor

when a podGroup gets created, create a queuedPodGroup in the cache
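
As a rough, hypothetical sketch of that cache (names like podGroupCache and onPodGroupAdd are illustrative, not the plugin's actual API), the PodGroup informer's add/delete handlers would keep a keyed map of queuedPodGroup entries in sync:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// queuedPodGroup mirrors the cache entry described in the proposal: a queued
// timestamp plus a coarse queueing status.
type queuedPodGroup struct {
	Timestamp time.Time // set when the first member pod enqueues
	Status    string    // e.g. "queueing" or "queueingFailed"
}

// podGroupCache is a hypothetical cache living in coscheduling's core package.
type podGroupCache struct {
	mu        sync.RWMutex
	podGroups map[string]*queuedPodGroup // key: "<namespace>/<podGroup name>"
}

func newPodGroupCache() *podGroupCache {
	return &podGroupCache{podGroups: map[string]*queuedPodGroup{}}
}

// onPodGroupAdd would be wired to the PodGroup informer's AddFunc.
func (c *podGroupCache) onPodGroupAdd(namespace, name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	key := namespace + "/" + name
	if _, ok := c.podGroups[key]; !ok {
		c.podGroups[key] = &queuedPodGroup{Status: "queueing"}
	}
}

// onPodGroupDelete would be wired to the PodGroup informer's DeleteFunc.
func (c *podGroupCache) onPodGroupDelete(namespace, name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.podGroups, namespace+"/"+name)
}

func main() {
	c := newPodGroupCache()
	c.onPodGroupAdd("default", "pg-a")
	fmt.Println(len(c.podGroups)) // 1
	c.onPodGroupDelete("default", "pg-a")
	fmt.Println(len(c.podGroups)) // 0
}
```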

Comment on lines +159 to +162
// ...
// <new>
// key is the podGroup name, value is the queued podGroup info.
podGroups map[string]*queuedPodGroup
Contributor

Suggested change
// ...
// <new>
// key is the podGroup name, value is the queued podGroup info.
podGroups map[string]*queuedPodGroup
// ...
// <new>
// key is the podGroup name, value is the queued podGroup info.
podGroups map[string]*queuedPodGroup

(applies elsewhere in the code snippets)

type PodGroupManager struct {
// ...
// <new>
// key is the podGroup name, value is the queued podGroup info.
Contributor

key should be <namespace name>/<podGroup name>
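
For example, a small hypothetical helper (client-go's cache.MetaNamespaceKeyFunc builds keys the same way) makes the collision problem obvious: two podGroups with the same name in different namespaces must map to different cache entries.

```go
package main

import "fmt"

// podGroupKey builds the "<namespace>/<podGroup name>" key suggested above,
// so podGroups with the same name in different namespaces never collide.
func podGroupKey(namespace, name string) string {
	return fmt.Sprintf("%s/%s", namespace, name)
}

func main() {
	fmt.Println(podGroupKey("team-a", "pg-train")) // team-a/pg-train
	fmt.Println(podGroupKey("team-b", "pg-train")) // team-b/pg-train
}
```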

type queuedPodGroup struct {
// Timestamp is the podGroup's queued time.
// - timestamp will be initialized when the first pod enqueues
// - timestamp will be renewed when a pod re-enqueues for failing the scheduling
Contributor

could you reword a bit?

do you mean "... renewed once a pod that belongs to a previously failed podGroup gets re-enqueued"?
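
Read that way, the lifecycle could look roughly like the following standalone sketch (the touch helper and status strings are hypothetical): the timestamp is initialized on the first enqueue and renewed only when a pod of a previously failed group re-enqueues.

```go
package main

import (
	"fmt"
	"time"
)

// queuedPodGroup is a minimal stand-in for the cache entry discussed above.
type queuedPodGroup struct {
	Timestamp time.Time
	Status    string // "queueing" or "queueingFailed"
}

// touch applies the reworded rule: initialize the timestamp when the first
// member pod enqueues, and renew it once a pod that belongs to a previously
// failed podGroup gets re-enqueued.
func (pg *queuedPodGroup) touch(now time.Time) {
	switch {
	case pg.Timestamp.IsZero():
		pg.Timestamp = now
		pg.Status = "queueing"
	case pg.Status == "queueingFailed":
		pg.Timestamp = now
		pg.Status = "queueing"
	}
}

func main() {
	pg := &queuedPodGroup{}
	pg.touch(time.Now()) // first enqueue: timestamp initialized
	first := pg.Timestamp

	pg.Status = "queueingFailed" // the group failed a scheduling attempt
	pg.touch(time.Now().Add(time.Second))
	fmt.Println(pg.Timestamp.After(first)) // true: timestamp renewed
}
```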

the former podGroup was rebuilt, instead of waiting for the latter podGroup scheduling first, we hope the pod recover in prior,
although this may lead to the latter podGroup failed in scheduling.

Another risk is we'll initialize or renew the timestamp in the `Less()` function, which will lead to the performance degradation in queueing,
Contributor

it's a big no. We should not mutate the timestamp in a function that relies on the timestamp to do sorting. It'd cause unstable sorting results and make the sorting behavior unpredictable and hard to debug.

to the same podGroup will be placed together as a unit.
3. Schedule the pods of the podGroup one by one, if anyone fails in the scheduling cycle, we'll set the status=queueingFailed in PostFilter,
and the podGroup will enter into the backoff queue if configured.
4. When the failed Pod re-enqueues, we'll renew the queuedPodGroup timestamp based on the `queueingFailed` status and also set the status to `queueing`, the new timestamp should be `now+backoffTime` to avoid breaking the podGroup who starts scheduling
Contributor

do you mean to renew during QueueSort()? If so, we shouldn't do that. See a comment below.

What I'd suggest is to renew the timestamp at the end, when a group of pods has failed as a unit, instead of at the beginning, when the group starts its next scheduling attempt.
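
A hedged sketch of that alternative, with hypothetical names rather than the plugin's real PostFilter hook: when the group fails as a unit, its entry is marked queueingFailed and the timestamp is renewed right there (optionally pushed out by the backoff), so the QueueSort comparison itself stays read-only.

```go
package main

import (
	"fmt"
	"time"
)

// queuedPodGroup is the same minimal stand-in used in the sketches above.
type queuedPodGroup struct {
	Timestamp time.Time
	Status    string
}

// onGroupSchedulingFailed is a hypothetical hook the plugin could call from
// PostFilter once a group of pods has failed as a unit. Renewing the
// timestamp here, at the end of the failed attempt, keeps the QueueSort
// comparison read-only and its ordering stable.
func onGroupSchedulingFailed(pg *queuedPodGroup, backoff time.Duration) {
	pg.Status = "queueingFailed"
	// Push the group behind others for roughly one backoff period so it does
	// not immediately jump ahead of groups that are mid-way through scheduling.
	pg.Timestamp = time.Now().Add(backoff)
}

func main() {
	pg := &queuedPodGroup{Timestamp: time.Now(), Status: "queueing"}
	onGroupSchedulingFailed(pg, 5*time.Second)
	fmt.Println(pg.Status, pg.Timestamp.After(time.Now())) // queueingFailed true
}
```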

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 23, 2024
@k8s-ci-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels
  • cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
  • kind/documentation - Categorizes issue or PR as related to documentation.
  • needs-rebase - Indicates a PR cannot be merged because it has merge conflicts with HEAD.
  • release-note-none - Denotes a PR that doesn't merit a release note.
  • size/L - Denotes a PR that changes 100-499 lines, ignoring generated files.