Dynamically Sized Jobs - Scale Down #1852
Hi @vicentefb. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
f0f98d9 to d49df00
0bcee5e to daa8d26
/ok-to-test
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: vicentefb The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
b194b89 to 35ca648
test/integration/controller/jobs/raycluster/raycluster_controller_test.go
60cbf6b to def03e9
/cc
/retest
@@ -698,7 +709,8 @@ func equivalentToWorkload(ctx context.Context, c client.Client, job GenericJob,
 	jobPodSets := clearMinCountsIfFeatureDisabled(job.PodSets())

 	if runningPodSets := expectedRunningPodSets(ctx, c, wl); runningPodSets != nil {
-		if equality.ComparePodSetSlices(jobPodSets, runningPodSets) {
+		jobSizeable, implementsSizable := job.(ReizableJobs)
+		if equality.ComparePodSetSlices(jobPodSets, runningPodSets) || (implementsSizable && jobSizeable.IsResizable(wl)) {
This check is not enough.
Since we don't support upsizing, we should return false if upsizing happens here.
But overall, we should check that the shapes are the same (only counters changed).
It looks like you are partially checking that inside IsResizable, but we shouldn't be doing that in the integration.
It can be generic... just by looking at the podsets.
So, in this package, there could be a function isDownsized(oldPodsets, newPodSets []PodSet) bool
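As a rough illustration of the suggestion above, here is a minimal sketch of what a generic isDownsized helper could look like. The PodSet struct below is a simplified, hypothetical stand-in (not Kueue's actual type), and the Template string stands in for the pod template "shape". The sketch assumes that a valid downsize keeps the same pod sets in the same order, never increases a count, and decreases at least one.

```go
package main

import "fmt"

// PodSet is a simplified, hypothetical stand-in for the real PodSet type.
// Template represents the "shape" of the pod set (everything except Count).
type PodSet struct {
	Name     string
	Count    int32
	Template string
}

// isDownsized reports whether newPodSets is a scaled-down version of
// oldPodSets: identical shapes, no count increased, at least one decreased.
func isDownsized(oldPodSets, newPodSets []PodSet) bool {
	if len(oldPodSets) != len(newPodSets) {
		return false
	}
	decreased := false
	for i := range oldPodSets {
		o, n := oldPodSets[i], newPodSets[i]
		// The shape must be unchanged; only counters may differ.
		if o.Name != n.Name || o.Template != n.Template {
			return false
		}
		// Upsizing is not supported, so any increase disqualifies the change.
		if n.Count > o.Count {
			return false
		}
		if n.Count < o.Count {
			decreased = true
		}
	}
	return decreased
}

func main() {
	running := []PodSet{
		{Name: "head", Count: 1, Template: "head-template"},
		{Name: "workers", Count: 4, Template: "worker-template"},
	}
	resized := []PodSet{
		{Name: "head", Count: 1, Template: "head-template"},
		{Name: "workers", Count: 2, Template: "worker-template"},
	}
	fmt.Println(isDownsized(running, resized))
}
```

Because the helper only looks at pod sets, it would not need to know anything about RayCluster or any other integration.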
Left a few comments.
pkg/util/testing/wrappers.go
	}}
}

func (w *WorkloadWrapper) ScaleDown() *WorkloadWrapper {
Why not combine them into Scale(count int32)?
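A minimal sketch of what a combined Scale(count int32) test wrapper might look like follows. The types here are simplified, hypothetical stand-ins for the real wrapper and Workload types, and the sketch assumes, purely for illustration, that Scale sets the count of the first pod set; the PR may target a different pod set.

```go
package main

import "fmt"

// PodSet and Workload are simplified, hypothetical stand-ins for the real types.
type PodSet struct {
	Name  string
	Count int32
}

type Workload struct {
	PodSets []PodSet
}

// WorkloadWrapper mimics the chainable test-wrapper style.
type WorkloadWrapper struct {
	Workload
}

// Scale sets the count of the first pod set (an assumption of this sketch)
// and returns the wrapper so calls can be chained. One method covers both
// directions, so separate ScaleDown/ScaleUp helpers are unnecessary.
func (w *WorkloadWrapper) Scale(count int32) *WorkloadWrapper {
	if len(w.PodSets) > 0 {
		w.PodSets[0].Count = count
	}
	return w
}

func main() {
	w := &WorkloadWrapper{Workload{PodSets: []PodSet{{Name: "main", Count: 4}}}}
	w.Scale(2)
	fmt.Println(w.PodSets[0].Count)
}
```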
pkg/util/testing/wrappers.go
@@ -74,6 +74,29 @@ func MakeWorkload(name, ns string) *WorkloadWrapper {
 	}}
 }

+// MakeWorkload creates a wrapper for a Workload with a single
+// pod with a single container.
+func MakeWorkloadWithMorePodSets(name, ns string) *WorkloadWrapper {
The function name is a little confusing. What does "More" mean here? Could you specify the count directly instead?
@@ -620,7 +680,7 @@ func (r *WorkloadReconciler) Update(e event.UpdateEvent) bool {
 			}
 		})
 	}
-	case prevStatus == admitted && status == admitted && !equality.Semantic.DeepEqual(oldWl.Status.ReclaimablePods, wl.Status.ReclaimablePods):
+	case prevStatus == admitted && status == admitted && !equality.Semantic.DeepEqual(oldWl.Status.ReclaimablePods, wl.Status.ReclaimablePods) || r.isScaledDown(oldWl):
Should we verify oldWl or newWl here?
		return false
	}
	podSetSize := len(wl.Spec.PodSets)
	for i := 1; i < podSetSize; i++ {
Can you help me understand why index 0 is not possible here?
n m patch error field not declared in schema
commented out podSet immutability from workload webhook to be able to update that field
added more comments
clean code
nit
debuggin
n m patch error field not declared in schema
clean code
n m patch error field not declared in schema
commented out podSet immutability from workload webhook to be able to update that field
added more comments
clean code
nit
a cluster queue reconciliation fixed, it had to do with the infot totalrequests from admission inside the worklad go file
working with scheduler
cleaning code
cleaning code
cleaning
cleaning
cleaning
integation test, but it messes up with parallelism test which should be expected
updated parallelism it test
updated wrappers
kep
removed Kep
removed log lines
clean code
added a better conditional for updating the resize if the job is a RayCluster
added Kind condition
updated test and equivalentToWorkload condition
added podset assigments check
updated feature gate
updated feature gate
updating equivalentWorkload
fixed lint
removed changes from scheduler and workload controller
testing
updated workload controller reconciler to update spec and status
nit
update feature gate
update variables
made code more generic
updated workload controller helper method
typo
nit
addressed comments
updated workload controller to use unuused quota
updated integration test to work
added unit test in workload controller
changed naming to resizeable and fixed lint
nit
addressed comments
What type of PR is this?
/kind feature
What this PR does / why we need it:
This includes Phase 1 implementation for Dynamically Sized Jobs KEP #1851
Which issue(s) this PR fixes:
Part of #77
Special notes for your reviewer:
Scale Down only implementation for RayClusters
Does this PR introduce a user-facing change?