
Adding a suspend field to the dask operator #701

Open
kannon92 opened this issue Apr 24, 2023 · 5 comments

@kannon92

Kubernetes has started adding queueing capabilities. The entry point for enabling queueing is implementing the suspend field.

The batch Job API contains this field in upstream Kubernetes, but custom CRDs need to implement the suspend semantics themselves for queueing.

There is some work in Kueue on adding suspend capabilities to RayJob, and I imagine it would look similar for this project.

Relevant PR for RayJob: ray-project/kuberay#926
Kueue PR to incorporate RayJob: kubernetes-sigs/kueue#667

Documentation for suspend in jobs: https://kubernetes.io/docs/concepts/workloads/controllers/job/#suspending-a-job

I think it would make sense to add it to DaskJob, but there could be a reason to implement queueing in other areas as well?
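
For concreteness, here is a minimal sketch of how the upstream suspend field behaves, using the official kubernetes Python client: the resource is created suspended, and a queueing controller such as Kueue later flips the flag once quota is available. The Job name, image, and queue label here are illustrative; a DaskJob would need an equivalent field that its controller honours in the same way.

```python
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()

# Create the Job suspended: the Job controller starts no pods while suspend=True.
job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(
        name="example",
        labels={"kueue.x-k8s.io/queue-name": "user-queue"},  # Kueue picks this up
    ),
    spec=client.V1JobSpec(
        suspend=True,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(name="main", image="busybox", command=["true"])
                ],
            )
        ),
    ),
)
batch.create_namespaced_job(namespace="default", body=job)

# A queueing controller (e.g. Kueue) admits the workload by unsuspending it.
batch.patch_namespaced_job(
    name="example", namespace="default", body={"spec": {"suspend": False}}
)
```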

@jacobtomlinson
Member

This sounds great.

One of the goals on our roadmap is to stop manipulating Pods directly wherever possible and switch to higher-level abstractions. Today a DaskJob is comprised of a Pod and a DaskCluster and we have some lifecycle hooks that create/delete the DaskCluster based on the status of the Pod.

Perhaps we should replace the Pod with a Job and include the spec.suspend field in our lifecycle hooks?

I have a few thoughts/questions about how this would behave. Currently, we create the DaskCluster and Pod in the DaskJob creation hook and then delete the DaskCluster when the Pod status is set to Completed or Failed.

If we switch to a Job we could do the same thing, which would allow other configurations such as parallelism to be set if folks want that (although having many clients sharing a single Dask cluster is not generally recommended). We could ensure the DaskCluster doesn't get created until the Job's spec.suspend is set to false.

What would happen if the Job gets suspended halfway through? Should we delete the DaskCluster and recreate it when the Job resumes? If so we would lose any state that the DaskCluster had in memory. This is probably fine but should be well documented.
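
A rough sketch of what that lifecycle hook could look like in the operator's kopf handlers; the handler name, label selectors, and the empty DaskCluster spec are hypothetical and only meant to show the gating logic, not the operator's actual code:

```python
import kopf
import kubernetes.client as k8s

# Hypothetical sketch: watch the suspend field on the Job owned by a DaskJob and
# create/delete the companion DaskCluster to match.
@kopf.on.field("batch", "v1", "jobs", field="spec.suspend",
               labels={"dask.org/component": "job-runner"})
async def job_suspend_changed(old, new, name, namespace, meta, **kwargs):
    api = k8s.CustomObjectsApi()
    cluster_name = meta["labels"].get("dask.org/cluster-name", name)  # assumed label
    if new is False:
        # Job resumed (or created unsuspended): bring the DaskCluster up.
        api.create_namespaced_custom_object(
            group="kubernetes.dask.org", version="v1", plural="daskclusters",
            namespace=namespace,
            body={
                "apiVersion": "kubernetes.dask.org/v1",
                "kind": "DaskCluster",
                "metadata": {"name": cluster_name},
                "spec": {},  # in practice copied from the DaskJob's cluster spec
            },
        )
    elif new is True and old is False:
        # Job suspended mid-run: tear the DaskCluster down. Any in-memory state on
        # the scheduler/workers is lost, which would need documenting.
        api.delete_namespaced_custom_object(
            group="kubernetes.dask.org", version="v1", plural="daskclusters",
            namespace=namespace, name=cluster_name,
        )
```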

@kannon92
Author

One of the goals on our roadmap is to stop manipulating Pods directly wherever possible and switch to higher-level abstractions. Today a DaskJob is comprised of a Pod and a DaskCluster and we have some lifecycle hooks that create/delete the DaskCluster based on the status of the Pod.

Yay! Sounds like a good goal.

Perhaps we should replace the Pod with a Job and include the spec.suspend field in our lifecycle hooks?

That sounds like a good idea. Where in the code is the creation of the Pods?

I have a few thoughts/questions about how this would behave. Currently, we create the DaskCluster and Pod in the DaskJob creation hook and then delete the DaskCluster when the Pod status is set to Completed or Failed.

If we switch to a Job we could do the same thing, which would allow other configurations such as parallelism to be set if folks want that (although having many clients sharing a single Dask cluster is not generally recommended). We could ensure the DaskCluster doesn't get created until the Job's spec.suspend is set to false.

You could always set parallelism: 1 on the Job to disallow this. You control how the Job gets created, so you can prevent multiple pods from sharing the same DaskCluster.

What would happen if the Job gets suspended halfway through? Should we delete the DaskCluster and recreate it when the Job resumes? If so we would lose any state that the DaskCluster had in memory. This is probably fine but should be well documented.

This is how the Job controller works in Kubernetes: if a Job is suspended, its existing active pods are terminated.
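
For reference, a minimal sketch of that behaviour from the client side (official kubernetes Python client; the Job name and namespace are placeholders): suspending a running Job terminates its active pods and surfaces a "Suspended" condition in the Job status, and resuming recreates the pods from scratch.

```python
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()

# Suspend a running Job: the Job controller deletes its active pods.
batch.patch_namespaced_job("example", "default", {"spec": {"suspend": True}})

# The Job reports this via a condition of type "Suspended" in its status.
status = batch.read_namespaced_job_status("example", "default").status
is_suspended = any(c.type == "Suspended" and c.status == "True"
                   for c in (status.conditions or []))

# Resuming recreates pods from scratch; anything the previous pods (or, for a
# DaskJob, the associated DaskCluster) held in memory is gone.
batch.patch_namespaced_job("example", "default", {"spec": {"suspend": False}})
```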

@jacobtomlinson
Member

Where in the code is the creation of the Pods?

async def daskjob_create_components(

You could always set parallelism: 1 on the Job to disallow this.

I don't necessarily want to constrain people. There may be valid use cases for parallelism, but all of the parallel Pods would share the same Dask cluster.

This is how the Job controller works in Kubernetes: if a Job is suspended, its existing active pods are terminated.

Ok perfect

@kannon92
Author

One of the goals on our roadmap is to stop manipulating Pods directly wherever possible and switch to higher-level abstractions. Today a DaskJob is comprised of a Pod and a DaskCluster and we have some lifecycle hooks that create/delete the DaskCluster based on the status of the Pod.

One project that may interest you is https://github.com/kubernetes-sigs/jobset. I'm not sure of the architecture for DaskCluster or what kind of objects you are using there. I think a Job could replace DaskJob, but I'm not sure how you would handle networking between the DaskCluster and the DaskJob.

@jacobtomlinson
Member

Thanks for sharing that. DaskCluster creates a Service and the Job (currently Pod) communicates via that.
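
For anyone following along, a minimal sketch of what that looks like from inside the Job pod; the scheduler Service name, namespace, and the default scheduler port 8786 are assumptions for illustration, not something the operator guarantees:

```python
from dask.distributed import Client

# Connect to the DaskCluster's scheduler through its Service DNS name.
client = Client("tcp://my-cluster-scheduler.default.svc.cluster.local:8786")
print(client.submit(sum, [1, 2, 3]).result())
```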
