
Adding a suspend field to the dask operator #701

Open
kannon92 opened this issue Apr 24, 2023 · 5 comments

@kannon92

Kubernetes has started adding queueing capabilities. The entry point for enabling queueing is implementing the suspend field.

The batch Job API contains this field in upstream Kubernetes, but custom CRDs need to implement the suspend semantics themselves for queueing.

There is some work in Kueue on adding suspend capabilities to RayJob, and I imagine it would look similar for this project.

Relevant PR for RayJob: ray-project/kuberay#926
Kueue PR to incorporate RayJob: kubernetes-sigs/kueue#667

Documentation for suspend in jobs: https://kubernetes.io/docs/concepts/workloads/controllers/job/#suspending-a-job

I think it would make sense to add it to DaskJob, but there could be a reason to implement queueing in other areas as well?
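
For concreteness, here is a minimal sketch of how the upstream suspend field behaves, using the official kubernetes Python client: the resource is created suspended, and a queueing controller such as Kueue later flips the flag once quota is available. The Job name, image, and queue label here are illustrative; a DaskJob would need an equivalent field that its controller honours in the same way.

```python
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()

# Create the Job suspended: the Job controller starts no pods while suspend=True.
job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(
        name="example",
        labels={"kueue.x-k8s.io/queue-name": "user-queue"},  # Kueue picks this up
    ),
    spec=client.V1JobSpec(
        suspend=True,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(name="main", image="busybox", command=["true"])
                ],
            )
        ),
    ),
)
batch.create_namespaced_job(namespace="default", body=job)

# A queueing controller (e.g. Kueue) admits the workload by unsuspending it.
batch.patch_namespaced_job(
    name="example", namespace="default", body={"spec": {"suspend": False}}
)
```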

@jacobtomlinson
Member

This sounds great.

One of the goals on our roadmap is to stop manipulating Pods directly wherever possible and switch to higher-level abstractions. Today a DaskJob is comprised of a Pod and a DaskCluster and we have some lifecycle hooks that create/delete the DaskCluster based on the status of the Pod.

Perhaps we should replace the Pod with a Job and include the spec.suspend field in our lifecycle hooks?

I have a few thoughts/questions about how this would behave. Currently, we create the DaskCluster and Pod in the DaskJob creation hook and then delete the DaskCluster when the Pod status is set to Completed or Failed.

If we switch to a Job we could do the same thing, which would allow other configurations such as parallelism to be set if folks want that (although having many clients sharing a single Dask cluster is not generally recommended). We could ensure the DaskCluster doesn't get created until the Job's spec.suspend is set to false.

What would happen if the Job gets suspended halfway through? Should we delete the DaskCluster and recreate it when the Job resumes? If so we would lose any state that the DaskCluster had in memory. This is probably fine but should be well documented.
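
A rough sketch of what that lifecycle hook could look like in the operator's kopf handlers; the handler name, label selectors, and the empty DaskCluster spec are hypothetical and only meant to show the gating logic, not the operator's actual code:

```python
import kopf
import kubernetes.client as k8s

# Hypothetical sketch: watch the suspend field on the Job owned by a DaskJob and
# create/delete the companion DaskCluster to match.
@kopf.on.field("batch", "v1", "jobs", field="spec.suspend",
               labels={"dask.org/component": "job-runner"})
async def job_suspend_changed(old, new, name, namespace, meta, **kwargs):
    api = k8s.CustomObjectsApi()
    cluster_name = meta["labels"].get("dask.org/cluster-name", name)  # assumed label
    if new is False:
        # Job resumed (or created unsuspended): bring the DaskCluster up.
        api.create_namespaced_custom_object(
            group="kubernetes.dask.org", version="v1", plural="daskclusters",
            namespace=namespace,
            body={
                "apiVersion": "kubernetes.dask.org/v1",
                "kind": "DaskCluster",
                "metadata": {"name": cluster_name},
                "spec": {},  # in practice copied from the DaskJob's cluster spec
            },
        )
    elif new is True and old is False:
        # Job suspended mid-run: tear the DaskCluster down. Any in-memory state on
        # the scheduler/workers is lost, which would need documenting.
        api.delete_namespaced_custom_object(
            group="kubernetes.dask.org", version="v1", plural="daskclusters",
            namespace=namespace, name=cluster_name,
        )
```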

@kannon92
Author

One of the goals on our roadmap is to stop manipulating Pods directly wherever possible and switch to higher-level abstractions. Today a DaskJob is comprised of a Pod and a DaskCluster and we have some lifecycle hooks that create/delete the DaskCluster based on the status of the Pod.

Yay! Sounds like a good goal.

Perhaps we should replace the Pod with a Job and include the spec.suspend field in our lifecycle hooks?

That sounds like a good idea. Where in the code is the creation of the Pods?

I have a few thoughts/questions about how this would behave. Currently, we create the DaskCluster and Pod in the DaskJob creation hook and then delete the DaskCluster when the Pod status is set to Completed or Failed.

If we switch to a Job we could do the same thing, which would allow other configurations such as parallelism to be set if folks want that (although having many clients sharing a single Dask cluster is not generally recommended). We could ensure the DaskCluster doesn't get created until the Job's spec.suspend is set to false.

You could always set parallelism: 1 on the Job to disallow this. You control how the Job gets created, so you can prevent multiple pods from sharing the same DaskCluster.

What would happen if the Job gets suspended halfway through? Should we delete the DaskCluster and recreate it when the Job resumes? If so we would lose any state that the DaskCluster had in memory. This is probably fine but should be well documented.

This is how the Job controller works in Kubernetes: if a Job is suspended, its existing active pods are terminated.
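
For reference, a minimal sketch of that behaviour from the client side (official kubernetes Python client; the Job name and namespace are placeholders): suspending a running Job terminates its active pods and surfaces a "Suspended" condition in the Job status, and resuming recreates the pods from scratch.

```python
from kubernetes import client, config

config.load_kube_config()
batch = client.BatchV1Api()

# Suspend a running Job: the Job controller deletes its active pods.
batch.patch_namespaced_job("example", "default", {"spec": {"suspend": True}})

# The Job reports this via a condition of type "Suspended" in its status.
status = batch.read_namespaced_job_status("example", "default").status
is_suspended = any(c.type == "Suspended" and c.status == "True"
                   for c in (status.conditions or []))

# Resuming recreates pods from scratch; anything the previous pods (or, for a
# DaskJob, the associated DaskCluster) held in memory is gone.
batch.patch_namespaced_job("example", "default", {"spec": {"suspend": False}})
```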

@jacobtomlinson
Member

Where in the code is the creation of the Pods?

async def daskjob_create_components(

You could always set parallelism: 1 on the Job to disallow this.

I don't necessarily want to constrain people. There may be valid use cases for parallelism, but all of the parallel Pods would share the same Dask cluster.

This is how the Job controller works in Kubernetes: if a Job is suspended, its existing active pods are terminated.

Ok perfect

@kannon92
Author

One of the goals on our roadmap is to stop manipulating Pods directly wherever possible and switch to higher-level abstractions. Today a DaskJob is comprised of a Pod and a DaskCluster and we have some lifecycle hooks that create/delete the DaskCluster based on the status of the Pod.

One project that may interest you is https://github.com/kubernetes-sigs/jobset. I'm not sure of the architecture for DaskCluster or what kind of objects you are using there. I think a Job could replace DaskJob, but I'm not sure how you would handle networking between the DaskCluster and the DaskJob.

@jacobtomlinson
Member

Thanks for sharing that. DaskCluster creates a Service and the Job (currently Pod) communicates via that.
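
For anyone following along, a minimal sketch of what that looks like from inside the Job pod; the scheduler Service name, namespace, and the default scheduler port 8786 are assumptions for illustration, not something the operator guarantees:

```python
from dask.distributed import Client

# Connect to the DaskCluster's scheduler through its Service DNS name.
client = Client("tcp://my-cluster-scheduler.default.svc.cluster.local:8786")
print(client.submit(sum, [1, 2, 3]).result())
```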
