[dask] Fix running on k8s. #6343

Merged
merged 9 commits into from Nov 11, 2020

@trivialfis (Member) commented Nov 4, 2020

Close #5765

  • Use socket from Python.
  • Only use workers associated with input data instead of all workers from client. This avoids accessing client.scheduler_info()['workers'].
  • Avoid calling client.gather inside task function.

XGBoost still cannot be run on GKE. See dask/dask#6800 for details.
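
For readers unfamiliar with the first bullet, resolving the host address with Python's `socket` module can be sketched roughly as below. This is a generic illustration and not the code merged in this PR; the function name `get_host_ip` and the probe address are assumptions made for the example.

```python
import socket


def get_host_ip(probe_addr=("10.255.255.255", 1)):
    """Return the IPv4 address of the interface the OS would route through.

    Hypothetical helper for illustration only. `probe_addr` is an arbitrary
    address; it does not need to be reachable.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            # connect() on a UDP socket sends no packets; it only asks the
            # OS to pick a local interface for that destination.
            sock.connect(probe_addr)
            return sock.getsockname()[0]
        except OSError:
            # No usable route (e.g. no network): fall back to loopback.
            return "127.0.0.1"


print(get_host_ip())
```

The point of doing this in Python on the worker itself is that the address comes from the worker's own network view rather than from whatever the client or scheduler reports.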

* Avoid accessing `scheduler_info()['workers']`.
* Avoid calling `client.gather` inside task.
* Avoid using `client.scheduler_address`.
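
As a rough sketch of "only use workers associated with input data": given a mapping from partition key to the worker addresses holding it (assumed here to have the shape of what `Client.who_has` returns in `distributed`), the set of workers can be derived without touching `scheduler_info()['workers']`. The helper name `workers_holding_data` and the sample addresses are hypothetical, and this is not the implementation from the PR.

```python
from typing import Iterable, List, Mapping


def workers_holding_data(who_has: Mapping[str, Iterable[str]]) -> List[str]:
    """Collect the unique worker addresses that hold at least one partition.

    `who_has` maps a partition key to the addresses of workers storing it.
    """
    workers = set()
    for addresses in who_has.values():
        workers.update(addresses)
    # Sort so every caller sees the same worker order.
    return sorted(workers)


# Made-up example data: three partitions spread over two workers.
example = {
    "part-0": ["tcp://10.0.0.1:4000"],
    "part-1": ["tcp://10.0.0.2:4000"],
    "part-2": ["tcp://10.0.0.1:4000"],
}
print(workers_holding_data(example))
```

Workers that hold no partition never appear in the result, so they would not be asked to take part in training.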
@jameslamb (Contributor)

Hey @trivialfis , I'm really excited to see this! I have easy access to Dask clusters on EKS, so if you want me to test anything there I'd be happy to.

@trivialfis (Member Author)

@jameslamb Thanks for the offer! So far I have managed to get some examples working on GKE, but automated pytest runs are not on the table, since everything just times out while waiting for scheduling. Feel free to test it out; it might help uncover other unknown issues in xgboost and dask and be useful to others. Also, the issue is not reproducible on a local deployment, so it is cluster-specific. Your tests are definitely going to be helpful!

I'm not sure about the performance and memory impact of the current workaround. It should be trivial if there is any, but I need to try it out.

@trivialfis trivialfis merged commit 6e12c2a into dmlc:master Nov 11, 2020
@trivialfis trivialfis deleted the dask-fix-k8s branch November 11, 2020 10:04
@trivialfis (Member Author)

A refactoring is coming.

Successfully merging this pull request may close these issues.

Host IP resolution problem w/ dask on kubernetes or dask-gateway