[dask] xgb.dask.train() fails on dask-kubernetes cluster #6390
Comments
I think you have an outdated
Yep you're right, I ran
conda uninstall -y xgboost
pip uninstall -y xgboost
I'll remove the old library from my image and try again, thanks.
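One way to check for a leftover install like the one discussed above is to ask each Python process which copy of the module it actually imports. The following is only a minimal sketch: `describe_install` is a hypothetical helper name, not part of xgboost or Dask, and it assumes the module exposes `__version__` and `__file__` (it falls back to `None` when they are missing).

```python
import importlib


def describe_install(module_name):
    """Report whether a module imports in this process, plus its version and file path."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {"module": module_name, "installed": False, "version": None, "path": None}
    return {
        "module": module_name,
        "installed": True,
        # Not every module defines __version__, so fall back to None.
        "version": getattr(module, "__version__", None),
        # The path shows which copy was picked up (for example conda vs. pip site-packages).
        "path": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    print(describe_install("xgboost"))
    # On a cluster, the same function could be run on every worker with an
    # existing dask.distributed client (client creation is not shown here):
    #     client.run(describe_install, "xgboost")
```

If the reported path or version differs between the client and the workers, or still points at an old location after uninstalling, that would suggest a stale copy in the image.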
Thanks for testing! Feel free to let me know if there's anything I can help with. I will close this one now as this specific issue is resolved.
@trivialfis I'm very happy to tell you that after I was able to clear out my old Thanks for all the great work!!! 🎉
I tried tonight to test the recent `xgboost.dask` changes on a `dask-kubernetes` cluster on EKS (per #6343 (comment)). Unfortunately, I ran into this error:
Reproduction Information
training code
I omitted the code I used to create my client (`...[CLIENT CODE]...`) because it uses a `dask-kubernetes` cluster provisioned with a commercial product. I can see that work is getting scheduled onto that cluster when the `DaskDMatrix` is set up and when training starts, so I'm confident that isn't the issue.
I installed `xgboost` by cloning from latest `master` (https://github.com/dmlc/xgboost/tree/fcfeb4959c6e361f2fd1cd18c3b61b598dc205ae).
full stacktrace
output of `conda info`
Other Notes
I'll try to come up with a reproducible example using `dask-cloudprovider` so that it's 100% reproducible (no redacted code).