XGBoost_Ray Train High Memory use #308

Open
chadbreece opened this issue Mar 28, 2024 · 1 comment

Comments

@chadbreece

I am trying to train on a 34 GB dataset (size reported by df.info) across 8 GPUs with 396 GB of RAM. Currently I can only get away with training on half of the dataset without OOM errors killing the process. Each GPU ends up loaded with ~10 GB of data. Does that mean the actual in-memory data size is 160 GB (8 GPUs × 10 GB × 2 halves of the data)?

Any advice on how to train on this much data with XGBoost-Ray would be helpful.
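
For reference, a minimal sketch of the distributed-loading setup this question is about, assuming the dataset can first be written out as sharded Parquet files (the file paths, label column, and shard count below are hypothetical). Passing a list of files to `RayDMatrix` lets each training actor load its own partition, rather than the driver building one large pandas DataFrame and copying it out to the actors:

```python
# Minimal sketch; file paths, column names, and parameter values are placeholders.
from xgboost_ray import RayDMatrix, RayParams, train

# A list of Parquet shards instead of a single in-memory pandas DataFrame:
# each Ray actor reads only the shards assigned to it.
paths = [f"/data/train/part-{i:05d}.parquet" for i in range(64)]
dtrain = RayDMatrix(paths, label="target")

ray_params = RayParams(
    num_actors=8,      # one training actor per GPU
    gpus_per_actor=1,
    cpus_per_actor=4,
)

bst = train(
    {"objective": "reg:squarederror", "tree_method": "gpu_hist"},
    dtrain,
    num_boost_round=100,
    ray_params=ray_params,
)
bst.save_model("model.xgb")
```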

@showkeyjar

Same problem here.

Even with a large-memory GPU, the amount of data xgboost_ray can load is less than what a single GPU can handle with plain xgboost. xgboost_ray frequently hits CUDA OOM errors and fails to take advantage of multiple GPUs.
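
A small, hedged note for anyone hitting the same CUDA OOM: with hist-based GPU training, the `max_bin` parameter controls histogram resolution, and lowering it from the default of 256 shrinks the quantized data and histogram buffers kept in GPU memory. The values below are illustrative, not verified fixes from this thread:

```python
# Illustrative only: fewer histogram bins -> smaller per-GPU memory footprint,
# at some cost in split quality.
params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",
    "max_bin": 64,  # default is 256
}
```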
