
[QUESTION] GPU memory efficiency #6327

Closed
pseudotensor opened this issue Oct 30, 2020 · 12 comments

Comments

@pseudotensor
Contributor

https://news.developer.nvidia.com/gpu-accelerated-spark-xgboost/

mentions:

Efficient GPU memory utilization: XGBoost requires that data fit into memory which creates a restriction on data size using either a single GPU or distributed multi-GPU multi-node training. The latest release has improved GPU memory utilization by 5X, i.e., users can now train with data that is five times the size as compared to the first version. This is one of the critical factors to improve total cost of training without impacting performance.

Is this something only in xgboost4j? Or is it also in dmlc xgboost?

I'm asking because, playing around with multi-GPU using dask, the memory use is quite high: 37M rows by 20 features runs out of GPU memory on two 11 GB GPUs. If there were really 5X to gain, that would be incredible. I don't see any such significant changes in GPU memory usage since the first GPU implementations by @RAMitchell. @teju85

@trivialfis
Member

trivialfis commented Oct 30, 2020

On Dask, you can try DaskDeviceQuantileDMatrix if your input is on the GPU.
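A minimal sketch of that suggestion, assuming a dask_cuda cluster and dask_cudf input (the synthetic data and variable names are illustrative, not from the thread):

```python
# Minimal sketch: DaskDeviceQuantileDMatrix sketches quantiles directly
# from device data instead of materializing a full DMatrix copy on the GPU.
import cudf
import dask_cudf
import numpy as np
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

with LocalCUDACluster() as cluster, Client(cluster) as client:
    # Synthetic GPU data, purely illustrative.
    gdf = cudf.DataFrame({f"f{i}": np.random.rand(10_000) for i in range(20)})
    gdf["label"] = np.random.randint(0, 2, 10_000)
    df = dask_cudf.from_cudf(gdf, npartitions=4)

    dtrain = xgb.dask.DaskDeviceQuantileDMatrix(
        client, df.drop(columns=["label"]), df["label"]
    )
    output = xgb.dask.train(
        client, {"tree_method": "gpu_hist"}, dtrain, num_boost_round=100
    )
    booster = output["booster"]
```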

@trivialfis
Member

Preferably with a nightly build.

@trivialfis
Member

Feel free to close if DDQDM (DaskDeviceQuantileDMatrix) helps.

@pseudotensor
Contributor Author

Thanks, will try. Is there some specific thing the Spark contributors did to get the 5X memory improvement that Dask has not yet done?

@trivialfis
Member

No. I implemented DDQDM based on a quantile sketching algorithm recently. The post you linked is old.

@pseudotensor
Contributor Author

Sorry, I just meant: what is the 5X GPU memory improvement they are referring to?

@pseudotensor
Contributor Author

Also, is the same option possible with the scikit-learn API? And maybe it would be a good idea to allow the scikit-learn API to accept a DMatrix as X, if that's not already possible.

@trivialfis
Member

trivialfis commented Oct 30, 2020

> Sorry, I just meant: what is the 5X GPU memory improvement they are referring to?

I think it refers to the comparison between converting a GPU dataframe to an XGBoost DMatrix directly and their old approach to saving memory.

> Also, is the same option possible with the scikit-learn API?

Right now, no.

> maybe it would be a good idea to allow the scikit-learn API to accept a DMatrix as X

Thanks for the suggestion; that's a possible option. Or maybe we can dispatch based on the tree method and use DDQDM internally for gpu_hist by default. I'm not sure yet.
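For illustration only, such a dispatch might look roughly like the sketch below; `_make_dmatrix` is a hypothetical helper, not actual XGBoost internals:

```python
import xgboost as xgb

def _make_dmatrix(X, y, tree_method):
    # Hypothetical internal dispatch, not real XGBoost code: choose the
    # quantile-based DMatrix only when the GPU histogram method is used.
    if tree_method == "gpu_hist":
        # DeviceQuantileDMatrix bins device data directly, avoiding the
        # extra in-memory copy a regular DMatrix construction makes.
        return xgb.DeviceQuantileDMatrix(X, y)
    return xgb.DMatrix(X, y)
```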

@pseudotensor
Contributor Author

Yeah, if there were another parameter in the scikit-learn API constructor, or among the XGBoost parameters, to choose that option, that would work. It would also be aligned with how you already have to choose gpu_hist vs. hist, the default of gpu_predictor instead of cpu_predictor (AFAIK you can't switch to cpu_predictor with rapids/cudf), gpu_id = 0 as the default, etc.

So either as a parameter or as the default sounds reasonable.
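For what it's worth, the native (non-scikit-learn) API already exposes the single-GPU analogue; a minimal sketch, assuming `X` and `y` are cuDF or CuPy objects already on the device:

```python
import xgboost as xgb

# Assumes X, y already live on the GPU (e.g. cuDF DataFrame / CuPy array).
dtrain = xgb.DeviceQuantileDMatrix(X, y, max_bin=256)
booster = xgb.train(
    {"tree_method": "gpu_hist", "gpu_id": 0}, dtrain, num_boost_round=100
)
```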

@pseudotensor
Contributor Author

When is the 1.3.0 release planned? I couldn't find the plan, only old roadmaps. The release notes say the plan for the next release is made once the prior release is out, so I suppose there is a plan for 1.3.0? It seems to have good fixes and features for dask.

@hcho3
Collaborator

hcho3 commented Oct 30, 2020

@pseudotensor Here is the roadmap for 1.3.0: #6031. We will make the release once all the blocking issues are addressed.

@trivialfis
Member

Closing, as the integer overflow issue is resolved and DDQDM no longer has any known limitations. It's very close to in-place data initialization. If there's a better idea on how to stream data for gradient boosting, that will be another topic.
