[CUDA] Enable use of subset Dataset in new CUDA version #5086

Open
Tracked by #5153
shiyu1994 opened this issue Mar 21, 2022 · 0 comments


Summary

In #4630, we did not compress the Dataset memory in CUDA, even when the sampling rate is very small. The CPU version, by contrast, already does this:

    if (!is_use_subset_) {
      tree_learner_->SetBaggingData(nullptr, bag_data_indices_.data(), bag_data_cnt_);
    } else {
      // get subset
      tmp_subset_->ReSize(bag_data_cnt_);
      tmp_subset_->CopySubrow(train_data_, bag_data_indices_.data(),
                              bag_data_cnt_, false);
      tree_learner_->SetBaggingData(tmp_subset_.get(), bag_data_indices_.data(),
                                    bag_data_cnt_);
    }
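
To illustrate what "getting the subset" amounts to, here is a minimal sketch of gathering the sampled rows into a compact buffer. It is not LightGBM's `CopySubrow` implementation: the function name `CopySubrowSketch`, the dense row-major `float` layout, and the `num_features` parameter are all assumptions made for illustration (LightGBM itself stores binned feature groups).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not LightGBM's Dataset::CopySubrow.
// Gathers the rows listed in `used_indices` from a dense row-major matrix
// into a new, contiguous buffer that holds only the sampled rows.
std::vector<float> CopySubrowSketch(const std::vector<float>& full,
                                    std::size_t num_features,
                                    const std::vector<int>& used_indices) {
  assert(num_features > 0 && full.size() % num_features == 0);
  const std::size_t num_rows = full.size() / num_features;

  std::vector<float> subset;
  subset.reserve(used_indices.size() * num_features);
  for (int idx : used_indices) {
    // Every sampled index must refer to an existing row.
    assert(idx >= 0 && static_cast<std::size_t>(idx) < num_rows);
    const float* row =
        full.data() + static_cast<std::size_t>(idx) * num_features;
    subset.insert(subset.end(), row, row + num_features);
  }
  return subset;
}
```

With a low sampling rate, later passes over the data touch only `used_indices.size()` contiguous rows instead of striding through the full Dataset. A CUDA version would presumably perform the same gather on device memory.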

Motivation

Enabling the subset Dataset (physically compressing the Dataset in memory when the sampling rate is low) can speed up training with GOSS or bagging.

References

A related discussion, #4630 (comment)
