[CUDA] Enable use of subset Dataset in new CUDA version #5086

shiyu1994 · 2022-03-21T07:20:14Z

Summary

In #4630, we did not compress the Dataset memory in CUDA, even when the sampling rate is very small. The CPU version already does this:

LightGBM/src/boosting/gbdt.cpp

Lines 253 to 262 in d4cdbcf

    
    if (!is_use_subset_) {
      tree_learner_->SetBaggingData(nullptr, bag_data_indices_.data(), bag_data_cnt_);
    } else {
      // get subset
      tmp_subset_->ReSize(bag_data_cnt_);
      tmp_subset_->CopySubrow(train_data_, bag_data_indices_.data(),
                              bag_data_cnt_, false);
      tree_learner_->SetBaggingData(tmp_subset_.get(), bag_data_indices_.data(),
                                    bag_data_cnt_);
    }

Motivation

Enabling the subset Dataset (physically compressing the Dataset in memory when the sampling rate is low) can speed up training with GOSS or Bagging.
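To make "physically compress" concrete, here is a minimal sketch of gathering the sampled rows into a contiguous buffer, so later passes only touch the sampled data. It is an illustration under stated assumptions, not LightGBM code: `DenseBins` and `CopySubrowSketch` are hypothetical names, the row-major `uint8_t` layout is assumed for simplicity, and the real `Dataset::CopySubrow` and any CUDA counterpart may be organized quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row-major matrix of feature bins (not a LightGBM type).
struct DenseBins {
  std::size_t num_rows = 0;
  std::size_t num_features = 0;
  std::vector<std::uint8_t> data;  // num_rows * num_features entries
};

// Gather the rows listed in `indices` into a new, contiguous matrix.
// Rows appear in the subset in the same order as in `indices`.
DenseBins CopySubrowSketch(const DenseBins& full,
                           const std::vector<std::int32_t>& indices) {
  DenseBins subset;
  subset.num_rows = indices.size();
  subset.num_features = full.num_features;
  subset.data.reserve(subset.num_rows * subset.num_features);
  for (std::int32_t idx : indices) {
    // Each sampled index must refer to an existing row of the full data.
    assert(idx >= 0 && static_cast<std::size_t>(idx) < full.num_rows);
    const std::uint8_t* row =
        full.data.data() + static_cast<std::size_t>(idx) * full.num_features;
    subset.data.insert(subset.data.end(), row, row + full.num_features);
  }
  return subset;
}
```

With a low sampling rate the subset is much smaller than the full data, so histogram construction can scan it sequentially instead of following scattered indices into the full Dataset. The copy itself costs one pass over the sampled rows per bagging round, which is presumably why it is only worthwhile when the sampling rate is low.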

References

A related discussion, #4630 (comment)

shiyu1994 mentioned this issue Mar 21, 2022

[CUDA] New CUDA version Part 1 #4630

Merged

jameslamb mentioned this issue Apr 14, 2022

[RFC] 4.0.0 Release #5153

Closed

StrikerRUS mentioned this issue Jun 25, 2022

[CUDA] Initial work for boosting and evaluation with CUDA #5279

Merged

jameslamb added the feature request label Jun 1, 2023
