
Gradient based sampling for external memory mode on GPU #5093

Merged: 70 commits merged into dmlc:master on Feb 4, 2020

Conversation

@rongou (Contributor) commented on Dec 6, 2019

In GPU external memory mode, rely on gradient-based sampling to allow for bigger datasets. The final step for #4357.

The main idea is from https://arxiv.org/abs/1803.00841.

High-level design:

  • At the beginning of each round, take the gradient pairs and perform weighted sampling without replacement based on the absolute value of the gradient (see the sketch after this list).
  • Compact all the sampled rows in the DMatrix into a single ELLPACK page.
  • Construct the tree the same way as in-memory mode, using the compacted page that's kept in memory.
  • After the tree is constructed, loop through all the original pages to finalize the position of all the rows.
  • Repeat for each round.
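For illustration, here is a minimal NumPy sketch of the sampling step from the first bullet: weighted sampling without replacement with weights proportional to the absolute gradient, followed by compaction of the sampled gradient pairs. The function name is illustrative; this is not the PR's actual CUDA implementation.

```python
import numpy as np

def gradient_based_sample(grad, hess, sample_rate, seed=0):
    """Illustrative sketch: sample rows with probability proportional to |gradient|."""
    rng = np.random.default_rng(seed)
    n_rows = grad.shape[0]
    n_sample = max(1, int(n_rows * sample_rate))

    # Weight each row by the absolute value of its gradient.
    weights = np.abs(grad)
    probs = weights / weights.sum()

    # Weighted sampling without replacement.
    rows = rng.choice(n_rows, size=n_sample, replace=False, p=probs)

    # "Compact" the sampled rows; in the PR this step builds a single
    # in-memory ELLPACK page from the sampled rows of the DMatrix.
    return rows, grad[rows], hess[rows]
```

An unbiased variant would also rescale the kept gradient pairs by the inverse of their selection probabilities; that detail is omitted here and may differ from what the PR actually does.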

On a generated synthetic dataset (1 million rows, 50 features), gradient-based sampling still works down to about 10% of the total rows sampled with little loss of accuracy, while uniform random sampling only works down to about 50% of the data and completely fails to converge at 10%.

On a larger synthetic dataset (30 million rows, 200 features), CPU external memory mode actually runs out of memory on my 32GB desktop, while GPU external memory mode works fine on a Titan V (12GB).

Some benchmark numbers for the synthetic 1 million row, 50 feature dataset:

number of rounds: 500
max_depth = 0
max_leaves = 256
grow_policy = lossguide

| Mode | Eval Error | Time (seconds) |
| --- | --- | --- |
| CPU in-core | 0.0083 | 508.40 |
| CPU external memory | 0.0096 | 513.58 |
| GPU in-core | 0.0086 | 31.13 |
| GPU external memory, sample all rows | 0.0093 | 46.30 |
| GPU external memory, sample 50% | 0.0081 | 124.44 |
| GPU external memory, sample 10% | 0.0085 | 121.29 |
| GPU external memory, sample 5% | 0.0098 | 120.48 |
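For context, a hedged sketch of how a run like the ones benchmarked above might be configured through the Python API. The `sampling_method` and `subsample` parameter names follow the XGBoost documentation that accompanied this feature; the file path and cache prefix are hypothetical.

```python
import xgboost as xgb

# External memory: the '#dtrain.cache' suffix (hypothetical path) tells XGBoost
# to page the data to disk rather than load it all at once.
dtrain = xgb.DMatrix("train.libsvm#dtrain.cache")

params = {
    "objective": "binary:logistic",
    "tree_method": "gpu_hist",            # GPU hist updater targeted by this PR
    "sampling_method": "gradient_based",  # instead of the default "uniform"
    "subsample": 0.1,                     # keep roughly 10% of rows per round
    "grow_policy": "lossguide",
    "max_leaves": 256,
    "max_depth": 0,
}

bst = xgb.train(params, dtrain, num_boost_round=500)
```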

@RAMitchell @trivialfis @sriramch

@hcho3 (Collaborator) commented on Dec 6, 2019

@rongou Can you provide a summary of what this pull request does? Why a new kind of sampling? To reduce the workload of making new releases, I'd like to start writing summaries each month, similar to https://discuss.tvm.ai/t/tvm-monthly-nov-2019/5038

@trivialfis (Member) commented:
@hcho3 I think it's this one: https://arxiv.org/abs/1901.09047

@rongou (Contributor, Author) commented on Dec 6, 2019

@hcho3 @trivialfis I'll write a more detailed summary once I'm confident that it's actually working properly. The main reason for this is that the naive implementation of external memory mode on GPU requires reading back all the pages for every tree node, which is pretty expensive since data are moved over the PCIe bus. By doing some kind of smart sampling and keeping the sampled page in GPU memory, we can hope to get reasonable performance without degrading the accuracy too much. The main idea is from https://arxiv.org/abs/1803.00841, but LightGBM also does something similar.

@hcho3 (Collaborator) commented on Dec 6, 2019

Thanks a lot for the paper references.

@trivialfis (Member) left a review comment:

Just before reviewing the PR: could you provide a general pipeline as a code comment, something similar to:

external data -> adaptor -> sparsepage -> ellpack page -> sampling algorithm -> sampled ?? page -> tree updater


XGBOOST_DEVICE GradientPairInternal<T> operator/(float divider) const {
  GradientPairInternal<T> g;
  g.grad_ = grad_ / divider;
Review comment (Member): Be careful of division by zero.

Author reply (Contributor): It turns out we don't really need these; removed.

@rongou (Contributor, Author) commented on Jan 23, 2020

The code is ready, but it looks like there is a failing test involving XGBRFClassifier. I need to do some debugging.

@rongou (Contributor, Author) commented on Jan 29, 2020

@RAMitchell @trivialfis I refactored the code; the existing behavior is preserved for uniform sampling. This PR actually reverts some of the changes made to gpu_hist in my last PR, so it's arguably less risky.

@trivialfis (Member) commented:
@rongou The next release should be really close; the one remaining blocker is model IO for the Scikit-Learn interface. I will try to merge this PR once we can split up the branch. It should be fine, as I believe we can make it for the next RAPIDS release.

@trivialfis (Member) commented:
@RAMitchell I think it's ready for merging. WDYT?

GradientBasedSample ExternalMemoryNoSampling::Sample(common::Span<GradientPair> gpair,
                                                     DMatrix* dmat) {
  if (!page_concatenated_) {
    // Concatenate all the external memory ELLPACK pages into a single in-memory page.
Review comment (Member):
Why is it even possible to do this? Seems redundant to allow a user to build external memory pages only to concatenate them.

Author reply (Contributor):

I think this option is here mostly for completeness' sake. If the whole dataset fits in GPU memory, then presumably you wouldn't want to use external memory; if it doesn't fit, you probably want to play around with sampling.

I have noticed that writing out external pages and then concatenating them together might allow you to train on slightly larger datasets versus keeping everything in memory, probably because of the lower working memory requirement. I'm not sure how useful that is, though.
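For reference, the no-sampling path discussed here corresponds to the "sample all rows" line in the benchmark table. A hedged sketch of the configuration that would exercise it, under the same assumptions about parameter names and paths as in the earlier sketch:

```python
import xgboost as xgb

# Hypothetical cache-backed DMatrix; with subsample left at 1.0 every row is
# kept, so the external ELLPACK pages are concatenated into one in-memory page.
dtrain = xgb.DMatrix("train.libsvm#dtrain.cache")

params = {"tree_method": "gpu_hist", "subsample": 1.0}
bst = xgb.train(params, dtrain, num_boost_round=500)
```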

@trivialfis (Member) commented:
Restarted the test. Will merge unless @rongou has other comments regarding @RAMitchell's review.

@codecov-io commented:

Codecov Report

Merging #5093 into master will increase coverage by 5.94%.
The diff coverage is 92.72%.


@@            Coverage Diff             @@
##           master    #5093      +/-   ##
==========================================
+ Coverage   77.89%   83.83%   +5.94%     
==========================================
  Files          11       11              
  Lines        2330     2407      +77     
==========================================
+ Hits         1815     2018     +203     
+ Misses        515      389     -126
| Impacted Files | Coverage Δ |
| --- | --- |
| python-package/xgboost/compat.py | 53.95% <83.33%> (+5.17%) ⬆️ |
| python-package/xgboost/sklearn.py | 90.88% <97.29%> (+1.66%) ⬆️ |
| python-package/xgboost/rabit.py | 67.1% <0%> (+3.94%) ⬆️ |
| python-package/xgboost/tracker.py | 93.97% <0%> (+15.66%) ⬆️ |
| python-package/xgboost/dask.py | 90.3% <0%> (+28.42%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@trivialfis merged commit e4b74c4 into dmlc:master on Feb 4, 2020
@hcho3 mentioned this pull request on Feb 21, 2020
The lock bot locked this conversation as resolved and limited it to collaborators on May 5, 2020
@rongou deleted the gradient-based-sampler branch on November 18, 2022
Labels: none · Projects: none · Linked issues: none · 6 participants