Refactor split valuation kernel #8073

RAMitchell · 2022-07-14T11:34:10Z

Increases split evaluation kernel throughput on my V100 from 150 GB/s to 400 GB/s.

These gains come from:

- using CUDA fast division operations
- avoiding shared memory broadcasts by using warp shuffles instead
- not computing the parent's gain for every possible split (it is constant per node)

The fastest throughput I have achieved is 700 GB/s, but the code becomes a little complicated to get that fast. I think the current version is fast enough: it places the bottleneck entirely on histogram computation.

trivialfis

The performance improvement looks exciting! Some questions in the comments.

trivialfis · 2022-07-21T11:24:16Z

src/tree/split_evaluator.h

-    XGBOOST_DEVICE float
-    CalcGainGivenWeight(ParamT const &p, tree::GradStats const& stats, float w) const {
+    // Fast floating point division instruction on device
+    XGBOOST_DEVICE float Divide(float a, float b) const {


Can we extract this as an independent function?

I'm not expecting to use it anywhere else at the moment, so I think it should stay where it is unless you have something specific in mind. A kernel needs to be heavily bottlenecked by arithmetic before this makes a difference, and I can't think of other places in xgboost where that is the case.
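
For readers unfamiliar with the pattern, here is a hedged sketch of what a helper like `Divide` could look like. The diff above only shows the signature and its comment, so the body here is an assumption: it uses the CUDA intrinsic `__fdividef` when compiled for the device and plain division otherwise. `SKETCH_HOST_DEVICE` and `GainFromStats` are hypothetical names introduced for illustration.

```cpp
// Hypothetical stand-in for a host/device qualifier macro.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Assumed body: approximate fast division on the device, exact division on the host.
SKETCH_HOST_DEVICE inline float Divide(float a, float b) {
#ifdef __CUDA_ARCH__
  return __fdividef(a, b);  // CUDA fast-math intrinsic, reduced accuracy
#else
  return a / b;
#endif
}

// Hypothetical usage: simplified gain G^2 / (H + lambda) routed through Divide.
SKETCH_HOST_DEVICE inline float GainFromStats(float sum_grad, float sum_hess, float lambda) {
  return Divide(sum_grad * sum_grad, sum_hess + lambda);
}
```

Because `__fdividef` trades accuracy for speed, a helper like this only pays off in a kernel that is limited by arithmetic, which matches the reasoning in the reply above.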

src/tree/gpu_hist/evaluate_splits.cu

RAMitchell · 2022-07-21T11:52:53Z

Depth 8 benchmarks:

dataset	master	eval
airline	90.88661192	89.93374479
bosch	12.88504644	12.46627029
covtype	18.01187677	17.60098921
epsilon	46.48386218	43.91058178
fraud	1.315704659	1.237399099
higgs	17.19260674	17.27932671
year	7.047273015	6.841409724

RAMitchell added 10 commits July 5, 2022 08:09

Use single precision, use pointers instead of span.

197ce75

Reorganise split evaluation code.

f3abe81

Avoid copying gradstats

de8d6da

Better reduce

4fe386c

72 Register version.

31e4845

Lint, remove benchmark

dbcadbc

Merge branch 'master' of github.com:dmlc/xgboost into eval3

ec024e3

Clang tidy

c63f94e

Fix test failure.

a5cc7c1

Guard against NaN in gain calculation.

fe08cfb

trivialfis reviewed Jul 21, 2022

RAMitchell added 2 commits July 21, 2022 05:09

More comments.

0c9ea5d

Remove unused template parameter.

52b10ac

trivialfis approved these changes Jul 21, 2022

RAMitchell merged commit 1be0984 into dmlc:master Jul 21, 2022
