Support optimal partitioning for GPU hist. #7652

trivialfis · 2022-02-13T18:23:06Z

Implement MaxCategory in quantile.
Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function.
Extract an evaluator from GPU Hist to store the needed states.
Added some CUDA stream/event utilities.
Update document with references.
Fixed a bug in the approx evaluator that occurs when the number of data points is less than the number of categories.

trivialfis · 2022-02-13T18:30:01Z

I linked some of the references in the document. For a quick review, the algorithm is based on a proof by Fisher, which states that, when partitioning a set of discrete values into groups based on the distances between a measure of these values, one only needs to consider partitions that are contiguous in sorted order instead of enumerating every possible grouping (I have a test comparing the two). In the context of decision trees, the discrete values are categories, and the measure is the output leaf value.

I will push some work into follow-up PRs, as this one is already complicated as it is:

Better test for mixed feature types and mixed categorical types.
Better test for missing values. (and backward scan)
Add column sampling to the hypothesis test.
Continue the work from @hcho3 for rewriting the evaluation function.
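
The Fisher result summarized above can be sanity-checked with a small script. This is only a minimal sketch, not the test in this PR: it assumes each category carries a (gradient sum, hessian sum) pair, scores a split with the unregularized gain G^2 / H per child, and sorts categories by the ratio G / H. The helper names (`split_gain`, `best_gain_exhaustive`, `best_gain_sorted`, `random_stats`) are hypothetical.

```python
import math
import random


def split_gain(left, right):
    """Sum of G^2 / H over both children (no regularization: an assumption)."""
    def score(group):
        g = sum(s[0] for s in group)
        h = sum(s[1] for s in group)
        return g * g / h
    return score(left) + score(right)


def best_gain_exhaustive(stats):
    """Try every non-empty proper subset of categories as the left child."""
    n = len(stats)
    best = -math.inf
    for mask in range(1, 2 ** n - 1):
        left = [stats[i] for i in range(n) if (mask >> i) & 1]
        right = [stats[i] for i in range(n) if not (mask >> i) & 1]
        best = max(best, split_gain(left, right))
    return best


def best_gain_sorted(stats):
    """Sort categories by G / H, then scan only the n - 1 contiguous cuts."""
    ordered = sorted(stats, key=lambda s: s[0] / s[1])
    return max(
        split_gain(ordered[:k], ordered[k:]) for k in range(1, len(ordered))
    )


def random_stats(n_categories, seed):
    """Hypothetical per-category (gradient sum, hessian sum) pairs."""
    rng = random.Random(seed)
    return [
        (rng.uniform(-1.0, 1.0), rng.uniform(0.1, 1.0))
        for _ in range(n_categories)
    ]


if __name__ == "__main__":
    stats = random_stats(8, seed=0)
    print(best_gain_exhaustive(stats), best_gain_sorted(stats))
```

The exhaustive search visits 2^n - 2 subsets, while the sorted scan visits n - 1 cuts, which is why the sorted approach is the only practical one for features with many categories.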

src/tree/gpu_hist/evaluator.cu

src/tree/updater_gpu_hist.cu

demo/guide-python/categorical.py

doc/tutorials/categorical.rst

src/tree/gpu_hist/evaluate_splits.cuh

trivialfis · 2022-02-13T23:20:52Z

src/tree/gpu_hist/evaluate_splits.cu

+  auto d_entries = out_entries;
+  auto cats_out = this->DeviceCatStorage(left.nidx);
+  // turn candidate into entry, along with hanlding sort based split.
+  dh::LaunchN(right.feature_set.empty() ? 1 : 2, [=] __device__(size_t i) {


Any chance we can fuse this kernel into the segmented reduction using transform output iter?

Seems to be non-trivial.

I managed to implement it using thrust instead of cub. I will follow up on this after the initial feature-complete implementation.

tests/python/test_updaters.py

doc/tutorials/categorical.rst

trivialfis · 2022-02-14T02:01:17Z

src/tree/gpu_hist/evaluate_splits.cu

+    auto boundary = std::min(static_cast<size_t>((best_thresh + 1)), (f_sorted_idx.size() - 1));
+    boundary = std::max(boundary, static_cast<size_t>(1ul));
+    auto end = beg + boundary;
+    thrust::for_each(thrust::seq, beg, end, [&](auto c) {


This is quite inefficient when the number of categories is huge, but I think it's best to postpone the optimization work until after 1.6.
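
For readers following the arithmetic in the hunk above, here is a rough Python rendering of the boundary clamp. It is a sketch under assumptions: the function names are hypothetical, and the idea that the first `boundary` entries of the sorted index are the categories assigned to one child is my reading of the loop, not something the diff excerpt confirms.

```python
def clamp_boundary(best_thresh, n_sorted):
    """Mirror max(min(best_thresh + 1, n_sorted - 1), 1) from the hunk."""
    boundary = min(best_thresh + 1, n_sorted - 1)
    return max(boundary, 1)


def categories_before_boundary(sorted_idx, best_thresh):
    """Assumed use: take the leading slice of the sorted category index."""
    boundary = clamp_boundary(best_thresh, len(sorted_idx))
    return sorted_idx[:boundary]


if __name__ == "__main__":
    # Hypothetical sorted order of five categories.
    print(categories_before_boundary([3, 0, 4, 1, 2], best_thresh=1))
```

The clamp keeps at least one category in the slice and, whenever there are two or more categories, leaves at least one out of it. The loop over the slice is sequential, so its cost grows linearly with the number of selected categories, which matches the efficiency concern raised in the comment.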

RAMitchell

LGTM in general. It is making our codebase way more complicated, but I think we knew that would happen.

RAMitchell · 2022-02-14T11:42:11Z

demo/guide-python/cat_in_the_dat.py

        eval_metric="auc",
+        enable_categorical=True,
+        max_cat_to_onehot=1,    # We use optimal partitioning exclusively


I find this interface a bit weird, but I see that it is following LightGBM.
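
To make the effect of `max_cat_to_onehot=1` concrete, the selection rule can be sketched as below. This assumes the behavior described in the XGBoost parameter documentation as I understand it (one-hot splits are used when a feature has fewer categories than the threshold, partitioning otherwise); the function name is hypothetical and this is not the library's code.

```python
def uses_onehot_split(n_categories, max_cat_to_onehot):
    """Assumed rule: one-hot split only when below the threshold."""
    return n_categories < max_cat_to_onehot


if __name__ == "__main__":
    # With a threshold of 1, every categorical feature is partitioned.
    for n_categories in (2, 10, 1000):
        print(n_categories, uses_onehot_split(n_categories, max_cat_to_onehot=1))
```

Under this assumption, setting the threshold to 1 means no feature with at least one category qualifies for one-hot splits, which is what the demo's inline comment describes.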

src/common/categorical.h

RAMitchell · 2022-02-14T13:38:33Z

src/tree/gpu_hist/evaluate_splits.cuh

-                    EvaluateSplitInputs<GradientSumT> left,
-                    EvaluateSplitInputs<GradientSumT> right);
-template <typename GradientSumT>
-void EvaluateSingleSplit(common::Span<DeviceSplitCandidate> out_split,


Going from a function to a class is not good, as we now have lots of state. I'm not sure if anything can be done about it.

trivialfis added 2 commits February 14, 2022 02:07

Support optimal partitioning for GPU hist.

f05b51c

Restore the demo.

d558808

trivialfis mentioned this pull request Feb 13, 2022

Categorical data support. #6503

Closed

67 tasks

trivialfis requested review from RAMitchell and hcho3 February 13, 2022 18:31

Windows build.

ac70e1f

trivialfis added 4 commits February 14, 2022 02:51

small cleanups.

b09fa18

define stream wait default.

2951520

CUDA 11.0

9b3cd3e

format.

5dc6a9e

trivialfis force-pushed the cat-gpu-hist-part-evaluation branch from 703451a to 5dc6a9e Compare February 13, 2022 19:01

trivialfis added 4 commits February 14, 2022 03:06

typo & polishes.

5b5d08b

Windows.

520bf56

replace fill.

4812a1e

Cleanup.

861637a

trivialfis added 2 commits February 14, 2022 09:28

Fuse the root evaluation.

f0e7ccd

typo.

b28974b

RAMitchell approved these changes Feb 14, 2022

trivialfis merged commit 0d0abe1 into dmlc:master Feb 14, 2022

trivialfis deleted the cat-gpu-hist-part-evaluation branch February 14, 2022 19:03
