Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support optimal partitioning for GPU hist. #7652

Merged
merged 13 commits into from Feb 14, 2022

Conversation

trivialfis
Copy link
Member

@trivialfis trivialfis commented Feb 13, 2022

  • Implement MaxCategory in quantile.
  • Implement partition-based split for GPU evaluation. Currently, it's based on the existing evaluation function.
  • Extract an evaluator from GPU Hist to store the needed states.
  • Added some CUDA stream/event utilities.
  • Update document with references.
  • Fixed a bug in approx evaluator where the number of data points is less than the number of categories.

@trivialfis trivialfis mentioned this pull request Feb 13, 2022
67 tasks
@trivialfis
Copy link
Member Author

trivialfis commented Feb 13, 2022

I linked some of the references in the document. For a quick review, the algorithm is based on a proof by Fisher, which states that, when trying to partition a set of discrete values into groups based on the distances between a measure of these values, one only needs to look at sorted partitions instead of enumerating all possible permutations(I have a test comparing that). In the context of decision trees, the discrete values are categories, and the measure is the output leaf value.

I will push some work into follow-up PRs as this one is already complicated as it's:

  • Better test for mixed feature types and mixed categorical types.
  • Better test for missing values. (and backward scan)
  • Add column sampling to the hypothesis test.
  • Continue the work from @hcho3 for rewriting the evaluation function.

src/tree/updater_gpu_hist.cu Outdated Show resolved Hide resolved
src/tree/updater_gpu_hist.cu Outdated Show resolved Hide resolved
src/tree/gpu_hist/evaluate_splits.cuh Outdated Show resolved Hide resolved
auto d_entries = out_entries;
auto cats_out = this->DeviceCatStorage(left.nidx);
// turn candidate into entry, along with hanlding sort based split.
dh::LaunchN(right.feature_set.empty() ? 1 : 2, [=] __device__(size_t i) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can fuse this kernel into the segmented reduction using transform output iter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be non-trivial.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to implement it using thrust instead of cub. Will follow up on this after initial feature complete implementation.

tests/python/test_updaters.py Outdated Show resolved Hide resolved
doc/tutorials/categorical.rst Outdated Show resolved Hide resolved
auto boundary = std::min(static_cast<size_t>((best_thresh + 1)), (f_sorted_idx.size() - 1));
boundary = std::max(boundary, static_cast<size_t>(1ul));
auto end = beg + boundary;
thrust::for_each(thrust::seq, beg, end, [&](auto c) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite inefficient when number of categories is huge. But I think it's best to push the optimization work after 1.6.

Copy link
Member

@RAMitchell RAMitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general. It is making our codebase way more complicated, but I think we knew that would happen.

eval_metric="auc",
enable_categorical=True,
max_cat_to_onehot=1, # We use optimal partitioning exclusively
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this interface a bit weird, but I see that it is following LightGBM.

src/common/categorical.h Show resolved Hide resolved
EvaluateSplitInputs<GradientSumT> left,
EvaluateSplitInputs<GradientSumT> right);
template <typename GradientSumT>
void EvaluateSingleSplit(common::Span<DeviceSplitCandidate> out_split,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going from a function to a class is not good, as we now we have lots of state. I'm not sure if anything can be done about it.

@trivialfis trivialfis merged commit 0d0abe1 into dmlc:master Feb 14, 2022
@trivialfis trivialfis deleted the cat-gpu-hist-part-evaluation branch February 14, 2022 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants