CPU evaluation for cat data. #7393
Conversation
* Implementation for one-hot based split.
* Implementation for partition based split.
@@ -5,6 +5,13 @@
#include "../../../src/common/quantile.cuh"
Some issues in the quantile test were found after changing the SimpleLCG, so the changes in this file are related.
@@ -91,49 +96,118 @@ template <typename GradientSumT, typename ExpandEntry> class HistEvaluator {
    iend = static_cast<int32_t>(cut_ptr[fidx]) - 1;
  }

  auto calc_bin_value = [&](auto i) {
This can be split into multiple functions, but then we will have lots of duplicated code.
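The design choice above, one local generic lambda shared by both scan directions instead of two near-duplicate functions, can be sketched as follows. This is an illustrative stand-in, not the PR's actual `calc_bin_value`; the names `PrefixFromDirection` and the prefix-sum body are invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative sketch: one local lambda reused by the forward and the
// backward scan, so the per-bin logic is written only once.
inline std::vector<std::int64_t> PrefixFromDirection(
    std::vector<std::int64_t> const &bins, bool forward) {
  std::vector<std::int64_t> out(bins.size());
  std::int64_t running = 0;
  // The shared body: accumulate one bin and record the running sum.
  auto calc_bin_value = [&](std::size_t i) {
    running += bins[i];
    out[i] = running;
  };
  if (forward) {
    for (std::size_t i = 0; i < bins.size(); ++i) calc_bin_value(i);
  } else {
    for (std::size_t i = bins.size(); i-- > 0;) calc_bin_value(i);
  }
  return out;
}
```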
std::vector<size_t> sorted_idx(n_bins);
std::iota(sorted_idx.begin(), sorted_idx.end(), 0);
auto feat_hist = histogram.subspan(cut_ptr[fidx], n_bins);
std::stable_sort(sorted_idx.begin(), sorted_idx.end(), [&](size_t l, size_t r) {
Can't use the existing argsort helper here since we don't have a CPU transform iterator.
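The `std::iota` + `std::stable_sort` idiom above is a plain CPU argsort: it sorts an index array by the values it points at, leaving the values untouched. A self-contained sketch (the `ArgSort` wrapper name is for illustration):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// CPU "argsort": return the permutation of indices that sorts `values`.
// stable_sort keeps equal values in their original relative order.
template <typename T>
std::vector<std::size_t> ArgSort(std::vector<T> const &values) {
  std::vector<std::size_t> idx(values.size());
  std::iota(idx.begin(), idx.end(), 0);  // 0, 1, 2, ...
  std::stable_sort(idx.begin(), idx.end(), [&](std::size_t l, std::size_t r) {
    return values[l] < values[r];
  });
  return idx;
}
```

In the PR's case the "values" are per-bin gradient statistics in `feat_hist`, and the sorted index order is what the partition-based split scan walks over.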
LGTM
@@ -39,7 +36,7 @@ template <typename GradientSumT> void TestEvaluateSplits() {
  std::iota(row_indices.begin(), row_indices.end(), 0);
  row_set_collection.Init();

- auto hist_builder = GHistBuilder<GradientSumT>(n_threads, gmat.cut.Ptrs().back());
+ auto hist_builder = GHistBuilder<GradientSumT>(omp_get_max_threads(), gmat.cut.Ptrs().back());
Shouldn't we call OmpGetThreadLimit() to query OMP_THREAD_LIMIT?
It's a C++ unit test, so I don't think we have to worry about that too much. On the other hand, we do need more thorough integration tests for that environment variable, probably in another PR.
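For context, honoring OMP_THREAD_LIMIT amounts to capping the requested thread count by the environment variable. A hedged sketch of that idea; this is not XGBoost's actual OmpGetThreadLimit implementation, and the names here are stand-ins:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Sketch only: read OMP_THREAD_LIMIT from the environment, defaulting to
// "no limit" when it is unset or invalid.
inline std::int32_t ThreadLimitSketch() {
  std::int32_t limit = std::numeric_limits<std::int32_t>::max();
  if (char const *env = std::getenv("OMP_THREAD_LIMIT")) {
    std::int32_t parsed = std::atoi(env);
    if (parsed >= 1) limit = parsed;
  }
  return limit;
}

// Cap a requested thread count (e.g. omp_get_max_threads()) by the limit.
inline std::int32_t CapThreads(std::int32_t requested) {
  return std::min(requested, ThreadLimitSketch());
}
```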
size_t n_cats{8};

auto dmat =
    RandomDataGenerator(kRows, kCols, 0).Seed(3).Type(ft).MaxCategory(n_cats).GenerateDMatrix();
I really like how we can generate random data with a fluent interface.
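The fluent style works because each setter returns `*this`, so calls chain left to right. A minimal sketch of the pattern; the class and members below are illustrative only, not the real RandomDataGenerator:

```cpp
#include <cstddef>
#include <cstdint>

// Minimal builder with a fluent interface: every setter returns a
// reference to the object itself, which is what makes chaining work.
class DataGenSketch {
  std::size_t rows_;
  std::size_t cols_;
  std::uint64_t seed_{0};
  std::size_t max_category_{0};

 public:
  DataGenSketch(std::size_t rows, std::size_t cols) : rows_{rows}, cols_{cols} {}
  DataGenSketch &Seed(std::uint64_t s) { seed_ = s; return *this; }
  DataGenSketch &MaxCategory(std::size_t c) { max_category_ = c; return *this; }
  std::uint64_t GetSeed() const { return seed_; }
  std::size_t GetMaxCategory() const { return max_category_; }
};
```

A terminal call like `GenerateDMatrix()` in the real generator would then consume the accumulated configuration in one place.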
Currently, the evaluation function uses the training parameters from the numerical split. We should set up a new set of training parameters later, but before that I want a working implementation and some experiment results first.
Extracted from #7214 .