Optimization/buildhist/colwisebuildhist #8233

razdoburdin · 2022-09-08T08:52:55Z

Hi,
this PR is part of #7192. It continues the optimization of BuildHistKernel that was started in #8218.
Here I introduce column-wise histogram building and a dispatcher that chooses between the row-wise and the column-wise kernels. This optimization improves runtime by up to 2x on several datasets. For the performance measurements I booked a c6i.12xlarge instance (24 cores, hyperthreading off) on Amazon Web Services and used the benchmarks from here.
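
To make the difference between the two traversal orders concrete, here is a minimal sketch. It is not the code from this PR: `GradPair`, `BinMatrix`, `BuildHistRowWise` and `BuildHistColWise` are hypothetical, simplified stand-ins, and the matrix is assumed to be dense with global (per-feature offset) bin ids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified types; these are NOT the XGBoost data structures.
struct GradPair {
  double grad;
  double hess;
};

// Dense bin-index matrix stored row-major: index[row * n_features + feature]
// holds the global bin id (already offset per feature) of that cell.
struct BinMatrix {
  std::size_t n_rows;
  std::size_t n_features;
  std::vector<std::uint32_t> index;
};

// Row-wise: the outer loop runs over rows, so each row scatters its gradient
// pair into one bin of every feature.
std::vector<GradPair> BuildHistRowWise(const BinMatrix& m,
                                       const std::vector<GradPair>& gpair,
                                       std::size_t n_bins) {
  std::vector<GradPair> hist(n_bins, GradPair{0.0, 0.0});
  for (std::size_t r = 0; r < m.n_rows; ++r) {
    for (std::size_t f = 0; f < m.n_features; ++f) {
      GradPair& bin = hist[m.index[r * m.n_features + f]];
      bin.grad += gpair[r].grad;
      bin.hess += gpair[r].hess;
    }
  }
  return hist;
}

// Column-wise: the outer loop runs over features, so each pass only writes
// to the bins that belong to a single feature.
std::vector<GradPair> BuildHistColWise(const BinMatrix& m,
                                       const std::vector<GradPair>& gpair,
                                       std::size_t n_bins) {
  std::vector<GradPair> hist(n_bins, GradPair{0.0, 0.0});
  for (std::size_t f = 0; f < m.n_features; ++f) {
    for (std::size_t r = 0; r < m.n_rows; ++r) {
      GradPair& bin = hist[m.index[r * m.n_features + f]];
      bin.grad += gpair[r].grad;
      bin.hess += gpair[r].hess;
    }
  }
  return hist;
}
```

In this sketch both functions add the rows to each bin in the same order, so they return identical histograms; only the memory access pattern differs.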

The benchmarking methodology is the same as in #8218. Only two datasets from the list are affected by this optimization, santander and epsilon, so only they are shown in the following table:

dataset	master	this PR	speedup
santander	1.02E+02	4.89E+01	2.09
epsilon	1.84E+02	1.03E+02	1.78

I am looking forward to your review and comments!

Update:
I have removed the optimization related to the santander dataset to simplify the review.

razdoburdin · 2022-09-12T07:16:11Z

Hi, for the performance measurements I used the master branch of the dmlc/xgboost repo. You can easily reproduce the calculations with it. If you need any help, just ask!

In reply to wenyi Liu (Saturday, September 10, 2022 11:34 AM): Hi, your work is very interesting. I'd like to test the effect of your optimization. Could you please tell me the source version of XGBoost?

liuwenyi2 · 2022-09-12T09:29:37Z

Much appreciated!
I can only compile XGBoost 1.1.0 on the cluster, so I want to reproduce your calculations with 1.1.0 to do some measurements.
I'm trying to do that now.

trivialfis

Thank you for the work on this optimization. I left some questions in the comments. Can we omit the optimization on column sampling for now and just build the histogram for all columns?

src/common/hist_util.cc

trivialfis · 2022-09-14T16:31:08Z

src/common/column_matrix.cc

@@ -23,10 +23,12 @@ void ColumnMatrix::InitStorage(GHistIndexMatrix const& gmat, double sparse_thres
  gmat.GetFeatureCounts(feature_counts.data());

  // classify features
+  any_sparse_column_ = false;


any_sparse_column_ = !all_dense_column

trivialfis · 2022-09-14T16:32:58Z

src/tree/hist/histogram.h

@@ -16,6 +19,19 @@

 namespace xgboost {
 namespace tree {
+
+struct ColSample {


I think using TrainParam directly would be simpler.

trivialfis · 2022-09-14T16:34:22Z

src/tree/hist/histogram.h

  size_t n_batches_{0};
  // Whether XGBoost is running in distributed environment.
  bool is_distributed_{false};
+  // Addhoch colsample threshold level


trivialfis · 2022-09-14T16:35:00Z

src/tree/hist/histogram.h

    const size_t n_nodes = nodes_for_explicit_hist_build.size();
    CHECK_GT(n_nodes, 0);

+    const common::ColumnMatrix& column_matrix = gidx.Transpose();
+    const bool column_sampling =


Could you please explain the logic here? There are missing values but no sparse column in the column matrix, and bytree/bylevel is less than the threshold while there's no bynode? Is this necessary?

Also, this special case seems difficult to test.

razdoburdin · 2022-09-19T13:55:48Z

Hi, I have removed the column sampling optimization for now. I hope it helps with the review. Thanks for the suggestion!

trivialfis

Could you please try to enable some tests for each of the cases? I don't want to break anything in here by accident. ;-)

razdoburdin · 2022-09-21T11:50:27Z

Since column-wise building is just another way to compute the same result as row-wise building, I suggest reusing the existing tests for histogram building.
I added a force_read_by_column flag to BuildHist. It disables the automatic choice of building strategy and forces column-wise building instead. I duplicated the related tests with the new flag, so they now cover both strategies.
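
The testing idea described above can be sketched as follows. This is only an illustration under stated assumptions: `ToyData`, `Strategy`, `ChooseStrategy` and this toy `BuildHist` are hypothetical and do not reproduce the XGBoost signatures, and the feature-count threshold in `ChooseStrategy` is an arbitrary placeholder, since the real selection rule is not stated in this conversation. Only the flag name `force_read_by_column` comes from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy setup: hist[bin] accumulates the gradient of every row
// that falls into that bin. Not the XGBoost API.
using Hist = std::vector<double>;

struct ToyData {
  std::size_t n_rows;
  std::size_t n_features;
  std::vector<std::uint32_t> bins;  // row-major global bin ids
  std::vector<double> grad;         // one gradient per row
};

enum class Strategy { kRowWise, kColWise };

// Placeholder heuristic, NOT the rule used in the PR.
Strategy ChooseStrategy(const ToyData& d, bool force_read_by_column) {
  if (force_read_by_column) return Strategy::kColWise;
  const std::size_t kManyFeatures = 1000;  // arbitrary illustrative threshold
  return d.n_features >= kManyFeatures ? Strategy::kColWise : Strategy::kRowWise;
}

// Dispatches to one of the two loop orders; the flag overrides the heuristic.
Hist BuildHist(const ToyData& d, std::size_t n_bins,
               bool force_read_by_column = false) {
  Hist hist(n_bins, 0.0);
  if (ChooseStrategy(d, force_read_by_column) == Strategy::kColWise) {
    for (std::size_t f = 0; f < d.n_features; ++f) {
      for (std::size_t r = 0; r < d.n_rows; ++r) {
        hist[d.bins[r * d.n_features + f]] += d.grad[r];
      }
    }
  } else {
    for (std::size_t r = 0; r < d.n_rows; ++r) {
      for (std::size_t f = 0; f < d.n_features; ++f) {
        hist[d.bins[r * d.n_features + f]] += d.grad[r];
      }
    }
  }
  return hist;
}
```

A test in this style calls `BuildHist` once with the default and once with `force_read_by_column = true` on the same input and checks that the two histograms are equal, so an existing expected result can be reused for both strategies.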

razdoburdin and others added 6 commits September 7, 2022 10:58

Merge pull request #10 from dmlc/master

5734b3b

Merge the last changes

Intoducing Column Wise Hist Building

18f4f1d

linting

063807a

more linting

b5596e4

bug fixing

c3d193f

Merge remote-tracking branch 'upstream/master' into optimization/buildhist/colwisebuildhist

83a8654

trivialfis reviewed Sep 14, 2022

razdoburdin marked this pull request as draft September 19, 2022 13:16

dmitry.razdoburdin added 4 commits September 19, 2022 06:20

Removing column samping optimization for a while to simplify the review process.

85a2f28

linting

83277d4

Removing unnecessary changes

106ebc0

Use DispatchBinType in hist_util.cc

c539a24

razdoburdin marked this pull request as ready for review September 19, 2022 13:54

trivialfis reviewed Sep 20, 2022

Adding force_read_by column flag to buildhist. Adding tests for column wise buiilhist.

d228560

trivialfis approved these changes Sep 21, 2022

trivialfis merged commit eb7bbee into dmlc:master Sep 21, 2022

razdoburdin mentioned this pull request Oct 7, 2022

Using column_sampler for optimization of ColWiseBuildHist #8319

Open
