The `hist` method of XGBoost scales poorly on multi-core CPUs: a demo script

Currently, the hist tree-growing algorithm (tree_method=hist) of XGBoost scales poorly on multi-core CPUs: for some datasets, performance deteriorates as the number of threads is increased. This issue was discovered by @Laurae2's Gradient Boosting Benchmark.

To make things easier for contributors, I went ahead and isolated the performance bottleneck. The vast majority of the time (> 95%) is spent in a stage known as gradient histogram construction. This repository isolates that stage so that it is easy to fix and improve.

How to compile and run

Compile the script by running CMake:

mkdir build
cd build
cmake ..
make

Download record.tar.bz2 into the same build directory.
Extract record.tar.bz2 by running tar xvf record.tar.bz2.
Run the script:

# Usage: ./perflab record/ [number of threads]
./perflab record/ 36

Running with different numbers of threads should produce the following performance trend (see scaling.png in the repository):

What this script does

The script reads from record.tar.bz2, which was processed from the Bosch dataset. Its job is to compute histograms of gradient pairs, where each bin of a histogram is a partial sum.

Some background:

The gradient for a given instance (X_i, y_i) is a pair of double values that quantifies the distance between the true label y_i and the predicted label yhat_i.
There are as many gradient pairs as there are instances in a training dataset.
In order to find optimal splits for decision trees, we compute a histogram of gradients. Each bin of the histogram stands for a range of feature values. The value of the bin is given by the sum of gradients corresponding to the data points lying inside the range.
In each boosting iteration, we have to compute multiple histograms, each histogram corresponding to a set of instances.
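The background above can be sketched in a few lines of C++. This is a minimal, single-threaded illustration and not the code in src/main.cc: the names `GradientPair`, `BuildHistogram`, `bin_index`, and `row_set` are hypothetical, and it assumes a dense row-major matrix in which every feature value has already been mapped to a bin number in the range [0, bins_per_feature).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration: one gradient pair per training instance.
struct GradientPair {
  double grad;  // first value of the pair
  double hess;  // second value of the pair
};

// Builds one histogram for the instances listed in row_set.
// bin_index is assumed to be a dense row-major matrix (num_rows x num_features)
// holding, for every feature value, the bin it falls into.
// The result has num_features * bins_per_feature bins; each bin holds the sum
// of the gradient pairs of the instances whose feature value lies in that bin.
std::vector<GradientPair> BuildHistogram(
    const std::vector<GradientPair>& gpair,
    const std::vector<std::uint32_t>& bin_index,
    const std::vector<std::size_t>& row_set,
    std::size_t num_features,
    std::size_t bins_per_feature) {
  std::vector<GradientPair> hist(num_features * bins_per_feature,
                                 GradientPair{0.0, 0.0});
  for (std::size_t row : row_set) {
    assert(row < gpair.size());
    for (std::size_t f = 0; f < num_features; ++f) {
      const std::uint32_t bin = bin_index[row * num_features + f];
      assert(bin < bins_per_feature);
      // Accumulate this instance's gradient pair into the matching bin.
      GradientPair& cell = hist[f * bins_per_feature + bin];
      cell.grad += gpair[row].grad;
      cell.hess += gpair[row].hess;
    }
  }
  return hist;
}
```

Under these assumptions, each call with a different `row_set` corresponds to one of the multiple histograms computed per boosting iteration.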

Setting build types

By default, the 'Release' build type is used, with flags -O3 -DNDEBUG.
For profiling, you may want to add debug symbols by choosing the 'RelWithDebInfo' build type instead:
```
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
```
This build type uses the following flags: -O2 -g -DNDEBUG.
For full control over the compilation flags, specify CMAKE_CXX_FLAGS_RELEASE:
```
cmake -DCMAKE_CXX_FLAGS_RELEASE="-O3 -g -DNDEBUG -march=native" ..
```
This gives you full control over the optimization flags. Here, we are compiling with the flags -O3 -g -DNDEBUG -march=native.

You can check whether the flags are applied by running make VERBOSE=1 and looking for them in the C++ compilation lines:
```
/usr/bin/c++   -I/home/ubuntu/xgboost-fast-hist-perf-lab/include  -O3 -g -DNDEBUG -march=native
    -fopenmp -std=gnu++11 -o CMakeFiles/perflab.dir/src/main.cc.o
    -c /home/ubuntu/xgboost-fast-hist-perf-lab/src/main.cc
```

hcho3/xgboost-fast-hist-perf-lab
