New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using column_sampler for optimization of ColWiseBuildHist #8319
base: master
Are you sure you want to change the base?
Using column_sampler for optimization of ColWiseBuildHist #8319
Conversation
Please ignore the failing MacOS tests from GitHub Actions. We're working to fix it. |
Hi @trivialfis , |
Apologies for the slow response, I will get to this tomorrow. |
Hi, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay. May I ask the specific objective of this PR? Are you trying to optimize the case to build for less columns when sampling is enabled?
I'm concerned with optimizations that don't generalize and requires edge cases. I haven't been able to track which test can confirm the correctness of the code.
@@ -284,13 +300,14 @@ void ColsWiseBuildHistKernel(const std::vector<GradientPair> &gpair, | |||
}; | |||
|
|||
const size_t n_features = gmat.cut.Ptrs().size() - 1; | |||
const size_t n_columns = n_features; | |||
const size_t n_columns = kColumnSampling ? fids.size() : n_features; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is fids
empty if there's no sampling?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fids is empty is case of condition here is false.
@@ -76,6 +84,20 @@ class HistogramBuilder { | |||
buffer_.Reset(this->n_threads_, n_nodes, space, target_hists); | |||
} | |||
|
|||
constexpr float kColsampleTh = 0.1; | |||
bool column_sampling = (column_sampler_ != nullptr) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When column sampler is nullptr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I set it to nullptr in tests of the histogram builder without using this optimization.
@@ -76,6 +84,20 @@ class HistogramBuilder { | |||
buffer_.Reset(this->n_threads_, n_nodes, space, target_hists); | |||
} | |||
|
|||
constexpr float kColsampleTh = 0.1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why 0.1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is an ad-hoc threshold value.
constexpr float kColsampleTh = 0.1; | ||
bool column_sampling = (column_sampler_ != nullptr) && | ||
(train_param_.colsample_bytree < kColsampleTh || | ||
train_param_.colsample_bylevel < kColsampleTh); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about bynode? Is it used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not now, maybe later, one can investigate these options more deeply.
yes, that was the idea
It is reasonable. I added a subcase for |
Let me try to look deeper into this PR. The optimization is definitely important but I would like to keep lesser conditional branches than it currently has. |
Hi,
this PR is a part of #7192.
It continues the optimization of BuildHistKernel, which was started at #8233.
In #8233 I removed the optimization related to column_sampler to simplify the reviewing process. Here I return it.
The methodology of the benchmarking is the same as in #8218. The only dataset affected by optimization is
santander
. Training speed forsantander
increases twice after applying this optimization. You can find the exact numbers at #8233.I am looking forward for your review and comments!