
Partition optimization #7208

Open
wants to merge 12 commits into master from the optimization_part_applysplit branch

Conversation

ShvetsKS
Contributor

@ShvetsKS ShvetsKS commented Sep 2, 2021

It's part of #7192.

The partition implementation in this PR allows reading the bin matrix data with less random access.

full train time    airline (~100M rows)    higgs (~10M rows)
master             403 s                   53.7 s
this PR            170 s                   32.5 s
speedup            2.37x                   1.65x

@codecov-commenter

codecov-commenter commented Sep 3, 2021

Codecov Report

Merging #7208 (137183e) into master (3290a4f) will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #7208   +/-   ##
=======================================
  Coverage   82.67%   82.67%           
=======================================
  Files          13       13           
  Lines        4017     4017           
=======================================
  Hits         3321     3321           
  Misses        696      696           
Impacted Files Coverage Δ
python-package/xgboost/core.py 84.35% <0.00%> (ø)

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3290a4f...137183e. Read the comment docs.

@trivialfis trivialfis marked this pull request as draft September 7, 2021 11:34
@ShvetsKS
Contributor Author

@trivialfis Can we start detailed review of this PR as this part is related to partition optimizations only?
Optimizations of histogram, evaluate and sync kernels are prepared in separate branches

@ShvetsKS ShvetsKS marked this pull request as ready for review September 22, 2021 14:15
check_hist(parent_hist, this_hist, sibling_hist, 0, total_bins);
++node_id;
}
// const common::OptPartitionBuilder* p_opt_partition_builder;
Member

I think this is a sign of work in progress?

Contributor Author

my mistake, sorry!

Contributor Author

tests were aligned to optimized partition kernel usage

@ShvetsKS
Contributor Author

@trivialfis The failed steps Jenkins Linux: Test and Jenkins Linux: Deploy have logs similar to the latest commit in the master branch: https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/release_1.5.0/1/pipeline
It seems no failed checks/tests are affected by the current changes.

@trivialfis
Member

Yup, I'm aware of the failure. It's blocking the release.

@napetrov
Contributor

napetrov commented Oct 1, 2021

@trivialfis - are we considering this PR to be integrated prior to release?

@trivialfis
Member

No, I don't think it's a good idea to integrate this amount of change prior to the release. I looked at it briefly before, and I think it could use some cleanup. I'm on holiday right now; I will provide a detailed review and possible changes once I'm back.

@napetrov
Contributor

napetrov commented Oct 1, 2021

@trivialfis - ok, thanks. Have good holidays!

@ShvetsKS ShvetsKS force-pushed the optimization_part_applysplit branch 2 times, most recently from a2a6ccd to fd8244c Compare October 3, 2021 09:29
@ShvetsKS
Contributor Author

ShvetsKS commented Oct 3, 2021

similar failures in the master branch here:
docker: Error response from daemon: OCI runtime create failed
@trivialfis should I try to restart tests again?

@trivialfis
Member

Restarted.

@hcho3 The issue seems to be lingering.

@trivialfis trivialfis self-assigned this Oct 10, 2021
Member

@trivialfis trivialfis left a comment

Hi, I appreciate your excellent optimization. But I do think there's a way to write more readable code.

This is not a detailed review. A new partitioner that implements std::partition, spanning 550 LOC without any comments and with 29 public variables, is a bit too much for me. Also, the old one still seems to be here for some reason. I can't merge a PR with this level of complexity without seriously thinking about a redesign/rewrite. There is some good reading material in the C++ Core Guidelines that is highly recommended:

https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#main
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#S-performance

@@ -115,7 +115,7 @@ class XGBoostGeneralSuite extends FunSuite with TmpFolderPerSuite with PerTest {
val eval = new EvalError()
val training = buildDataFrame(Classification.train)
val testDM = new DMatrix(Classification.test.iterator)
val paramMap = Map("eta" -> "1", "gamma" -> "0.5", "max_depth" -> "0",
val paramMap = Map("eta" -> "1", "gamma" -> "0.5", "max_depth" -> "6",
Member

What's changed?

Contributor Author

reverted back

@@ -0,0 +1,552 @@

Member

Is the original one deleted?

Contributor Author

yes, it is deleted now; the test for the partition was updated

dynamic_cast<const SparseColumn<BinIdxType>*>(GetColumnsRef<BinIdxType>()[fid].get());
}
for (size_t tid = 0; tid < nthreads; ++tid) {
states[tid].resize(1 << (max_depth + 1), 0);
Member

There's a MaxNodes function in TreeParam.

if (is_loss_guided) {
const int cleft = compleate_trees_depth_wise[0];
const int cright = compleate_trees_depth_wise[1];
// template!
Member

?

Contributor Author

typo, fixed

std::vector<uint16_t> curr_level_nodes_;
std::vector<int32_t> split_conditions_;
std::vector<uint64_t> split_ind_;
std::vector<uint16_t> compleate_trees_depth_wise_;
Member

compleate?

Contributor Author

fixed, thanks!

Comment on lines 33 to 63
std::vector<std::vector<Slice>> threads_addr;
std::vector<std::vector<uint16_t>> threads_id_for_nodes;
std::vector<std::vector<uint16_t>> node_id_for_threads;
std::vector<std::vector<uint32_t>> threads_rows_nodes_wise;
std::vector<std::vector<uint32_t>> threads_nodes_count;
std::vector<std::vector<int>> nodes_count;
std::vector<Slice> partitions;
std::vector<std::vector<uint32_t>> vec_rows;
std::vector<std::vector<uint32_t>> vec_rows_remain;
std::vector<std::unique_ptr<const Column<uint8_t> >> columns8;
std::vector<const DenseColumn<uint8_t, true>*> dcolumns8;
std::vector<const SparseColumn<uint8_t>*> scolumns8;
std::vector<std::unique_ptr<const Column<uint16_t> >> columns16;
std::vector<const DenseColumn<uint16_t, true>*> dcolumns16;
std::vector<const SparseColumn<uint16_t>*> scolumns16;
std::vector<std::unique_ptr<const Column<uint32_t> >> columns32;
std::vector<const DenseColumn<uint32_t, true>*> dcolumns32;
std::vector<const SparseColumn<uint32_t>*> scolumns32;
std::vector<std::vector<size_t> > states;
const RegTree* p_tree;
// can be common for all threads!
std::vector<std::vector<bool>> default_flags;
const uint8_t* data_hash;
std::vector<uint32_t> row_set_collection_vec;
uint32_t gmat_n_rows;
uint32_t* row_indices_ptr;
size_t n_threads = 0;
uint32_t summ_size = 0;
uint32_t summ_size_remain = 0;
uint32_t max_depth = 0;

Member

29 public variables ... how do you keep track of that ...

const auto& dense_columns = GetDenseColumnsRef<BinIdxType>();
const auto& sparse_columns = GetSparseColumnsRef<BinIdxType>();
uint32_t count = 0;
uint32_t count2 = 0;
Member

temp_0, temp_1 are not a good way to name variables.

Contributor Author

changed, thanks!

const RowSetCollection::Elem e = row_set_collection_[nid];
for (const size_t *it = e.begin; it < e.end; ++it) {
grad_stat.Add(gpair_h[*it].GetGrad(), gpair_h[*it].GetHess());
for (const GradientPair& gh : gpair_h) {
Member

Maybe std::accumulate ?

Contributor Author

it would affect the precision; otherwise we would need to support a new operator+(xgboost::detail::GradientPairInternal<double>, const xgboost::detail::GradientPairInternal<float>)

for (auto const& entry : expand) {
if (entry.IsValid(param_, *num_leaves)) {
nodes_for_apply_split->push_back(entry);
evaluator_->ApplyTreeSplit(entry, p_tree);
(*num_leaves)++;
curr_level_nodes_[2*entry.nid] = (*p_tree)[entry.nid].LeftChild();
curr_level_nodes_[2*entry.nid + 1] = (*p_tree)[entry.nid].RightChild();
compleate_node_ids.push_back((*p_tree)[entry.nid].LeftChild());
Member

compleate?

Contributor Author

fixed thanks!

}

template<typename BinIdxType, bool is_loss_guided, bool all_dense = true>
void CommonPartition(size_t tid, const size_t row_indices_begin,
Member

std::vector<std::unique_ptr<const Column<uint32_t> >> columns32;
std::vector<const DenseColumn<uint32_t, true>*> dcolumns32;
std::vector<const SparseColumn<uint32_t>*> scolumns32;
std::vector<std::vector<size_t> > states;
Member

has_neg_hess = true;
}
}
const size_t block_size = info.num_row_ / this->nthread_ + !!(info.num_row_ % this->nthread_);
Member

This block size calculation is used in multiple places, including prediction I think? Any plan to create some abstraction for it?

https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rp-library

std::swap is a small function, yet it's a worthy function to create.

Contributor Author

Added a new function GetBlockSize and tried to use it in all similar cases. Thanks!

vec_rows_remain.resize(nthreads);
} else {
threads_nodes_count.clear();
threads_nodes_count.resize(n_threads);
Member

Why clear then resize?

Contributor Author

We need to clear the full threads_nodes_count buffers. If we keep only resize(), the information in threads_nodes_count is not valid.

Contributor Author

@ShvetsKS ShvetsKS left a comment

@trivialfis thanks a lot for the review, and sorry for the late response!
I have applied some of the comments, but those related to code complexity, the number of input arguments, and the general rework are still in progress.


@ShvetsKS ShvetsKS force-pushed the optimization_part_applysplit branch 2 times, most recently from eeb7a88 to 8ed8a91 Compare October 24, 2021 15:41
@trivialfis
Member

it will affect the precision, otherwise we need to support new operator+

Got it.

For some reason, I can't reply inline. I will take a deeper dive into the optimization and see if we can work on smaller subsets first. Thanks for the hard work!

Member

@trivialfis trivialfis left a comment

Hi, I took another look, but only at the histogram building kernel this time. Can we start by working on this function first? Merging smaller functions makes review and testing easier, and also has the benefit of decoupling different code sections. What do you think?

src/objective/regression_obj.cu (outdated, resolved)
src/common/threading_utils.h (resolved)
const size_t nrows = row_end - row_begin;
const size_t no_prefetch_size = Prefetch::NoPrefetchSize(nrows);

if (is_root) {
Member

May I ask what's the difference between is_root and the existing contiguousBlock?

Contributor Author

A contiguous block is mostly possible at the root node; it's a really rare case for nodes with depth > 0 to have contiguous blocks, so there is no need to handle such cases specifically.
Also, the is_root parameter handles resetting the node_ids buffer to zero. As described above, we need to track the node id for each row (rows are not grouped per node), so we have to reset the node_ids buffer to zero at the start of each tree build.

src/common/hist_builder.h (outdated, resolved)
const uint32_t row_end,
const GHistIndexMatrix& gmat,
const BinIdxType* numa_data,
uint16_t* nodes_ids,
Member

Could you please share why we need to pass nodes_ids for building the histogram instead of passing the histogram buffer for this particular node? That way we can have a smaller scope for this function, which makes it easier to test.

Also, I think the node id should have the 32-bit type bst_node_t.

Contributor Author

Sure.
The changes in hist_builder.h were introduced due to the changes in the partition builder.
Before the optimizations, the "full" bin matrix was traversed in memory for each node on the same level during histogram building: potentially, for each node on a specific level, we had to read the bin matrix almost from beginning to end with random strides. Memory bandwidth drops significantly under such conditions.
To mitigate this problem, the optimized code keeps "sorted" row indices for each specific level (but no actual sorting is done! :) ), and as a side effect (rows are not grouped by node) we have to know which node_id a specific row belongs to. node_id is required to know which node's histogram should be updated with a specific bin matrix row.

Sorry if the description is not clear; please let me know and I'll try to add more details.

As for 32-bit type usage, it currently works perfectly for max_depth <= 16. I have to check the performance impact before changing it from 16 to 32 bits, and I hope there won't be any reason to add a template parameter for it.

Member

@trivialfis trivialfis Nov 3, 2021

Thank you for the detailed explanation! That is an insightful optimization.

For future reference with these types of optimizations, I think you should put that on the front page of the PR before anything else and in a code comment, and potentially open an RFC like #7308 before making the change, so we can be clear about the design choice. (I write design docs from time to time just to keep myself clear about the changes ...). For this PR, I asked myself again: there's no way I could guess what you are trying to do by looking at the code. ;-)

So, let me try to summarize your summary ;-) Feel free to correct me if I'm wrong. Instead of asking which row should we process given a node index, you ask which node should we process given a row index. This optimization is only applicable when the depth-wise grow policy is used, so the code is split and specialized.

Can you design an intuitive interface for the partitioner and the histogram builder so that the code can be easily reused? I can give it some thoughts if you can confirm my summary.

As for 32-bit type usage, now it works for max_depth <= 16 perfectly.

I don't think this optimization is necessary.

Contributor Author

Instead of asking which row should we process given a node index, you ask which node should we process given a row index. This optimization is only applicable when the depth-wise grow policy is used, so the code is split and specialized.

That's a clear and correct summary :)
Yes. I had to specialize the partition-builder code for the depth-wise and loss-guided growth policies, because for the loss-guided policy we have to know exactly which set of rows is split to the left and which to the right child node.

Can you design an intuitive interface for the partitioner and the histogram builder so that the code can be easily reused? I can give it some thoughts if you can confirm my summary.

Would it work if the OptPartitionBuilder class supported the depth-wise and loss-guided policies via the strategy pattern instead of the is_loss_guided template parameter?
As for the histogram builder, there are no significant changes in BuildHistKernel, only the node_id usage, which is applicable to both growing policies.

const size_t icol_end = any_missing ? row_ptr[ri+1] : icol_start + n_features;
const size_t row_size = icol_end - icol_start;
const size_t idx_gh = two * ri;
const uint32_t nid = is_root ? 0 : mapping_ids[nodes_ids[ri]];
Member

It's unclear to me what this mapping is doing.

Contributor Author

This is a simple mapping from the real node id (in the tree currently being built) to the node's number on the current depth level (0 <= nid < 2^current_depth). This mapping allows re-using the allocated histograms.

src/common/hist_builder.h (outdated, resolved)
template<typename FPType, bool do_prefetch,
typename BinIdxType, bool is_root,
bool any_missing>
void BuildHistKernel(const std::vector<GradientPair>& gpair,
Member

Excellent that it's now in a header so we don't have to define those specializations!

Contributor Author

yes, it seems some simplification was achieved :)

@ShvetsKS
Contributor Author

Hi, I took another look, but only at the histogram building kernel this time. Can we start by working on this function first? Merging smaller functions makes review and testing easier, and also has the benefit of decoupling different code sections. What do you think?

@trivialfis Thanks a lot for the review!
I'd love to follow your proposal, but the main changes in the histogram builder (handling of node ids, mapping, memory allocation, etc.) were introduced to support the changes in the partition builder and are not applicable without them. It seems only the movement of the build-hist kernels to a header file, to avoid template specialization, can be decoupled. But I'll try to simplify it anyway.

@@ -204,6 +204,13 @@ class OptPartitionBuilder {
if (!all_dense) {
std::vector<size_t>& local_states = states[tid];
std::vector<bool>& local_default_flags = default_flags[tid];
if (row_indices_begin >= row_indices_end) {
Member

@trivialfis trivialfis Nov 3, 2021

Hey, I saw that you have pushed a fix, presumably found during some local testing. But really, there's no way I can guess what you are trying to fix. These few lines look like a hacky workaround to some edge cases. It's fine, I do that all the time during development. But after the trial and error period, can we think of a more general design that doesn't have these conditions at all? Or make some cleanup so that the code can explain itself?

Contributor Author

I changed this fix, thanks for the comment!
It no longer looks like a hacky workaround ;)
And I agree that a clearer and more general design should be proposed for OptPartitionBuilder, possibly applying the strategy pattern.

@trivialfis
Member

I'm happy to schedule a call if needed. I think we have some different points of view, and it would be great to share more.

@ShvetsKS ShvetsKS force-pushed the optimization_part_applysplit branch 3 times, most recently from 906d5a9 to d155edc Compare May 14, 2022 17:37
@ShvetsKS ShvetsKS force-pushed the optimization_part_applysplit branch 2 times, most recently from bf0705f to b327799 Compare May 15, 2022 13:49
@ShvetsKS ShvetsKS force-pushed the optimization_part_applysplit branch from b327799 to 3b08089 Compare May 15, 2022 14:13
@ShvetsKS
Contributor Author

ShvetsKS commented May 15, 2022

Thank you for continuing the work on optimization. But looking at the code I think my comment will be similar to previous ones. Can we work on smaller chunks of code first and merge the rest incrementally?

All changes in this PR are related to each other: the new partition builder required changes to the histogram building kernels, and neither part works separately.
As the common histogram building kernels are used in the 'approx' and 'hist' methods, I had to introduce changes in the 'approx' builder as well. Now the new optimized partition builder is applicable to both methods, and a common RowPartitioner was even introduced to avoid existing code duplication :)
@trivialfis Do you think it's better to split the changes in the 'approx' and 'hist' methods into separate PRs?
I now see only three unaddressed comments, all related to 'opt_partition_builder.h' (about the number of fields and parameters); sorry for not addressing them yet. I'll try to simplify the code.

@ShvetsKS ShvetsKS force-pushed the optimization_part_applysplit branch from d41893c to 2525030 Compare May 22, 2022 20:42
razdoburdin pushed a commit to razdoburdin/xgboost that referenced this pull request Jun 13, 2022
razdoburdin pushed a commit to razdoburdin/xgboost that referenced this pull request Jun 14, 2022
@razdoburdin
Contributor

Hello, everyone.
I have started decomposition of this PR into the smaller parts. The first one is presented here: #7991

@ShvetsKS ShvetsKS force-pushed the optimization_part_applysplit branch 6 times, most recently from 3ccb329 to 7780172 Compare June 26, 2022 09:48
6 participants