Use adapter to initialize column matrix. #7912

Merged: 3 commits into dmlc:master from the column-adapter branch on May 18, 2022

trivialfis
Member

  • Add an adapter for SparsePage. Might seem weird, but we already have something similar in GPU code and predictor. We can unify the code further by treating SparsePage as an ordinary adapter batch.
  • Implement initialization of the column matrix from an adapter.
  • More precise tests.
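
As a rough illustration of the first bullet, here is a minimal sketch of wrapping CSR storage in an adapter batch so that a consumer only sees `GetLine()` and `GetElement()`. The names `GetLine`, `GetElement`, `Size`, `column_idx`, and `IsValidFunctor` come from the snippet reviewed in this PR; everything else (`CSRPageAdapterBatch`, `Entry`, `COOTuple`'s exact fields, `CountValidPerColumn`) is a hypothetical stand-in and not XGBoost's actual definition.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins: the names echo the reviewed snippet, but these are
// simplified illustrations, not XGBoost's real types.
struct COOTuple {
  std::size_t row_idx;
  std::size_t column_idx;
  float value;
};

// One (column index, value) pair as stored in a CSR page.
struct Entry {
  std::size_t index;
  float fvalue;
};

// Decides whether an element counts as present (not NaN, not `missing`).
struct IsValidFunctor {
  float missing;
  bool operator()(COOTuple const& e) const {
    return !std::isnan(e.value) && e.value != missing;
  }
};

// Non-owning view over CSR storage that exposes the adapter-batch interface.
class CSRPageAdapterBatch {
 public:
  class Line {
   public:
    Line(Entry const* data, std::size_t size, std::size_t ridx)
        : data_(data), size_(size), ridx_(ridx) {}
    std::size_t Size() const { return size_; }
    COOTuple GetElement(std::size_t i) const {
      return COOTuple{ridx_, data_[i].index, data_[i].fvalue};
    }

   private:
    Entry const* data_;
    std::size_t size_;
    std::size_t ridx_;
  };

  // `offset` is the CSR row pointer (n_rows + 1 entries); both vectors must
  // outlive the batch because only references are stored.
  CSRPageAdapterBatch(std::vector<std::size_t> const& offset,
                      std::vector<Entry> const& data)
      : offset_(offset), data_(data) {}

  std::size_t Size() const { return offset_.empty() ? 0 : offset_.size() - 1; }

  Line GetLine(std::size_t ridx) const {
    std::size_t begin = offset_[ridx];
    std::size_t end = offset_[ridx + 1];
    return Line(data_.data() + begin, end - begin, ridx);
  }

 private:
  std::vector<std::size_t> const& offset_;
  std::vector<Entry> const& data_;
};

// Generic consumer: works with any batch type that provides Size()/GetLine().
template <typename Batch>
std::vector<std::size_t> CountValidPerColumn(Batch const& batch,
                                             std::size_t n_columns,
                                             float missing) {
  std::vector<std::size_t> counts(n_columns, 0);
  IsValidFunctor is_valid{missing};
  for (std::size_t rid = 0; rid < batch.Size(); ++rid) {
    auto line = batch.GetLine(rid);
    for (std::size_t i = 0; i < line.Size(); ++i) {
      auto coo = line.GetElement(i);
      if (is_valid(coo)) {
        ++counts[coo.column_idx];
      }
    }
  }
  return counts;
}
```

Because the consumer is templated on the batch type, the same loop could in principle serve both a CSR page and any other adapter that offers the same two accessors, which is the kind of unification the description refers to.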

@trivialfis
Member Author

This is to implement #7890.

Comment on lines 303 to 308
auto line = batch.GetLine(rid);
for (size_t i = 0; i < line.Size(); ++i) {
  auto coo = line.GetElement(i);
  if (data::IsValidFunctor{missing}(coo)) {
    auto fid = coo.column_idx;
    const uint32_t bin_id = row_index[k];
Collaborator

What's the performance implication of abstracting the element access? Is the overhead acceptable?

Collaborator

Also, wouldn't it be inefficient to create the functor inside a tight loop? Let's create the functor instance once, outside the loop.
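
For reference, the two variants being compared can be sketched as follows. `IsValid` here is a hypothetical simplified functor operating on a plain `float`, not the `data::IsValidFunctor` from the PR. For a functor that only holds one `float`, an optimizing compiler will likely generate the same code for both forms, so the hoisted version is mainly a readability choice; that expectation is an assumption and has not been measured here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical simplified validity check on a raw value.
struct IsValid {
  float missing;
  bool operator()(float v) const { return !std::isnan(v) && v != missing; }
};

// Variant 1: a temporary functor is constructed on every iteration.
std::size_t CountValidInline(std::vector<float> const& values, float missing) {
  std::size_t n = 0;
  for (float v : values) {
    if (IsValid{missing}(v)) {
      ++n;
    }
  }
  return n;
}

// Variant 2: the functor is constructed once, before the loop.
std::size_t CountValidHoisted(std::vector<float> const& values, float missing) {
  IsValid is_valid{missing};
  std::size_t n = 0;
  for (float v : values) {
    if (is_valid(v)) {
      ++n;
    }
  }
  return n;
}
```

Both functions return the same count for any input; only the place where the functor is constructed differs.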

Member Author

> Is the overhead acceptable?

It's not the bottleneck. The major issue with this function is the boolean vector missing_flags_, which is not thread-safe. #7208 is likely to rewrite much of this code anyway; right now I just want to make sure some of the features are merged before optimization.
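
To illustrate the thread-safety concern: if `missing_flags_` is a `std::vector<bool>` (an assumption; the thread only calls it a "boolean vector"), neighboring flags may share one storage word, and the C++ standard does not guarantee that concurrent writes to different elements of `std::vector<bool>` are race-free. One possible alternative, sketched below with hypothetical names (`MissingFlags`, `MarkRange`, `BuildFlags`), is a byte-per-flag layout in which writes to disjoint index ranges touch disjoint memory locations. The chunks run sequentially here to keep the sketch dependency-free; this is not the approach taken in the PR or in #7208.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical byte-per-flag container: 1 means missing, 0 means present.
using MissingFlags = std::vector<std::uint8_t>;

// Set the flags for indices in [begin, end). Only flags[begin..end) are
// written, so calls on disjoint ranges never modify the same byte.
void MarkRange(std::vector<float> const& values, std::size_t begin,
               std::size_t end, MissingFlags* flags) {
  for (std::size_t i = begin; i < end; ++i) {
    (*flags)[i] = std::isnan(values[i]) ? 1 : 0;
  }
}

// Split the work into contiguous chunks. The chunks are processed one after
// another in this sketch; since their ranges are disjoint, each chunk could
// instead be handed to its own thread without a data race on `flags`.
MissingFlags BuildFlags(std::vector<float> const& values, std::size_t n_chunks) {
  MissingFlags flags(values.size(), 1);
  if (n_chunks == 0) {
    n_chunks = 1;
  }
  std::size_t chunk = (values.size() + n_chunks - 1) / n_chunks;
  for (std::size_t c = 0; c < n_chunks; ++c) {
    std::size_t begin = std::min(c * chunk, values.size());
    std::size_t end = std::min(begin + chunk, values.size());
    MarkRange(values, begin, end, &flags);
  }
  return flags;
}
```

The trade-off is memory: one byte per flag instead of roughly one bit, in exchange for element-wise writes that do not interfere with each other.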

Member Author

> right now I just want to make sure some of the features are merged before optimization.

I'm not against optimizations (I like them); I'm just trying to show the big picture of what we need in these data structures before we go nuts with optimization.

@trivialfis trivialfis merged commit 19775ff into dmlc:master May 18, 2022
@trivialfis trivialfis deleted the column-adapter branch May 18, 2022 08:15