Support CPU input for device QuantileDMatrix. #8136

Merged
trivialfis merged 10 commits into dmlc:master from ghist-ellpack on Aug 11, 2022

trivialfis
Member

  • Copy GHistIndexMatrix to Ellpack when needed.

auto r_end = d_row_ptr[ridx + 1];
size_t rsize = r_end - r_begin;

if (ifeature >= rsize) {
Member

Can't you set the null values here?

Member Author

Thanks for the suggestion. Let me try to merge the kernels.

Member Author

Merged.
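
For readers following this thread, here is a minimal host-side sketch of what "merging the kernels" could look like: a single pass that copies bin indices from a CSR layout into a fixed-stride Ellpack buffer and writes the null value into the padding slots at the same time. This is not the code from the PR. The function name `CsrToEllpack` and the parameters `row_stride` and `null_bin` are hypothetical, and the loop stands in for what would be one GPU thread per (row, slot) pair.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for the device kernel discussed above.
// Converts a CSR bin index (row_ptr + bins) into an ELLPACK buffer with
// `row_stride` slots per row. Slots past the end of a row get `null_bin`.
std::vector<std::uint32_t> CsrToEllpack(std::vector<std::size_t> const& row_ptr,
                                        std::vector<std::uint32_t> const& bins,
                                        std::size_t row_stride,
                                        std::uint32_t null_bin) {
  if (row_ptr.empty()) {
    return {};
  }
  std::size_t n_rows = row_ptr.size() - 1;
  std::vector<std::uint32_t> ellpack(n_rows * row_stride);
  // Each iteration handles one (row, slot) pair, like one thread in a kernel.
  for (std::size_t idx = 0; idx < ellpack.size(); ++idx) {
    std::size_t ridx = idx / row_stride;
    std::size_t ifeature = idx % row_stride;
    std::size_t r_begin = row_ptr[ridx];
    std::size_t r_end = row_ptr[ridx + 1];
    std::size_t rsize = r_end - r_begin;
    if (ifeature >= rsize) {
      // Padding is written in the same pass, so no second kernel is needed.
      ellpack[idx] = null_bin;
    } else {
      ellpack[idx] = bins[r_begin + ifeature];
    }
  }
  return ellpack;
}
```

The sketch assumes `row_stride` is at least the length of the longest row; shorter strides would silently drop entries.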

* - The CPU format and the GPU format are different: the former uses CSR + CSC for the
* histogram index while the latter uses only Ellpack. This results in a design where
* we can obtain the GPU format from CPU but not the other way around, since we can't
* recover the CSC from Ellpack. More concretely, if users want to construct a CPU
Member

What is the problem with obtaining CSC from ellpack?

Member Author

How would we get the feature index for each element from Ellpack?

Member

It's encoded in the bin numbers, which are the values stored in Ellpack. If the dataset is sparse, you would have to look this up with a binary search, I guess.

Member Author

Didn't think of that. Yup, we can recover the feature index by binary searching the cut values. Will update the comment and leave copying data from GPU to CPU for a different PR.

Member Author

Updated the note. Will work on the other direction of conversion.
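
The lookup discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the implementation that landed later: it assumes each Ellpack value is a global bin index and that a sorted array of per-feature bin offsets is available (`feature_offsets[f]` is the first global bin of feature `f`, with the total bin count as the last entry). The names `FeatureOfBin`, `CountPerFeature`, `feature_offsets`, and `null_bin` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Recover the feature that owns a global bin index by binary search.
// Assumed layout: feature_offsets = {0, n_bins_f0, n_bins_f0 + n_bins_f1, ...}.
std::size_t FeatureOfBin(std::vector<std::uint32_t> const& feature_offsets,
                         std::uint32_t bin) {
  // upper_bound finds the first offset strictly greater than `bin`;
  // the owning feature is the one just before that position.
  auto it = std::upper_bound(feature_offsets.cbegin(), feature_offsets.cend(), bin);
  return static_cast<std::size_t>(it - feature_offsets.cbegin()) - 1;
}

// Count the non-padding entries of each feature in an ELLPACK buffer.
// A prefix sum over the result would give the column pointers of a CSC layout.
std::vector<std::size_t> CountPerFeature(std::vector<std::uint32_t> const& ellpack,
                                         std::vector<std::uint32_t> const& feature_offsets,
                                         std::uint32_t null_bin) {
  std::vector<std::size_t> counts(feature_offsets.size() - 1, 0);
  for (std::uint32_t bin : ellpack) {
    if (bin == null_bin) {
      continue;  // padding slot, carries no feature
    }
    ++counts[FeatureOfBin(feature_offsets, bin)];
  }
  return counts;
}
```

For dense data the slot position within a row already identifies the feature, so the search would only be needed in the sparse case, as noted above.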

@trivialfis trivialfis closed this Aug 11, 2022
@trivialfis trivialfis reopened this Aug 11, 2022
@trivialfis trivialfis merged commit 16bca5d into dmlc:master Aug 11, 2022
@trivialfis trivialfis deleted the ghist-ellpack branch August 11, 2022 13:21