Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add getData function for DMatrix #9379

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jinmfeng001
Copy link
Contributor

The python API has a get_data function, add a similar api for jvm-package too.

@jinmfeng001 jinmfeng001 force-pushed the add_get_data branch 3 times, most recently from b170d57 to caa4d9f Compare July 13, 2023 09:38
@jinmfeng001 jinmfeng001 force-pushed the add_get_data branch 5 times, most recently from 16e5ee7 to dcb94ac Compare July 26, 2023 15:04
@trivialfis
Copy link
Member

cc @wbo4958 .

@wbo4958
Copy link
Contributor

wbo4958 commented Jul 31, 2023

I will check it today. sorry for late


for (int row = 0; row < rowNum; row++) {
for(int col = 0; col < colNum; col++){
denseMatrix.set(row, col, 0.0f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to set the default to 0.0f when creating BigDenseMatrix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late response. No such functions provided by BigDenseMatrix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just return the CSR instead? Converting a CSR to a dense can be extremely memory hungry when input is sparse. Some datasets can span a few thousands of features with only a small portion of valid values due to encoding.

int[] featureIndex = new int[nonMissingNum];
float[] featureValue = new float[nonMissingNum];

XGBoostJNI.checkCall(XGBoostJNI.XGDMatrixGetDataAsCSR(handle, "{}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trivialfis, is there an API to get the dense matrix directly instead of converting CSR data to the dense dmatrix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After checking the source code, I didn't find such functions. But @trivialfis can confirm it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, DMatrix itself is a CSR matrix.

BigDenseMatrix denseMatrix = new BigDenseMatrix(rowNum, colNum);

for (int row = 0; row < rowNum; row++) {
for(int col = 0; col < colNum; col++){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for(int col = 0; col < colNum; col++) {

@wbo4958
Copy link
Contributor

wbo4958 commented Jul 31, 2023

some minor comments, Overall, LGTM.

@trivialfis trivialfis added this to In progress in 2.1 Roadmap via automation Aug 22, 2023
Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than converting to a dense matrix, the PR looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
2.1 Roadmap
In progress
Development

Successfully merging this pull request may close these issues.

None yet

3 participants