Initial support for multioutput regression. #7514

trivialfis · 2021-12-16T21:28:17Z

Close #7309.

Add a num target model parameter, which is configured from the input labels.
Change the elementwise metric and the indexing for weights.
Add a demo.
Add tests.
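To make the weight-indexing change concrete, here is a minimal sketch of an elementwise metric over multi-target labels. It is not the code from this PR: `WeightedRmse` is a hypothetical helper, and it assumes labels and predictions are row-major buffers of shape [n_samples, n_targets] with one weight per sample, so the weight for flattened element `i` is looked up at `i / n_targets`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only, not XGBoost's implementation.
// `labels` and `preds` are row-major buffers of shape [n_samples, n_targets].
// `weights` holds one entry per sample, or is empty for unit weights.
double WeightedRmse(std::vector<float> const& labels,
                    std::vector<float> const& preds,
                    std::vector<float> const& weights,
                    std::size_t n_targets) {
  assert(n_targets > 0);
  assert(labels.size() == preds.size());
  assert(labels.size() % n_targets == 0);
  assert(weights.empty() || weights.size() == labels.size() / n_targets);

  double residue_sum = 0.0;
  double weight_sum = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    // All targets of one sample (row) share that sample's weight.
    std::size_t const sample_id = i / n_targets;
    double const w = weights.empty() ? 1.0 : weights[sample_id];
    double const diff = static_cast<double>(labels[i]) - preds[i];
    residue_sum += w * diff * diff;
    weight_sum += w;
  }
  return weight_sum == 0.0 ? 0.0 : std::sqrt(residue_sum / weight_sum);
}
```

With a single target (`n_targets == 1`) the lookup reduces to `weights[i]`, which is the usual one-weight-per-label case.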

src/objective/regression_obj.cu

src/metric/elementwise_metric.cu

tests/cpp/metric/test_multiclass_metric.cc

trivialfis · 2021-12-16T22:02:06Z

src/objective/regression_obj.cu

@@ -83,8 +88,10 @@ class RegLossObj : public ObjFunction {
    // for better performance.
    const size_t n_data_blocks = std::max(static_cast<size_t>(1), (on_device ? ndata : nthreads));
    const size_t block_size = ndata / n_data_blocks + !!(ndata % n_data_blocks);
+    auto const n_targets = std::max(info.labels.Shape(1), static_cast<size_t>(1));
+


We should express this calculation in a more ndarray-friendly way in the future.
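For readers following the hunk above, the arithmetic can be sketched in isolation. The snippet below is an illustration under assumptions, not the code in `regression_obj.cu`: `NumTargets`, `BlockSize`, and `BlockRange` are hypothetical names. It shows the ceiling division behind `block_size` and the clamp that treats labels without a second dimension as a single target.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helpers for illustration only.

// Labels with no second dimension (0 columns) count as one target.
std::size_t NumTargets(std::size_t n_label_columns) {
  return std::max(n_label_columns, static_cast<std::size_t>(1));
}

// Ceiling division: equivalent to ndata / n_blocks + !!(ndata % n_blocks).
std::size_t BlockSize(std::size_t ndata, std::size_t n_blocks) {
  n_blocks = std::max(n_blocks, static_cast<std::size_t>(1));
  return ndata / n_blocks + (ndata % n_blocks != 0 ? 1 : 0);
}

// Half-open range [begin, end) of elements handled by block `block_idx`.
// Both ends are clamped so trailing blocks may be short or empty.
std::pair<std::size_t, std::size_t> BlockRange(std::size_t block_idx,
                                               std::size_t ndata,
                                               std::size_t n_blocks) {
  std::size_t const block_size = BlockSize(ndata, n_blocks);
  std::size_t const begin = std::min(ndata, block_idx * block_size);
  std::size_t const end = std::min(ndata, begin + block_size);
  return {begin, end};
}
```

For example, 10 elements split over 4 blocks gives a block size of 3, so the last block covers only element 9.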

hcho3 · 2021-12-16T22:08:42Z

Can we throw an error when the user attempts to use other objectives with multi-output labels?

trivialfis · 2021-12-17T01:05:35Z

> Can we throw an error when the user attempts to use other objectives with multi-output labels?

Added a check along with test. Thanks for the suggestion.
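The shape of such a check might look like the sketch below. It is an assumption-laden illustration rather than the check added in this PR: `ObjectiveInfo`, its `supports_multi_target` flag, `CheckTargets`, and the objective name `some:other_objective` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical description of an objective, for illustration only.
struct ObjectiveInfo {
  std::string name;
  bool supports_multi_target;
};

// Reject multi-output labels for objectives that cannot handle them.
void CheckTargets(ObjectiveInfo const& obj, std::size_t n_targets) {
  if (n_targets > 1 && !obj.supports_multi_target) {
    throw std::invalid_argument("Objective `" + obj.name +
                                "` does not support multi-output labels, got " +
                                std::to_string(n_targets) + " targets.");
  }
}
```

Single-target input passes for every objective, so existing models are unaffected by a check of this form.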

trivialfis · 2021-12-17T05:27:59Z

The support is primitive. We need to extend it to other regression objectives and add more documentation.

Craigacp · 2021-12-18T01:38:02Z

Is this API suitable for wrapping up in XGBoost4J, or do you want to build out a more general one before doing that?

trivialfis · 2021-12-18T02:06:12Z

The API is good for regression and binary classification and should be stable unless a major issue is found. So it's good enough to start looking into language bindings. I will try to add xgboost4j support later and will keep you posted on the progress.

It's unlikely that we will implement multi-class multi-target on top of the existing interface, so no need to worry about it at the moment. :-)

trivialfis · 2022-02-21T03:58:51Z

@Craigacp Apologies for the slow progress. I looked into the JVM packages and I can't build a complete stack for the JVM, from Spark down to the basic Java wrapper. The required change is not trivial since we will have 2-dim inputs for both base_margin and label, and I'm not sure what the best way to implement that for the JVM packages is.

Craigacp · 2022-02-22T01:43:46Z

Do you have a branch somewhere with the current state of it? I'd assumed that DMatrix would grow to accept a 2d matrix for the targets and then most of the rest of the changes would be plumbing, but if there's something that needs some design effort I can take a look.

trivialfis · 2022-02-22T05:12:27Z

@Craigacp The master branch contains support for Python. If you are interested in a code walk I'm happy to chat offline.

> I'd assumed that DMatrix would grow to accept a 2d matrix for the targets

Yes, that's pretty much the only requirement for regression. For multi-label classification some configuration needs to be done to make sure binary:logistic is used and no num_class is passed in.

The difficult part for me is just not being familiar with the JVM stack ...

trivialfis · 2022-02-22T05:32:09Z

On Python we use a JSON string to represent the input memory buffer, so we have complete information about shape and strides, etc. I'm not sure what's the best way to handle these numeric data types on JVM.
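As a rough picture of what such a description carries, the sketch below builds a JSON string with a pointer, shape, strides, and element type. The field names are loosely modelled on NumPy's array interface and are an assumption here; `RowMajorStrides` and `DescribeBuffer` are hypothetical helpers, not XGBoost functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helpers for illustration only.

// Byte strides of a dense row-major array with the given shape.
std::vector<std::size_t> RowMajorStrides(std::vector<std::size_t> const& shape,
                                         std::size_t itemsize) {
  std::vector<std::size_t> strides(shape.size());
  std::size_t stride = itemsize;
  for (std::size_t i = shape.size(); i-- > 0;) {
    strides[i] = stride;
    stride *= shape[i];
  }
  return strides;
}

// Render a list of sizes as a JSON array, e.g. [2, 3].
std::string JoinSizes(std::vector<std::size_t> const& values) {
  std::ostringstream os;
  os << "[";
  for (std::size_t i = 0; i < values.size(); ++i) {
    os << (i == 0 ? "" : ", ") << values[i];
  }
  os << "]";
  return os.str();
}

// Describe a read-only memory buffer as a JSON string.
std::string DescribeBuffer(void const* data,
                           std::vector<std::size_t> const& shape,
                           std::vector<std::size_t> const& strides,
                           std::string const& typestr) {
  std::ostringstream os;
  os << "{\"data\": [" << reinterpret_cast<std::uintptr_t>(data) << ", true], "
     << "\"shape\": " << JoinSizes(shape) << ", "
     << "\"strides\": " << JoinSizes(strides) << ", "
     << "\"typestr\": \"" << typestr << "\"}";
  return os.str();
}
```

A 2x3 buffer of 32-bit floats would be described with shape `[2, 3]` and strides `[12, 4]`, which is enough for the receiver to walk the matrix without copying it.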

Craigacp · 2022-02-23T01:13:55Z

Ok, I should have some time next week to look through how the Python code works. I might take you up on that code walkthrough once I've familiarised myself with it a bit.

The length of the header array should tell us the number of examples, and then for multidimensional things we can require that the target array not be sparse and have explicit zeros (though that might be pretty wasteful for multi-label classification). Then the target array is of known shape [numExamples, numOutputDimensions] linearised into a single vector (row-wise) and that should be enough information for the C API. But there might be a better way to do it.
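The row-wise linearisation described above can be sketched as follows. This is only an illustration of the layout, assuming dense targets of shape [numExamples, numOutputDimensions]; `FlattenRowMajor` and `TargetAt` are hypothetical names and not part of any existing binding.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helpers for illustration only.

// Flatten dense targets of shape [n_examples, n_outputs] into one row-major
// vector. Every row must supply all outputs, including explicit zeros.
std::vector<float> FlattenRowMajor(
    std::vector<std::vector<float>> const& targets, std::size_t n_outputs) {
  std::vector<float> flat;
  flat.reserve(targets.size() * n_outputs);
  for (auto const& row : targets) {
    if (row.size() != n_outputs) {
      throw std::invalid_argument("Every example needs one value per output.");
    }
    flat.insert(flat.end(), row.begin(), row.end());
  }
  return flat;
}

// Read the target for (example, output) back out of the flat vector.
float TargetAt(std::vector<float> const& flat, std::size_t n_outputs,
               std::size_t example, std::size_t output) {
  return flat.at(example * n_outputs + output);
}
```

The flat vector together with the two dimensions is all the receiving side needs in order to recover the matrix.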

hcho3 reviewed Dec 16, 2021


trivialfis added this to 1.6 In Progress in 2.0 Roadmap via automation Dec 17, 2021

trivialfis requested a review from hcho3 December 17, 2021 05:25

trivialfis added 17 commits December 18, 2021 01:49

Implement multi-output regression.

dd11715

Fix.

70967a3

Fix.

2f3a554

Validate the shape.

4eac681

Fix and check.

536c230

Cleanup.

9c3d31f

Fix.

a44f7d6

Fix.

a2ed7eb

Fix dask.

b73ba53

Clean up.

ca5c7bc

Test.

89b8f78

Fix.

f31fdee

Fix cupy test.

51a3a96

Fix from cudf.

0de17eb

Fix dask GPU hist.

bb12804

Empty partition.

33b54fc

Pass meta info instead of DMatrix.

01e8a2c

trivialfis force-pushed the multi-output-reg branch from 0a1f857 to 01e8a2c Compare December 17, 2021 17:49

Extract a function.

b6e0673

hcho3 approved these changes Dec 18, 2021


trivialfis merged commit 58a6723 into dmlc:master Dec 18, 2021

2.0 Roadmap automation moved this from 1.6 In Progress to 1.6 Done Dec 18, 2021

trivialfis deleted the multi-output-reg branch December 18, 2021 01:28

This was referenced Dec 18, 2021

Initial support for multi-output regression. #7309

Closed

Multiple output regression #2087

Closed

DaskDeviceQuantileDMatrix hangs with empty partitions unlike DaskDMatrix with Dask-cuDF inputs #7494

Closed

loretoparisi mentioned this pull request Dec 18, 2021

Multioutput regression abhishekkrthakur/autoxgb#13

Open
