
option to return an array in metrics if multi-output #2200

Closed
mblondel opened this issue Jul 24, 2013 · 22 comments
Labels: Easy (well-defined and straightforward way to resolve), Enhancement

Comments

@mblondel
Member

Thanks to the work of @arjoly, regression metrics now support multiple outputs (2d Y). Currently, the metrics return a scalar. It would be nice to have an option to return an array of size n_outputs.

@mblondel
Member Author

Also, multiple outputs are currently handled by flattening the 2d arrays and viewing them as a 1d array. This corresponds to micro averaging. For my application, I would prefer macro averaging (averaging over classes). For example, for the R^2 score that would be: np.mean([r2_score(Y_true[:, k], Y_pred[:, k]) for k in xrange(Y_true.shape[1])])

CC @ogrisel
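
For reference, a runnable sketch of that expression (illustrative random data; Python 3 range in place of xrange):

    import numpy as np
    from sklearn.metrics import r2_score

    # Illustrative data: 5 samples, 3 outputs
    Y_true = np.random.rand(5, 3)
    Y_pred = np.random.rand(5, 3)

    # Macro-averaged R^2: score each output column separately, then average
    per_output = [r2_score(Y_true[:, k], Y_pred[:, k]) for k in range(Y_true.shape[1])]
    macro_r2 = np.mean(per_output)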

@mblondel
Member Author

I'm not even sure micro averaging makes sense at all here. To be discussed...

@ogrisel
Member

ogrisel commented Jul 24, 2013

Sounds like a reasonable request, although I don't have any practical experience with scoring multi-target / multi-output regression models myself.

@arjoly
Member

arjoly commented Jul 25, 2013

This makes sense, as would weighting the outputs.

@MechCoder
Member

@ogrisel, @mblondel Hi, I would like to work on this issue. I just skimmed through the metrics; are you referring to something like this in the R^2 implementation?

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
r2_score(y_true, y_pred)
0.938                          # what it currently returns
[0.96551724, 0.91588785]       # should it return something like this instead?

Do correct me if I'm wrong.

@mblondel
Member Author

mblondel commented Oct 3, 2013

@manoj-kumar-s Yep, exactly. Thanks!

@mblondel
Member Author

mblondel commented Oct 3, 2013

It should be an option though (e.g., multi_output=True).

@MechCoder
Member

Great. I'm on it. I'll hopefully come up with a PR in 2-3 days.

@arjoly
Member

arjoly commented Oct 3, 2013

> It should be an option though (e.g., multi_output=True).

I would use the keyword average to be consistent with the rest of the metrics.

@mblondel
Member Author

mblondel commented Oct 7, 2013

average='micro'|'macro'|None|False

Currently only micro average (average over the samples) is implemented but IMO macro average (average over classes) makes more sense.
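
A minimal sketch of how such a keyword could dispatch (hypothetical wrapper, not the API that was eventually merged; 'micro' is left out because its meaning is debated just below):

    import numpy as np
    from sklearn.metrics import r2_score

    def r2_score_with_average(Y_true, Y_pred, average='macro'):
        # Hypothetical wrapper illustrating the proposed `average` semantics
        Y_true = np.asarray(Y_true)
        Y_pred = np.asarray(Y_pred)
        scores = np.array([r2_score(Y_true[:, k], Y_pred[:, k])
                           for k in range(Y_true.shape[1])])
        if average in (None, False):   # per-output array of size n_outputs
            return scores
        if average == 'macro':         # unweighted mean over outputs
            return scores.mean()
        raise ValueError("unsupported average: %r" % (average,))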

@arjoly
Member

arjoly commented Oct 10, 2013

> Also, multiple outputs are currently handled by flattening the 2d arrays and viewing them as a 1d array. This corresponds to micro averaging. For my application, I would prefer macro averaging (averaging over classes). For example, for the R^2 score that would be: np.mean([r2_score(Y_true[:, k], Y_pred[:, k]) for k in xrange(Y_true.shape[1])])

Is it really a micro-averaged r2 score? Just a small experiment:

In [1]: import numpy as np
In [2]: y_true = np.random.rand(5, 3)
In [3]: y_pred = np.random.rand(5, 3)
In [4]: from sklearn.metrics import r2_score

# Current multi-output r2_score
In [5]: r2_score(y_true, y_pred)
Out[5]: -1.2018060998146924

# This would be the micro-r2 score
In [6]: r2_score(y_true.ravel(), y_pred.ravel())
Out[6]: -1.1395845816752996

In [7]: from sklearn.metrics import explained_variance_score

# Check that it's equal to r2_score in this case
In [8]: explained_variance_score(y_true.ravel(), y_pred.ravel()) 
Out[8]: -1.132385768714816

# r2-score with no averaging
In [9]: r2 = [r2_score(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])] 
In [10]: r2
Out[10]: [-1.0513131617660676, -1.2263410810199482, -1.2582117503263115]

# This would be the macro-r2 score
In [11]: np.mean(r2) 
Out[11]: -1.178621997704109

# For reproducibility
In [12]: y_true
Out[12]: 
array([[ 0.28481499,  0.34159449,  0.89364091],
       [ 0.08516499,  0.24426185,  0.58491767],
       [ 0.65374035,  0.78358486,  0.84892285],
       [ 0.12355558,  0.32354626,  0.02966046],
       [ 0.65858239,  0.59705347,  0.00573082]])

In [13]: y_pred
Out[13]: 
array([[ 0.32639174,  0.87657742,  0.23203866],
       [ 0.66826156,  0.06449232,  0.21180403],
       [ 0.19938095,  0.65445628,  0.13731781],
       [ 0.19451816,  0.10242323,  0.50932089],
       [ 0.95501124,  0.33805111,  0.61441609]])

@mblondel
Member Author

Interesting... How is the returned value computed for In [5] then?

@arjoly
Member

arjoly commented Oct 10, 2013

The denominator is computed differently:

    numerator = ((y_true - y_pred) ** 2).sum(dtype=np.float64)
    denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(dtype=np.float64)
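
For reference, a self-contained sketch that contrasts the quantity above with the flattened ("micro") and per-output ("macro") variants (illustrative helper, not library code):

    import numpy as np

    def r2_variants(y_true, y_pred):
        # Then-current multi-output r2: sum numerator and denominator over all
        # cells, the denominator using per-output (column-wise) means
        numerator = ((y_true - y_pred) ** 2).sum(dtype=np.float64)
        denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(dtype=np.float64)
        current = 1.0 - numerator / denominator

        # "Micro": flatten to 1d first, i.e. a single global mean in the denominator
        yt, yp = y_true.ravel(), y_pred.ravel()
        micro = 1.0 - ((yt - yp) ** 2).sum() / ((yt - yt.mean()) ** 2).sum()

        # "Macro": r2 per output column, then an unweighted mean
        per_output = 1.0 - (((y_true - y_pred) ** 2).sum(axis=0)
                            / ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0))
        macro = per_output.mean()

        return current, micro, macro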

@mblondel
Member Author

I think that the above could be called micro average in the sense that you compute the 2d-array ((y_true - y_true.mean(axis=0)) ** 2) then sum over it with axis=None. But this is indeed different from flattening the entire array.

But I'm starting to think that we should only support macro average, i.e., the average of the per-output scores.

@MechCoder
Member

I am finally making sense of this discussion here.

Macro averaging is the same as doing np.mean(r2_score(array, average=None)) in my branch.
I'm a bit confused about micro averaging though; does it mean you flatten the 2-D array into a 1-D array and then perform the calculation?
I think doing

denominator = ((y_true - y_true.mean()) ** 2).sum(dtype=np.float64)

would do the trick, right? That is equivalent to flattening it into a 1-D array (see the quick check below).
Is there any textbook definition for micro averaging?
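
A quick numerical check of that equivalence (illustrative random data):

    import numpy as np

    y_true = np.random.rand(5, 3)

    # The global mean equals the mean of the flattened array,
    # so the two denominators coincide
    d1 = ((y_true - y_true.mean()) ** 2).sum(dtype=np.float64)
    d2 = ((y_true.ravel() - y_true.ravel().mean()) ** 2).sum(dtype=np.float64)
    assert np.isclose(d1, d2)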

@MechCoder
Member

And what do you think would be the best thing to do in my PR right now? Just implement average=None for the multi-output case and average="macro", which corresponds to the mean of the average=None case?

@mblondel
Member Author

The concepts of micro and macro averages arise when computing metrics which are originally designed for binary classification (e.g., precision, recall) in the multiclass case. micro=average over instances, macro=average over classes.

Here, I think that macro average makes the most sense (average over outputs). "micro" average seems a bit ambiguous and ill-defined.
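
As an aside, the classification usage of these terms can be seen with precision_score on a toy multiclass problem (labels are illustrative):

    from sklearn.metrics import precision_score

    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]

    # 'micro': pool all instances, then compute a single precision
    print(precision_score(y_true, y_pred, average='micro'))  # 4 / 6 ~= 0.667
    # 'macro': compute precision per class, then take the unweighted mean
    print(precision_score(y_true, y_pred, average='macro'))  # (1/2 + 2/3 + 1) / 3 ~= 0.722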

@MechCoder
Member

Got it. So the best thing to do now, would be just to have a None and macro case?

@mblondel
Member Author

Let's wait for other people's opinion. The macro case can be implemented recursively (e.g., by calling np.mean(r2_score(..., average=None)) inside r2_score). I don't think there's much to gain by vectorizing the operations.

@arjoly
Member

arjoly commented Jul 20, 2014

So as not to lose the discussion in #2493:

During the sprint, we (@eickenberg, @MechCoder and I) discussed the blocking points of this pull request. It turns out that the difference between macro averaging and the current implementation could be resolved by using output_weights properly.

The macro-r2 / macro-explained-variance scores correspond to a uniform output_weight (= 1 / n_outputs), while the current version uses an output_weight proportional to each output's variance.

Thus we decided to keep both versions. I am also fine with changing the default to macro.
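
A sketch of how output_weights ties the two together (hypothetical helper; argument names are illustrative, not the merged API):

    import numpy as np

    def weighted_multioutput_r2(y_true, y_pred, output_weights=None):
        # Per-output r2 combined with explicit output weights.
        # output_weights=None       -> uniform weights (the macro behaviour)
        # output_weights='variance' -> weights proportional to each output's
        #                              variance, reproducing the then-current result
        numerator = ((y_true - y_pred) ** 2).sum(axis=0, dtype=np.float64)
        denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0, dtype=np.float64)
        per_output = 1.0 - numerator / denominator
        if output_weights == 'variance':
            weights = denominator
        else:
            weights = np.ones_like(per_output)
        return np.average(per_output, weights=weights)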

@amueller
Member

amueller commented May 8, 2015

Closed by #4491, right?

@MechCoder
Member

yes indeed.
