
option to return an array in metrics if multi-output #2200

Closed
mblondel opened this issue Jul 24, 2013 · 22 comments
Labels: Easy (well-defined and straightforward way to resolve), Enhancement

Comments

@mblondel
Member

Thanks to the work of @arjoly, regression metrics now support multiple outputs (2d Y). Currently, the metrics return a scalar. It would be nice to have an option to return an array of size n_outputs.

@mblondel
Member Author

Also, multiple outputs are currently handled by flattening the 2d arrays and viewing them as a 1d array. This corresponds to micro averaging. For my application, I would prefer macro averaging (averaging over classes). For example, for the R^2 score that would be: np.mean([r2_score(Y_true[:, k], Y_pred[:, k]) for k in xrange(Y_true.shape[1])])

CC @ogrisel
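
For reference, a runnable sketch of that expression (illustrative random data; Python 3 range in place of xrange):

    import numpy as np
    from sklearn.metrics import r2_score

    # Illustrative data: 5 samples, 3 outputs
    Y_true = np.random.rand(5, 3)
    Y_pred = np.random.rand(5, 3)

    # Macro-averaged R^2: score each output column separately, then average
    per_output = [r2_score(Y_true[:, k], Y_pred[:, k]) for k in range(Y_true.shape[1])]
    macro_r2 = np.mean(per_output)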

@mblondel
Member Author

I'm not even sure micro averaging makes sense at all here. To be discussed...

@ogrisel
Member

ogrisel commented Jul 24, 2013

Sounds like a reasonable request, although I don't have any practical experience with scoring multi-target / multi-output regression models myself.

@arjoly
Member

arjoly commented Jul 25, 2013

This makes sense, as would weighting the outputs.

@MechCoder
Member

@ogrisel, @mblondel Hi, I would like to work on this issue. I just skimmed through the metrics; are you referring to something like this in the R^2 implementation?

y_true = [[0.5, 1], [-1, 1], [7, -6]]
y_pred = [[0, 2], [-1, 2], [8, -5]]
r2_score(y_true, y_pred)
0.938                          # what it currently returns
[0.96551724, 0.91588785]       # should it return something like this instead?

Do correct me if I'm wrong.

@mblondel
Member Author

mblondel commented Oct 3, 2013

@manoj-kumar-s Yep, exactly. Thanks!

@mblondel
Member Author

mblondel commented Oct 3, 2013

It should be an option though (e.g., multi_output=True).

@MechCoder
Member

Great. I'm on it. I'll hopefully come up with a PR in 2-3 days.

@arjoly
Member

arjoly commented Oct 3, 2013

> It should be an option though (e.g., multi_output=True).

I would use the keyword average to be consistent with the rest of the metrics.

@mblondel
Member Author

mblondel commented Oct 7, 2013

average='micro'|'macro'|None|False

Currently only micro average (average over the samples) is implemented but IMO macro average (average over classes) makes more sense.
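
A minimal sketch of how such a keyword could dispatch (hypothetical wrapper, not the API that was eventually merged; 'micro' is left out because its meaning is debated just below):

    import numpy as np
    from sklearn.metrics import r2_score

    def r2_score_with_average(Y_true, Y_pred, average='macro'):
        # Hypothetical wrapper illustrating the proposed `average` semantics
        Y_true = np.asarray(Y_true)
        Y_pred = np.asarray(Y_pred)
        scores = np.array([r2_score(Y_true[:, k], Y_pred[:, k])
                           for k in range(Y_true.shape[1])])
        if average in (None, False):   # per-output array of size n_outputs
            return scores
        if average == 'macro':         # unweighted mean over outputs
            return scores.mean()
        raise ValueError("unsupported average: %r" % (average,))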

@arjoly
Member

arjoly commented Oct 10, 2013

> Also, multiple outputs are currently handled by flattening the 2d arrays and viewing them as a 1d array. This corresponds to micro averaging. For my application, I would prefer macro averaging (averaging over classes). For example, for the R^2 score that would be: np.mean([r2_score(Y_true[:, k], Y_pred[:, k]) for k in xrange(Y_true.shape[1])])

Is it really a micro-averaged r2 score? Just a small experiment:

In [1]: import numpy as np
In [2]: y_true = np.random.rand(5, 3)
In [3]: y_pred = np.random.rand(5, 3)
In [4]: from sklearn.metrics import r2_score

# Current multi-output r2_score
In [5]: r2_score(y_true, y_pred)
Out[5]: -1.2018060998146924

# This would be the micro-r2 score
In [6]: r2_score(y_true.ravel(), y_pred.ravel())
Out[6]: -1.1395845816752996

In [7]: from sklearn.metrics import explained_variance_score

# Check that it's equal to r2_score in this case
In [8]: explained_variance_score(y_true.ravel(), y_pred.ravel()) 
Out[8]: -1.132385768714816

# r2-score with no averaging
In [9]: r2 = [r2_score(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])] 
In [10]: r2
Out[10]: [-1.0513131617660676, -1.2263410810199482, -1.2582117503263115]

# This would be the macro-r2 score
In [11]: np.mean(r2) 
Out[11]: -1.178621997704109

# For reproducibility
In [12]: y_true
Out[12]: 
array([[ 0.28481499,  0.34159449,  0.89364091],
       [ 0.08516499,  0.24426185,  0.58491767],
       [ 0.65374035,  0.78358486,  0.84892285],
       [ 0.12355558,  0.32354626,  0.02966046],
       [ 0.65858239,  0.59705347,  0.00573082]])

In [13]: y_pred
Out[13]: 
array([[ 0.32639174,  0.87657742,  0.23203866],
       [ 0.66826156,  0.06449232,  0.21180403],
       [ 0.19938095,  0.65445628,  0.13731781],
       [ 0.19451816,  0.10242323,  0.50932089],
       [ 0.95501124,  0.33805111,  0.61441609]])

@mblondel
Member Author

Interesting... How is the returned value computed for In [5] then?

@arjoly
Member

arjoly commented Oct 10, 2013

The denominator is computed differently:

    numerator = ((y_true - y_pred) ** 2).sum(dtype=np.float64)
    denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(dtype=np.float64)
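
For reference, a self-contained sketch that contrasts the quantity above with the flattened ("micro") and per-output ("macro") variants (illustrative helper, not library code):

    import numpy as np

    def r2_variants(y_true, y_pred):
        # Then-current multi-output r2: sum numerator and denominator over all
        # cells, the denominator using per-output (column-wise) means
        numerator = ((y_true - y_pred) ** 2).sum(dtype=np.float64)
        denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(dtype=np.float64)
        current = 1.0 - numerator / denominator

        # "Micro": flatten to 1d first, i.e. a single global mean in the denominator
        yt, yp = y_true.ravel(), y_pred.ravel()
        micro = 1.0 - ((yt - yp) ** 2).sum() / ((yt - yt.mean()) ** 2).sum()

        # "Macro": r2 per output column, then an unweighted mean
        per_output = 1.0 - (((y_true - y_pred) ** 2).sum(axis=0)
                            / ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0))
        macro = per_output.mean()

        return current, micro, macro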

@mblondel
Member Author

I think that the above could be called micro average in the sense that you compute the 2d-array ((y_true - y_true.mean(axis=0)) ** 2) then sum over it with axis=None. But this is indeed different from flattening the entire array.

But I'm starting to think that we should only support macro average, i.e., the average of the per-output scores.

@MechCoder
Member

I am finally making sense of this discussion here.

Macro averaging is the same as doing np.mean(r2_score(array, average=None)) in my branch.
I'm a bit confused about micro averaging though; does it mean you flatten the 2-D array into a 1-D array and then perform the calculation?
I think doing

denominator = ((y_true - y_true.mean()) ** 2).sum(dtype=np.float64)

would do the trick, right? That is equivalent to flattening it into a 1-D array (see the quick check below).
Is there any textbook definition for micro averaging?
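
A quick numerical check of that equivalence (illustrative random data):

    import numpy as np

    y_true = np.random.rand(5, 3)

    # The global mean equals the mean of the flattened array,
    # so the two denominators coincide
    d1 = ((y_true - y_true.mean()) ** 2).sum(dtype=np.float64)
    d2 = ((y_true.ravel() - y_true.ravel().mean()) ** 2).sum(dtype=np.float64)
    assert np.isclose(d1, d2)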

@MechCoder
Member

And what do you think would be the best thing to do in my PR right now? Just implement average=None for the multi-output case and average="macro", which corresponds to the mean of the average=None case?

@mblondel
Member Author

The concepts of micro and macro averages arise when computing metrics which are originally designed for binary classification (e.g., precision, recall) in the multiclass case. micro=average over instances, macro=average over classes.

Here, I think that macro average makes the most sense (average over outputs). "micro" average seems a bit ambiguous and ill-defined.
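
As an aside, the classification usage of these terms can be seen with precision_score on a toy multiclass problem (labels are illustrative):

    from sklearn.metrics import precision_score

    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]

    # 'micro': pool all instances, then compute a single precision
    print(precision_score(y_true, y_pred, average='micro'))  # 4 / 6 ~= 0.667
    # 'macro': compute precision per class, then take the unweighted mean
    print(precision_score(y_true, y_pred, average='macro'))  # (1/2 + 2/3 + 1) / 3 ~= 0.722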

@MechCoder
Member

Got it. So the best thing to do now, would be just to have a None and macro case?

@mblondel
Member Author

Let's wait for other people's opinion. The macro case can be implemented recursively (e.g., by calling np.mean(r2_score(..., average=None)) inside r2_score). I don't think there's much to gain by vectorizing the operations.

@arjoly
Member

arjoly commented Jul 20, 2014

So as not to lose the discussion in #2493:

During the sprint, we (@eickenberg, @MechCoder and I) discussed the blocking points of this pull request. It turns out that the difference between macro averaging and the current implementation could be resolved by using output_weights properly.

The macro-r2 / macro-explained-variance scores correspond to a uniform output_weight (= 1 / n_outputs), while the current version uses an output_weight proportional to each output's variance.

Thus we decided to keep both versions. I am also fine with changing the default to macro.
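
A sketch of how output_weights ties the two together (hypothetical helper; argument names are illustrative, not the merged API):

    import numpy as np

    def weighted_multioutput_r2(y_true, y_pred, output_weights=None):
        # Per-output r2 combined with explicit output weights.
        # output_weights=None       -> uniform weights (the macro behaviour)
        # output_weights='variance' -> weights proportional to each output's
        #                              variance, reproducing the then-current result
        numerator = ((y_true - y_pred) ** 2).sum(axis=0, dtype=np.float64)
        denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0, dtype=np.float64)
        per_output = 1.0 - numerator / denominator
        if output_weights == 'variance':
            weights = denominator
        else:
            weights = np.ones_like(per_output)
        return np.average(per_output, weights=weights)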

@amueller
Member

amueller commented May 8, 2015

Closed by #4491, right?

@MechCoder
Member

yes indeed.
