option to return an array in metrics if multi-output #2200

Thanks to the work of @arjoly, regression metrics now support multiple outputs (2d Y). Currently, the metrics return a scalar. It would be nice to have an option to return an array of size n_outputs.
Also, multiple outputs are currently handled by flattening the 2d arrays and viewing them as one 1d array. This corresponds to micro averaging. For my application, I would prefer macro averaging (averaging over classes). For example, for the R^2 score, that would be the mean of the per-output R^2 scores. CC @ogrisel
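A sketch of what that macro-averaged R^2 could look like, with a hypothetical macro_r2 helper (not an existing scikit-learn function):

import numpy as np
from sklearn.metrics import r2_score

def macro_r2(y_true, y_pred):
    # "Macro" averaging: score each output column separately,
    # then take the unweighted mean of the per-output scores.
    scores = [r2_score(y_true[:, i], y_pred[:, i])
              for i in range(y_true.shape[1])]
    return np.mean(scores)

# e.g. macro_r2(np.random.rand(5, 3), np.random.rand(5, 3))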
I'm not even sure micro averaging makes sense at all here. To be discussed...
Sounds like a reasonable request, although I don't have any practical experience with scoring multi-target / multi-output regression models myself.
This makes sense, as would weighting the outputs.
@ogrisel, @mblondel Hi, I would like to work on this issue. I just skimmed through the metrics; are you referring to something like this, in the …? Do correct me if I'm wrong.
@manoj-kumar-s Yep, exactly. Thanks!
It should be an option though (e.g., …).
Great. I'm on it. I'll hopefully come up with a PR in 2-3 days.
I would use the keyword …
Currently only micro average (average over the samples) is implemented, but IMO macro average (average over classes) makes more sense.
Is it really a micro-averaged r2 score? Just a small experiment:
In [1]: import numpy as np
In [2]: y_true = np.random.rand(5, 3)
In [3]: y_pred = np.random.rand(5, 3)
In [4]: from sklearn.metrics import r2_score
# Current multi-output r2_score
In [5]: r2_score(y_true, y_pred)
Out[5]: -1.2018060998146924
# It would be micro-r2 score
In [6]: r2_score(y_true.ravel(), y_pred.ravel())
Out[6]: -1.1395845816752996
In [7]: from sklearn.metrics import explained_variance_score
# Compare with r2_score (close, but not equal in this case; see the note below)
In [8]: explained_variance_score(y_true.ravel(), y_pred.ravel())
Out[8]: -1.132385768714816
# r2-score with no averaging
In [9]: r2 = [r2_score(y_true[:, i], y_pred[:, i]) for i in range(y_true.shape[1])]
In [10]: r2
Out[10]: [-1.0513131617660676, -1.2263410810199482, -1.2582117503263115]
# It would be macro-r2 score
In [11]: np.mean(r2)
Out[11]: -1.178621997704109
# For reproducibility
In [12]: y_true
Out[12]:
array([[ 0.28481499, 0.34159449, 0.89364091],
[ 0.08516499, 0.24426185, 0.58491767],
[ 0.65374035, 0.78358486, 0.84892285],
[ 0.12355558, 0.32354626, 0.02966046],
[ 0.65858239, 0.59705347, 0.00573082]])
In [13]: y_pred
Out[13]:
array([[ 0.32639174, 0.87657742, 0.23203866],
[ 0.66826156, 0.06449232, 0.21180403],
[ 0.19938095, 0.65445628, 0.13731781],
[ 0.19451816, 0.10242323, 0.50932089],
[ 0.95501124, 0.33805111, 0.61441609]])
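A side note on In [6] vs In [8]: the two values are close but not equal (-1.1396 vs -1.1324). Explained variance subtracts the mean residual before squaring, so it coincides with R^2 only when the residuals happen to be centered. A minimal illustration of the difference, with fresh random data:

import numpy as np

rng = np.random.RandomState(0)
y_true = rng.rand(5, 3).ravel()
y_pred = rng.rand(5, 3).ravel()

resid = y_true - y_pred
ss_tot = ((y_true - y_true.mean()) ** 2).sum()
r2 = 1 - (resid ** 2).sum() / ss_tot                    # r2_score, as in In [6]
evs = 1 - ((resid - resid.mean()) ** 2).sum() / ss_tot  # explained variance, as in In [8]
print(r2, evs)  # evs >= r2, with equality only if resid.mean() == 0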
Interesting... How is the returned value computed for In [5] then?
The denominator is computed differently:
numerator = ((y_true - y_pred) ** 2).sum(dtype=np.float64)
denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(dtype=np.float64)
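So the 2d score pools the squared errors across outputs but keeps per-output means in the denominator, whereas ravelling uses a single global mean; that is why In [5] and In [6] disagree. A small sketch of the two denominators (hypothetical variable names; any (5, 3) data will do):

import numpy as np

rng = np.random.RandomState(0)
y_true = rng.rand(5, 3)
y_pred = rng.rand(5, 3)

num = ((y_true - y_pred) ** 2).sum(dtype=np.float64)
den_per_output = ((y_true - y_true.mean(axis=0)) ** 2).sum(dtype=np.float64)
den_global = ((y_true - y_true.mean()) ** 2).sum(dtype=np.float64)
print(1 - num / den_per_output)  # the 2d behaviour described above (In [5])
print(1 - num / den_global)      # the ravelled behaviour (In [6])

Note that 1 - num / den_per_output is algebraically the variance-weighted mean of the per-output R^2 scores, so this behaviour is closer to a weighted macro average than to a true micro average.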
I think that the above could be called micro average, in the sense that you compute the sums over the whole 2d array. But I'm starting to think that we should only support macro average, i.e., the average of the per-output scores.
I am finally making sense of this discussion. Macro averaging is the same as averaging the per-output scores (as in In [9]-[11] above), and scoring the ravelled arrays (as in In [6]) would do the trick for micro averaging, right? Which is equivalent to flattening to a 1-D array.
And what do you think would be the best thing to do in my PR right now? Just implement …?
The concepts of micro and macro averages arise when computing metrics that are originally designed for binary classification (e.g., precision, recall) in the multiclass case: micro = average over instances, macro = average over classes. Here, I think that macro average makes the most sense (average over outputs). "micro" average seems a bit ambiguous and ill-defined.
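For concreteness, an illustration of that terminology with scikit-learn's classification metrics (an aside, not from the thread itself):

from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# micro: pool all instances first; 5 of the 6 predictions are correct
precision_score(y_true, y_pred, average='micro')  # 0.833...
# macro: per-class precisions are (1, 1, 2/3); take their unweighted mean
precision_score(y_true, y_pred, average='macro')  # 0.888...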
Got it. So the best thing to do now would be just to have a …?
Let's wait for other people's opinion. The …
To not lose the discussion in #2493 …
Closed by #4491, right? |
yes indeed. |
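For reference, in current scikit-learn the option that came out of this is the multioutput parameter of the regression metrics (the exact API postdates this thread):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.random.rand(5, 3)
y_pred = np.random.rand(5, 3)

r2_score(y_true, y_pred, multioutput='raw_values')        # array of per-output scores
r2_score(y_true, y_pred, multioutput='uniform_average')   # macro average (the default)
r2_score(y_true, y_pred, multioutput='variance_weighted') # equals the pooled formula discussed above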