🐛 Bug

When instantiating the multiclass (or multilabel) accuracy metric through the legacy `Accuracy` wrapper class, the default value for `average` is `micro`. When instantiating directly through `MulticlassAccuracy` (the new way since, I believe, 0.11), the default value is `macro`. This inconsistency can lead to very unexpected results.
The same is true for all metrics that subclass `MulticlassStatScores`, `BinaryStatScores`, or `MultilabelStatScores`, as well as for their respective functional interfaces.
To Reproduce
Instantiate the metrics directly as well as through the wrapper.
Expected behavior
Consistency between the different interfaces.
Environment
TorchMetrics version (and how you installed TM, e.g. conda, pip, build from source): >=0.11 (1.3 in my case)
Python & PyTorch version (e.g., 1.0): irrelevant
Any other relevant information such as OS (e.g., Linux): irrelevant
Additional context
I would argue that, in the case of accuracy, the default being `macro` in the task-specific classes is not only inconsistent with the legacy interface but actually wrong. The common definition of accuracy is

Accuracy = (number of correct predictions) / (total number of predictions),

which is how accuracy is computed when setting `average="micro"`.
Setting `average="macro"` can still be useful, as it is less sensitive to class imbalance. However, I think TorchMetrics should adhere to the common definitions in its default settings, and would therefore argue for making `micro` the default.
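To make the difference concrete, here is a plain-Python sketch of the two averaging modes on an imbalanced problem (hypothetical helper names; no torchmetrics required):

```python
def micro_accuracy(preds, target):
    """Global accuracy: correct predictions / total predictions."""
    correct = sum(p == t for p, t in zip(preds, target))
    return correct / len(target)


def macro_accuracy(preds, target, num_classes):
    """Unweighted mean of per-class accuracies (per-class recall)."""
    per_class = []
    for c in range(num_classes):
        idx = [i for i, t in enumerate(target) if t == c]
        if idx:  # skip classes absent from the targets
            per_class.append(sum(preds[i] == c for i in idx) / len(idx))
    return sum(per_class) / len(per_class)


# 8 samples of class 0, 2 of class 1; the classifier always predicts 0.
preds = [0] * 10
target = [0] * 8 + [1] * 2

print(micro_accuracy(preds, target))      # 0.8
print(macro_accuracy(preds, target, 2))   # 0.5
```

Micro averaging matches the textbook definition (8 of 10 predictions correct), while macro averaging weights the rare class equally, giving a very different number. Silently switching a user from one default to the other is exactly the surprise described above.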
The same is kind of true for precision and recall, which are also commonly defined as micro averages, if they are defined globally at all. Usually we encounter recall and precision as class-wise metrics.