Classification: option to disable input formatting [wip] #1676

Open · wants to merge 46 commits into master

Conversation

@SkafteNicki (Member) commented Mar 31, 2023

What does this PR do?

Fixes #1526
Fixes #1604
Fixes #1989
Fixes #2195
Fixes #2329

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not needed for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
  • accuracy.py
  • auroc.py
  • average_precision.py
  • calibration_error.py
  • cohen_kappa.py
  • confusion_matrix.py
  • dice.py
  • exact_match.py
  • f_beta.py
  • group_fairness.py
  • hamming.py
  • hinge.py
  • jaccard.py
  • matthews_corrcoef.py
  • precision_fixed_recall.py
  • precision_recall.py
  • precision_recall_curve.py
  • ranking.py
  • recall_fixed_precision.py
  • roc.py
  • specificity.py
  • specificity_sensitivity.py
  • stat_scores.py
PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@SkafteNicki SkafteNicki added the bug / fix Something isn't working label Mar 31, 2023
@SkafteNicki SkafteNicki added this to the v0.12 milestone Mar 31, 2023
@SkafteNicki SkafteNicki added this to In progress in Classification refactor via automation Mar 31, 2023
@Borda (Member) commented Apr 17, 2023

@SkafteNicki how is it going here? 🐰

@SkafteNicki SkafteNicki modified the milestones: v1.0.0, future Jun 2, 2023
@samet-akcay commented

Hi @SkafteNicki, do you guys have any update on this one please?

@Borda (Member) commented Aug 3, 2023

> do you guys have any update on this one please?

We need to address the failing checks.

@Borda (Member) commented Aug 7, 2023

@SkafteNicki how is it going here? 🐰

@SkafteNicki SkafteNicki modified the milestones: future, v1.0.x Aug 9, 2023
@@ -329,6 +349,16 @@ def binary_precision_recall_curve(
Specifies a target value that is ignored and does not contribute to the metric calculation
validate_args: bool indicating if input arguments and tensors should be validated for correctness.
Set to ``False`` for faster computations.
input_format: str or bool specifying the format of the input preds tensor. Can be one of:

Review comment from a Member:

yes, this looks good to me...
cc: @Lightning-AI/core-metrics @awaelchli

@Borda Borda marked this pull request as ready for review January 9, 2024 21:46

codecov bot commented Jan 9, 2024

Codecov Report

Merging #1676 (a7d719b) into master (4c999b8) will decrease coverage by 0%.
Report is 14 commits behind head on master.
The diff coverage is 36%.

Additional details and impacted files
@@          Coverage Diff           @@
##           master   #1676   +/-   ##
======================================
- Coverage      69%     69%   -0%     
======================================
  Files         307     307           
  Lines       17352   17406   +54     
======================================
+ Hits        11961   11992   +31     
- Misses       5391    5414   +23     

@mergify mergify bot removed the has conflicts label Jan 12, 2024
@idc9 commented Feb 3, 2024

Thanks for working on the bug in #1604! I suspect this bug is silently affecting people currently using torchmetrics.

It looks like the current solution is a new argument `input_format` that gives the user the option to specify whether they are providing probabilities, logits, etc. (see the documentation below). This almost fixes the problem, but the argument defaults to input_format='auto', which leaves open exactly the bug discussed in #1604 (i.e., if your logits happen to all be in [0, 1], they will incorrectly not be converted to probabilities).

I see two options

  1. Change the default to input_format='logits' (or probs, though I think logits is the most common input). This default option is explicit about behavior that is a bit subtle and will prevent the bug from happening by default.

  2. Get rid of the auto option altogether. The auto option will always be buggy if the user inputs logits.

I personally see no reason to include the auto option and would vote for option 2.

input_format: str specifying the format of the input preds tensor. Can be one of:

    - ``'auto'``: automatically detect the format based on the values in the tensor. If all values
        are in the [0,1] range, we consider the tensor to be probabilities and only threshold the values.
        If all values are non-float, we consider the tensor to be labels and do nothing. Otherwise we
        consider the tensor to be logits and will apply sigmoid to the tensor and threshold the values.
    - ``'probs'``: preds tensor contains values in the [0,1] range and is considered to be probabilities. Only
        thresholding will be applied to the tensor, and values will be checked to be in the [0,1] range.
    - ``'logits'``: preds tensor contains values outside the [0,1] range and is considered to be logits. We
        will apply sigmoid to the tensor and threshold the values before calculating the metric.
    - ``'labels'``: preds tensor contains integer values and is considered to be labels. No formatting will be
        applied to the preds tensor.
    - ``'none'``: will disable all input formatting. This is the fastest option but also the least safe.
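
To make the failure mode concrete, here is a minimal sketch of the dispatch logic the documentation describes (the helper name `_format_preds` is illustrative, not the actual torchmetrics internals, and thresholding is omitted for brevity):

```python
import torch

def _format_preds(preds: torch.Tensor, input_format: str = "auto") -> torch.Tensor:
    """Illustrative sketch of the documented ``input_format`` dispatch."""
    if input_format in ("none", "labels"):
        return preds  # no formatting applied
    if input_format == "probs":
        if preds.min() < 0 or preds.max() > 1:
            raise ValueError("preds declared as probabilities but lie outside [0, 1]")
        return preds
    if input_format == "logits":
        return preds.sigmoid()
    # input_format == 'auto': the heuristic under discussion
    if not preds.is_floating_point():
        return preds  # non-float values: treated as labels
    if preds.min() >= 0 and preds.max() <= 1:
        return preds  # treated as probabilities; misfires when logits land in [0, 1]
    return preds.sigmoid()  # treated as logits
```

The probabilities branch of 'auto' is exactly where #1604 bites: a batch of genuine logits that happens to fall entirely inside [0, 1] is returned untouched and never passed through sigmoid.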

@SkafteNicki (Member Author) commented

@idc9 thanks for giving your opinions on this issue.
I already laid out some of my opinions on Slack, but just for transparency, here are the two main reasons for having auto as the default:

  1. To keep everything backwards compatible.

  2. To say "The auto option will always be buggy if the user inputs logits." is simply wrong. For reasonably sized input, any model will realistically output some logit values outside the [0,1] range; logits inside [0,1] correspond, after the sigmoid transformation, to probabilities in the [0.5, 0.73] range (checked numerically below). If a model only outputs values in the 0.5-0.73 probability range, it will never predict the negative class and will never be confident about the positive class. Both seem highly unlikely for any well-trained model.
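
For reference, the [0.5, 0.73] figure follows directly from evaluating sigmoid at the endpoints of the [0, 1] logit interval:

```python
import torch

# sigmoid maps the logit interval [0, 1] onto the probability interval [0.5, ~0.731]
print(torch.sigmoid(torch.tensor([0.0, 1.0])))  # tensor([0.5000, 0.7311])
```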

@NoahAtKintsugi commented

@SkafteNicki In cases where GPU memory limits batch sizes to be very small, there will be no "reasonably sized input", so "auto" will be buggy. What about not having "auto", but instead having only "probs" and "logits", with "probs" being the default and verifying that its input is in [0, 1]? This would be backwards compatible, except that it would raise an error in cases where the old version was doing the wrong thing. The error message could suggest that the user might mean input_format="logits".
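
A rough sketch of what that proposal could look like (the helper name and error message are hypothetical):

```python
import torch

def _validate_probs(preds: torch.Tensor) -> torch.Tensor:
    # Proposed default: treat preds as probabilities, but validate strictly
    # instead of silently guessing, and point users toward 'logits'.
    if preds.min() < 0 or preds.max() > 1:
        raise ValueError(
            "preds contain values outside [0, 1]; if these are raw model "
            "outputs, pass input_format='logits' instead."
        )
    return preds
```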

@idc9 commented Feb 6, 2024

Sorry, I should have been more careful with my language! Let me rephrase “The auto option will always be buggy if the user inputs logits” to “The ‘auto’ option leaves open the possibility for the bug to happen when the user inputs logits”.

Since the auto option leaves open the possibility of this bug occurring, I suspect we don’t want to preserve backwards compatibility. This is a subtle, unexpected issue that many users won’t immediately realize can affect them. Given that it’s straightforward for the user to specify logits/probs, I’m curious whether there is any case where ‘auto’ is useful (i.e., should it just be removed)?

As @SkafteNicki pointed out, the logits bug will only occur if all of the predicted probabilities lie in [0.5, 0.73]. It’s true this is likely not the usual scenario, but it probably does come up. A few examples where it may be more common: small batches, early phases of model training, difficult applications where hold-out probabilities are (sadly) close-ish to 0.5, unbalanced classes, and situations where your label prediction rule thresholds the probability at some value other than 0.5.
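
To illustrate the small-batch case with concrete (made-up) numbers:

```python
import torch

# A two-sample batch whose raw logits happen to fall inside [0, 1].
logits = torch.tensor([0.3, 0.9])

# 'auto' would treat these as probabilities and threshold at 0.5 -> preds [0, 1].
# Interpreted correctly as logits, sigmoid gives ~[0.574, 0.711] -> preds [1, 1].
print(torch.sigmoid(logits))  # tensor([0.5744, 0.7109])
```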

@SkafteNicki (Member Author) commented

@NoahAtKintsugi and @idc9 Alright, I am willing to give in on this, but we are then going to break backwards compatibility, which I do not take lightly. If we make this change, we need to do it right. You are probably right that it is better to break things now and fix an issue that may be a hidden problem for some users.

@Borda and @justusschock please give some opinions on this. Do we

  • continue with input_format="auto" as the default, keeping everything backwards compatible
  • set input_format="auto" for v1.4 (the next release) and include a user warning that the default will change to input_format="logits"/"probs" (which one we choose does not matter to me) in v1.5 (a sketch of such a warning follows below)
  • something else...
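
A minimal sketch of what the second option's deprecation path could look like (the function signature and message shown are illustrative, not the final API):

```python
import warnings

def binary_accuracy(preds, target, input_format="auto", **kwargs):
    # Illustrative deprecation path: keep 'auto' as the default for v1.4,
    # but warn that the default will change in v1.5.
    if input_format == "auto":
        warnings.warn(
            "The default input_format='auto' will change to 'logits'/'probs' "
            "in v1.5; pass input_format explicitly to silence this warning.",
            FutureWarning,
        )
    ...
```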

@mergify mergify bot removed the has conflicts label Feb 15, 2024
@Borda (Member) commented Feb 15, 2024

> set input_format="auto" for v1.4 (next release), include user warning that it will be changing to input_format="logits"/"probs" (which one we choose does not matter to me) in v1.5

I think, due to stability concerns, we may add a warning now but have the hard switch come in 2.0.
cc: @lantiga

@idc9 commented Feb 20, 2024

Defaulting to logits might make the most sense. I suspect that it's more common for users to output logits than probabilities (e.g. https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html, https://github.com/huggingface/pytorch-image-models/blob/main/train.py, https://github.com/pytorch/examples/blob/main/imagenet/main.py), probably since this avoids a small bit of extra computation.

@Borda (Member) commented Mar 28, 2024

@SkafteNicki, how is this one going? :)
