
BinaryPrecisionRecallCurve for large datasets (>100 million samples) #1309

Draft · wants to merge 11 commits into master from jpcbertoldo/roc-for-large-datasets
Conversation


@jpcbertoldo jpcbertoldo commented Nov 3, 2022

What does this PR do?

This PR provides an alternative to this warning:

rank_zero_warn(
    "Metric `PrecisionRecallCurve` will save all targets and predictions in buffer."
    " For large datasets this may lead to large memory footprint."
)

Context

Currently, BinaryPrecisionRecallCurve (and related methods by consequence) has two operation modes, which I will call "computed-thresholds" (arg thresholds=None) and "given-thresholds" (otherwise).

"computed-thresholds"

All unique scores in the data are used as thresholds in compute(). Memory consumption is high when the number of instances is large because all the preds and targets are kept in memory during the updates.

How much is "high" memory consumption?

I will consider 100 million samples the order of magnitude where "high" starts.
That roughly corresponds to 1e8 (samples) * 4 (bytes/float32) = 4e8 bytes ≈ 400 MB for the preds tensor alone under the "computed-thresholds" mode, and about double that (~800 MB) once targets are buffered as well.

In my research I'd like to compute ROC curves of pixel-wise classifications on 1024 x 1024 resolution images, with ~2000 images in the test set, which accounts for 1024 * 1024 * 2000 * 4 (bytes/float32) ≈ 7.8 GiB at least.
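The back-of-envelope numbers above can be reproduced with a tiny helper (hypothetical, just for illustration — not code from the PR):

```python
def buffer_bytes(num_samples: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to buffer `num_samples` float32 values (preds alone)."""
    return num_samples * bytes_per_value

# 100 million samples -> 4e8 bytes, i.e. ~0.4 GB for preds alone
print(buffer_bytes(100_000_000) / 10**9)         # 0.4
# 2000 images at 1024 x 1024, pixel-wise preds -> 7.8125 GiB
print(buffer_bytes(1024 * 1024 * 2000) / 2**30)  # 7.8125
```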

"given-thresholds"

Thresholds are pre-defined, so all possible binarizations are known in advance. At each update every threshold is applied, yielding one confusion matrix per threshold.

The memory consumption is low and is a function of the number of thresholds.
With 1e7 thresholds it uses roughly 1e7 * (2 * 2) (confmat shape) * 8 (bytes/int64) ≈ 320 MB, which is much lower even with 10 million thresholds.
However, it requires the user to know meaningful thresholds in advance, which is often not the case.

A possible (inconvenient) solution would be to estimate the thresholds on a first call then compute the curve.
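The per-update bookkeeping of the "given-thresholds" mode can be sketched like this (a simplified stand-in, not the actual torchmetrics implementation; the (T, 2, 2) layout is my own choice for the sketch):

```python
import torch

def binned_confmat_update(preds: torch.Tensor, target: torch.Tensor,
                          thresholds: torch.Tensor) -> torch.Tensor:
    """Return a (T, 2, 2) stack of confusion matrices, one per threshold."""
    preds_t = preds.unsqueeze(0) >= thresholds.unsqueeze(1)  # (T, N) binarizations
    target = target.bool().unsqueeze(0)                      # (1, N)
    tp = (preds_t & target).sum(dim=1)
    fp = (preds_t & ~target).sum(dim=1)
    fn = (~preds_t & target).sum(dim=1)
    tn = (~preds_t & ~target).sum(dim=1)
    # row = true class, column = predicted class
    return torch.stack([torch.stack([tn, fp], dim=1),
                        torch.stack([fn, tp], dim=1)], dim=1)

# the state accumulates across updates; raw preds/targets are never stored
state = torch.zeros(3, 2, 2, dtype=torch.long)
state += binned_confmat_update(torch.tensor([0.1, 0.6, 0.8]),
                               torch.tensor([0, 1, 1]),
                               torch.tensor([0.25, 0.5, 0.75]))
```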

Solution

I propose a hybrid strategy where the update switches from "computed-thresholds" to "given-thresholds" dynamically.

Given a budget (a number of instances), the metric keeps preds and targets in its state until that budget is reached, then estimates meaningful thresholds and computes the confusion matrices at that point. From there on it behaves like the "given-thresholds" mode.
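A minimal sketch of that switch (hypothetical class and method names, with a naive linspace estimator standing in for whatever threshold estimation ends up being used):

```python
class DynamicPRCurveSketch:
    """Hybrid "computed-thresholds" -> "given-thresholds" update (sketch only)."""

    def __init__(self, budget: int, num_thresholds: int = 5):
        self.budget = budget              # max number of buffered samples
        self.num_thresholds = num_thresholds
        self.preds, self.target = [], []  # "computed-thresholds" buffers
        self.thresholds = None            # set at the mode switch
        # confmats[t] = [[tn, fp], [fn, tp]] for threshold t
        self.confmats = [[[0, 0], [0, 0]] for _ in range(num_thresholds)]

    def _estimate_thresholds(self, preds):
        # naive: evenly spaced between the observed min and max
        lo, hi = min(preds), max(preds)
        step = (hi - lo) / (self.num_thresholds - 1)
        return [lo + i * step for i in range(self.num_thresholds)]

    def _update_confmats(self, preds, target):
        for i, thr in enumerate(self.thresholds):
            for p, y in zip(preds, target):
                self.confmats[i][int(y)][int(p >= thr)] += 1

    def update(self, preds, target):
        if self.thresholds is not None:       # already in "given-thresholds" mode
            self._update_confmats(preds, target)
            return
        self.preds.extend(preds)              # "computed-thresholds" mode
        self.target.extend(target)
        if len(self.preds) >= self.budget:    # budget reached: switch modes
            self.thresholds = self._estimate_thresholds(self.preds)
            self._update_confmats(self.preds, self.target)
            self.preds, self.target = [], []  # release the raw buffers
```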

API sketch

# the arg `thresholds` can be a str 
# `budget` is the max number of instances seen before the mode switch
metric = BinaryPrecisionRecallCurve(thresholds="dynamic", budget=1e8)  # 100 million, for instance

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

@SkafteNicki (Member)

Hi @jpcbertoldo, thanks for proposing this enhancement (I have also seen your thread on slack)
I do like the idea, but I wonder if it is possible to still fit this feature into the thresholds argument, so we do not have to change the outer interface again. We could reserve str input to this feature such that:

metric = BinaryPrecisionRecallCurve(thresholds="12345")
metric = BinaryPrecisionRecallCurve(thresholds="250mb")
metric = BinaryPrecisionRecallCurve(thresholds="3gb")

would all enable this feature. What do you think about that?

@jpcbertoldo (Author)

Hi @jpcbertoldo, thanks for proposing this enhancement (I have also seen your thread on slack) I do like the idea, but I wonder if it is possible to still fit this feature into the thresholds argument, so we do not have to change the outer interface again. We could reserve str input to this feature such that:

metric = BinaryPrecisionRecallCurve(thresholds="12345")
metric = BinaryPrecisionRecallCurve(thresholds="250mb")
metric = BinaryPrecisionRecallCurve(thresholds="3gb")

would all enable this feature. What do you think about that?

Yeah, makes sense. I was actually thinking of considering just the case thresholds="200mb"; I think it makes a lot more sense than thinking in numbers of instances.
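A dependency-free parser for such size strings could be as small as this (a sketch; `parse_size` is a hypothetical helper, not code from the PR):

```python
import re

_UNITS = {"": 1, "b": 1, "kb": 10**3, "mb": 10**6, "gb": 10**9,
          "kib": 2**10, "mib": 2**20, "gib": 2**30}

def parse_size(text: str) -> int:
    """Parse '12345', '250mb' or '3gb' into a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)\s*([a-z]*)", text.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"Cannot parse size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])
```

A bare number is interpreted as bytes, so the thresholds="12345" form suggested above would also work.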

@jpcbertoldo (Author)

@SkafteNicki by the way, I just saw this

New Features:

  1. Submit a github issue - describe what is the motivation of such feature (adding the use case or an example is helpful).
  2. Let's discuss to determine the feature scope.

Could we skip this step for this time? : )

@jpcbertoldo jpcbertoldo force-pushed the jpcbertoldo/roc-for-large-datasets branch from b98edf7 to d5a8309 Compare November 3, 2022 13:31
@SkafteNicki (Member)

@SkafteNicki by the way, I just saw this

New Features:

  1. Submit a github issue - describe what is the motivation of such feature (adding the use case or an example is helpful).
  2. Let's discuss to determine the feature scope.

Could we skip this step for this time? : )

That is normally how we like it, but since you have already implemented some of it, no need to open an issue this time. It is recommended because we do not want people to implement a lot before talking with us; we may decide it is not a feature we want, and their work would go to waste.

@SkafteNicki SkafteNicki added the enhancement New feature or request label Nov 3, 2022
@SkafteNicki SkafteNicki added this to In progress in Classification refactor via automation Nov 3, 2022
@SkafteNicki SkafteNicki added this to the v0.11 milestone Nov 3, 2022
@SkafteNicki SkafteNicki left a comment (Member)

Have you made any thoughts on extending this to multiclass / multilabel?

@@ -14,6 +14,7 @@

from typing import List, Optional, Sequence, Tuple, Union

import humanfriendly
Member:

We really do not want to introduce any new dependencies.
The conversion from mb and gb seems to be something we can do ourself?

Author:

I saw that one coming haha.

So, I put it in anyway because it feels like the kind of functionality prone to silly mistakes, while a tiny library like this has it neatly packed in.

I can try to make a minimal version of it based on the library's source code. Is that a better solution?

Member:

lets make it conditional, if user already have it, then use it

if module_available("humanfriendly"):
    import humanfriendly
else:
    humanfriendly = None

Comment on lines +132 to +137
class _ComputationMode(Enum):
    """Internal state of the dynamic mode."""

    BINNED = "binned"
    NON_BINNED = "non-binned"
    NON_BINNED_DYNAMIC = "non-binned-dynamic"
Member:

it seems weird to me having a class inside a class def?

Author:

It's kind of rare indeed, but I usually do this when it is strictly only used internally 🤷‍♂️

Should i pop it out?

Member:

yes please

Comment on lines +56 to +66
def _validate_memory_budget(budget: int):
    if budget <= 0:
        raise ValueError("Budget must be larger than 0.")

    if _budget_bytes_to_nsamples(budget) <= _DYNAMIC_THRESHOLDS_MIN_NSAMPLES:
        warnings.warn(
            f"Budget is relatively small ({humanfriendly.format_size(budget, binary=True)}). "
            "The dynamic mode is recommended for bigger samples."
        )

    return budget
Member:

if we only have the thresholds argument i guess all this logic can be moved to the _adjust_threshold_arg function

Author:

In the multiclass/multilabel cases, the estimation (number of samples) <-> (memory consumption) would be different.

I don't have a very strong opinion on this; I will put some more thought into the multi* cases first.

@jpcbertoldo (Author)

Have you made any thoughts on extending this to multiclass / multilabel?

Just as much as in my reply to your comment 😬.

I'll be putting some effort into that this weekend :)

@SkafteNicki SkafteNicki modified the milestones: v0.11, v0.12 Nov 18, 2022
@jpcbertoldo (Author)

@SkafteNicki I'm having a hard time making myself available to invest time in this.
Could you maybe give me a hand and tell me what/where I have to change things to adapt it for the multilabel case?

@Borda (Member)

Borda commented Dec 23, 2022

@stancld could you help here to finish this PR? 🦦

from typing import Any, List, Optional, Tuple, Union

import humanfriendly
Member:

lets have this as optional


Comment on lines +138 to +143
if isinstance(thresholds, str):
    return BinaryPrecisionRecallCurve._ComputationMode.NON_BINNED_DYNAMIC
elif thresholds is None:
    return BinaryPrecisionRecallCurve._ComputationMode.NON_BINNED
else:
    return BinaryPrecisionRecallCurve._ComputationMode.BINNED
Member:

Suggested change (drop the elif/else in favor of early returns):

if isinstance(thresholds, str):
    return BinaryPrecisionRecallCurve._ComputationMode.NON_BINNED_DYNAMIC
if thresholds is None:
    return BinaryPrecisionRecallCurve._ComputationMode.NON_BINNED
return BinaryPrecisionRecallCurve._ComputationMode.BINNED


if isinstance(thresholds, int):
    thresholds = torch.linspace(0, 1, thresholds, device=device)
if isinstance(thresholds, list):
    thresholds = torch.tensor(thresholds, device=device)
if isinstance(thresholds, str):
Member:

Suggested change:

- if isinstance(thresholds, str):
+ if isinstance(thresholds, str) and humanfriendly:

@Borda (Member)

Borda commented Feb 28, 2023

@jpcbertoldo, how is it going here? I think we are on a good path... :)

@jpcbertoldo (Author)

@jpcbertoldo, how is it going here? I think we are on a good path... :)

Hi @Borda, I lost track of this since a while (vacation, other priorities...). I will try to come back to it soon!

@SkafteNicki SkafteNicki modified the milestones: v0.12, future Mar 31, 2023
@jpcbertoldo (Author)

@Borda I was wondering if we could do a simpler solution for this by computing the metric in two rounds ("epochs").

1st round: just find the min/max; then, with a given maximum number of points, linearly (or in some smarter way?) space the thresholds between the min and max.

2nd round: compute the actual TPR/FPRs in an online way (because now the thresholds are known in advance).


An alternative for the 1st round: instead of just min/max, keep track of the unique values, which may provide information for making parts of the threshold range more or less dense. The set of unique values could be kept in float16 instead of float32 to increase the redundancy of values (I'm guessing it shouldn't affect the accuracy of the AUC much).
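The two-round idea above can be sketched roughly like this (hypothetical helper names; assumes `batches` can be iterated twice, e.g. a list of (preds, target) pairs):

```python
def scan_score_range(batches):
    """Round 1: one pass over the data just to find the min/max scores."""
    lo, hi = float("inf"), float("-inf")
    for preds, _ in batches:
        lo, hi = min(lo, min(preds)), max(hi, max(preds))
    return lo, hi

def two_round_roc(batches, num_thresholds=100):
    """Round 2: thresholds are now known, so TPR/FPR accumulate online."""
    lo, hi = scan_score_range(batches)
    thr = [lo + i * (hi - lo) / (num_thresholds - 1) for i in range(num_thresholds)]
    tp, fp = [0] * num_thresholds, [0] * num_thresholds
    pos = neg = 0
    for preds, target in batches:          # second pass over the data
        for p, y in zip(preds, target):
            pos, neg = pos + y, neg + (1 - y)
            for i, t in enumerate(thr):
                if p >= t:
                    tp[i] += y
                    fp[i] += 1 - y
    tpr = [c / max(pos, 1) for c in tp]
    fpr = [c / max(neg, 1) for c in fp]
    return thr, tpr, fpr
```

Only the (min, max) pair and the fixed-size count arrays are ever stored, so memory stays constant regardless of the dataset size; the price is reading the data twice.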

@Borda (Member)

Borda commented Aug 8, 2023

@jpcbertoldo apologies for the late reply...
@SkafteNicki what do you think about the suggestion above?
