
Releases: Lightning-AI/torchmetrics

Minor dependency correction

15 May 11:24

Metrics for segmentation

06 May 09:20

In TorchMetrics v1.4, we are happy to introduce a new domain of metrics to the library: segmentation metrics. Segmentation metrics are used to evaluate how well segmentation algorithms perform, e.g., algorithms that take in an image and decide, pixel by pixel, what kind of object each pixel belongs to. These kinds of algorithms are essential in applications such as self-driving cars. Segmentation metrics are closely related to classification metrics, but in TorchMetrics they currently expect the input to be formatted differently; see the documentation for more info. For now, MeanIoU and GeneralizedDiceScore have been added to the subpackage, with many more to follow in upcoming releases of TorchMetrics. We are happy to receive any feedback on metrics to add in the future or on the user interface for the new segmentation metrics.
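
Below is a minimal sketch of how the new MeanIoU metric could be called; the one-hot input format and the exact constructor arguments shown here are assumptions based on the documentation, so check the docs of your installed version.

import torch
from torchmetrics.segmentation import MeanIoU

num_classes = 3
miou = MeanIoU(num_classes=num_classes)
# one-hot encoded masks of shape (batch, num_classes, height, width)
preds = torch.nn.functional.one_hot(
    torch.randint(num_classes, (4, 16, 16)), num_classes
).permute(0, 3, 1, 2)
target = torch.nn.functional.one_hot(
    torch.randint(num_classes, (4, 16, 16)), num_classes
).permute(0, 3, 1, 2)
miou(preds, target)
# returns the mean intersection-over-union as a scalar tensor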

TorchMetrics v1.4 also adds new metrics to the classification and image subpackages and includes multiple bug fixes and other quality-of-life improvements. We refer to the changelog for the complete list of changes.

[1.4.0] - 2024-05-03

Added

  • Added SensitivityAtSpecificity metric to classification subpackage (#2217)
  • Added QualityWithNoReference metric to image subpackage (#2288)
  • Added new segmentation metrics:
    • MeanIoU
    • GeneralizedDiceScore
  • Added support for calculating segmentation quality and recognition quality in PanopticQuality metric (#2381)
  • Added pretty-errors for improving error prints (#2431)
  • Added support for torch.float weighted networks for FID and KID calculations (#2483)
  • Added zero_division argument to selected classification metrics (#2198)

Changed

  • Made __getattr__ and __setattr__ of ClasswiseWrapper more general (#2424)

Fixed

  • Fixed __getitem__ for metric collection when prefix/postfix is set (#2430)
  • Fixed axis names with Precision-Recall curve (#2462)
  • Fixed list synchronization with partly empty lists (#2468)
  • Fixed memory leak in metrics using list states (#2492)
  • Fixed bug in computation of ERGAS metric (#2498)
  • Fixed BootStrapper wrapper not working when arguments are provided as kwargs (#2503)
  • Fixed warnings being suppressed in MeanAveragePrecision when requested (#2501)
  • Fixed corner-case in binary_average_precision when only negative samples are provided (#2507)

Key Contributors

@baskrahmer, @Borda, @ChristophReich1996, @daniel-code, @furkan-celik, @i-aki-y, @jlcsilva, @NielsRogge, @oguz-hanoglu, @SkafteNicki, @ywchan2005

New Contributors

If we forgot someone due to not matching commit email with GitHub account, let us know :]


Full Changelog: v1.3.0...v1.4.0

Minor patch release

18 Mar 12:39

[1.3.2] - 2024-03-18

Fixed

  • Fixed negative variance estimates in certain image metrics (#2378)
  • Fixed dtype being changed by deepspeed for certain regression metrics (#2379)
  • Fixed plotting of metric collection when prefix/postfix is set (#2429)
  • Fixed bug when top_k>1 and average="macro" for classification metrics (#2423)
  • Fixed case where label prediction tensors in classification metrics were not validated correctly (#2427)
  • Fixed how auc scores are calculated in PrecisionRecallCurve.plot methods (#2437)

Full Changelog: v1.3.1...v1.3.2

Key Contributors

@Borda, @SkafteNicki

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor patch release

12 Feb 19:10

[1.3.1] - 2024-02-12

Fixed

  • Fixed how backprop is handled in LPIPS metric (#2326)
  • Fixed MultitaskWrapper not being able to be logged in lightning when using metric collections (#2349)
  • Fixed high memory consumption in Perplexity metric (#2346)
  • Fixed cached network in FeatureShare not being moved to the correct device (#2348)
  • Fixed naming of statistics in MeanAveragePrecision with custom max det thresholds (#2367)
  • Fixed custom aggregation in retrieval metrics (#2364)
  • Fixed initialization of aggregation metrics with default floating type (#2366)
  • Fixed plotting of confusion matrices (#2358)

Full Changelog: v1.3.0...v1.3.1

Key Contributors

@Borda, @fschlatt, @JonasVerbickas, @nsmlzl, @SkafteNicki

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Minor release patch

30 Jan 10:36

New Image metrics & wrappers

11 Jan 12:56

TorchMetrics v1.3 is out now! This release introduces seven new metrics across the different subdomains of TorchMetrics and adds some nice features to already established metrics. In this blog post, we present the new metrics with short code samples.

We are happy to see the continued adoption of TorchMetrics in over 19,000 GitHub projects, and we are proud to report that we have passed 1,800 GitHub stars.

New metrics

The retrieval domain has received one new metric in this release: RetrievalAUROC. This metric calculates the Area Under the Receiver Operating Characteristic Curve for document retrieval data. It is similar to the standard AUROC metric from classification but also supports the additional indexes argument that all retrieval metrics support.

from torch import tensor
from torchmetrics.retrieval import RetrievalAUROC
indexes = tensor([0, 0, 0, 1, 1, 1, 1])
preds = tensor([0.2, 0.3, 0.5, 0.1, 0.3, 0.5, 0.2])
target = tensor([False, False, True, False, True, False, True])
r_auroc = RetrievalAUROC()
r_auroc(preds, target, indexes=indexes)
# tensor(0.7500)

The image subdomain receives two new metrics in v1.3, which brings the total number of image-specific metrics in TorchMetrics to 21! As with other metrics, these two new metrics work by comparing a predicted image tensor to a ground-truth image, but they focus on different properties for their metric calculation.

  • The first metric is SpatialCorrelationCoefficient. As the name indicates, this metric focuses on how well the spatial structure of the predicted image correlates with that of the ground-truth image.

    import torch
    torch.manual_seed(42)
    from torchmetrics.image import SpatialCorrelationCoefficient as SCC
    preds = torch.randn([32, 3, 64, 64])
    target = torch.randn([32, 3, 64, 64])
    scc = SCC()
    scc(preds, target)
    # tensor(0.0023)
  • The second metric is SpatialDistortionIndex, which compares the spatial structure of the images and is especially useful for evaluating multispectral images.

    import torch
    from torchmetrics.image import SpatialDistortionIndex
    preds = torch.rand([16, 3, 32, 32])
    target = {
      'ms': torch.rand([16, 3, 16, 16]),
      'pan': torch.rand([16, 3, 32, 32]),
    }
    sdi = SpatialDistortionIndex()
    sdi(preds, target)
    # tensor(0.0090)

A new wrapper metric called FeatureShare has also been added. It can be seen as a specialized version of MetricCollection for metrics that use a neural network as part of their metric calculation. For example, FrechetInceptionDistance, InceptionScore, and KernelInceptionDistance all, by default, use an Inception network for their metric calculations. When these metrics were combined inside a MetricCollection, the underlying neural network was still called three times, which is redundant and wastes resources. In principle, it should be possible to call the network only once and then propagate the features to all metrics, which is exactly what the FeatureShare wrapper does.

import torch
from torchmetrics.wrappers import FeatureShare
from torchmetrics import MetricCollection
from torchmetrics.image import FrechetInceptionDistance, KernelInceptionDistance

def fs_wrapper():
    fs = FeatureShare([FrechetInceptionDistance(), KernelInceptionDistance(subset_size=10, subsets=2)])
    fs.update(torch.randint(255, (50, 3, 64, 64), dtype=torch.uint8), real=True)
    fs.update(torch.randint(255, (50, 3, 64, 64), dtype=torch.uint8), real=False)
    fs.compute()

def mc_wrapper():
    mc = MetricCollection([FrechetInceptionDistance(), KernelInceptionDistance(subset_size=10, subsets=2)])
    mc.update(torch.randint(255, (50, 3, 64, 64), dtype=torch.uint8), real=True)
    mc.update(torch.randint(255, (50, 3, 64, 64), dtype=torch.uint8), real=False)
    mc.compute()

# let's compare (using IPython's %timeit magic)
%timeit fs_wrapper()
# 8.38 s ± 564 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit mc_wrapper()
# 13.8 s ± 232 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The FeatureShare wrapper will most likely be significantly faster than the equivalent metric collection, as shown in the code example.

Improved features

In v1.2, several new arguments were added to the MeanAveragePrecision metric from the detection package. This metric has seen a further small improvement in that the argument extended_summary=True now also returns confidence scores. The confidence scores are the scores assigned by the model indicating how confident it is that a given predicted bounding box belongs to a certain class.

import torch
from torchmetrics.detection import MeanAveragePrecision
# enable extended summary
map_metric = MeanAveragePrecision(extended_summary=True)
preds = [
	{
		"boxes": torch.tensor([[0.5, 0.5, 1, 1]]),
		"scores": torch.tensor([1.0]),
		"labels": torch.tensor([0]),
	}
]
target = [
	{"boxes": torch.tensor([[0, 0, 1, 1]]), "labels": torch.tensor([0])}
]
map_metric.update(preds, target)
result = map_metric.compute()

# the new confidence scores can be found in the "scores" key
confidence_scores = result["scores"]
# in this case confidence_scores will have shape (10, 101, 1, 4, 3)
# because
#   * We are by default evaluating for 10 different IoU thresholds
#   * We evaluate the PR-curve based on 101 linearly spaced locations
#   * We only have 1 class (see the labels tensor)
#   * There are 4 area sizes we evaluate on (small, medium, large and all)
#   * By default `max_detection_thresholds=[1,10,100]` meaning we evaluate for 3 values

From v1.3, all retrieval metrics support an argument called aggregation that determines how the metric is aggregated over different documents. The supported options are "mean", "median", "max", and "min", with the default value being "mean", which is fully backward compatible with earlier versions of TorchMetrics.

from torch import tensor
from torchmetrics.retrieval import RetrievalHitRate
indexes = tensor([0, 0, 0, 1, 1, 1, 1])
preds = tensor([0.2, 0.3, 0.5, 0.1, 0.3, 0.5, 0.2])
target = tensor([True, False, False, False, True, False, True])
hr2 = RetrievalHitRate(aggregation="max")
hr2(preds, target, indexes=indexes)
# tensor(1.000)

Finally, the SacreBLEU metric from the text domain now supports even more tokenizers: "ja-mecab", "ko-mecab", "flores101", and "flores200".
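
As a quick sketch of selecting one of the new tokenizers (shown here via the tokenize argument; the flores tokenizers are assumed to require the sentencepiece package and the mecab tokenizers the corresponding mecab packages):

from torchmetrics.text import SacreBLEUScore
preds = ["the cat is on the mat"]
target = [["there is a cat on the mat", "a cat is on the mat"]]
# pick one of the newly supported tokenizers via the tokenize argument
sacre_bleu = SacreBLEUScore(tokenize="flores101")
sacre_bleu(preds, target)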

Changes and bugfixes

Users should be aware that from v1.3, TorchMetrics only supports PyTorch v1.10 and up (previously v1.8 and up). We aim to support PyTorch releases for up to two years.

This release includes several bug fixes related to numerical stability in a number of metrics. For this reason, we always recommend using the most recent version of TorchMetrics for the best experience.

Thank you!

As always, we offer a big thank you to all of our community members for their contributions and feedback. Please open an issue in the repo if you have any recommendations for the next metrics we should tackle.

If you want to ask a question or join us in expanding Torchmetrics, please join our discord server, where you can ask questions and get guidance in the #torchmetrics channel.

🔥 Check out the documentation and code! 🚀

[1.3.0] - 2024-01-10

Added

  • Added more tokenizers for SacreBLEU metric (#2068)
  • Added support for logging MultiTaskWrapper directly with Lightning's log_dict method (#2213)
  • Added FeatureShare wrapper to share submodules containing feature extractors between metrics (#2120)
  • Added new metrics to image domain:
    • SpatialDistortionIndex (#2260)
    • CriticalSuccessIndex (#2257)
    • SpatialCorrelationCoefficient (#2248)
  • Added average argument to multiclass versions of PrecisionRecallCurve and ROC (#2084)
  • Added confidence scores when extended_summary=True in MeanAveragePrecision (#2212)
  • Added RetrievalAUROC metric (#2251)
  • Added aggregate argument to retrieval metrics (#2220)
  • Added utility functions in segmentation.utils for future segmentation metrics (#2105)

Changed

  • Changed minimum supported Pytorch version from 1.8 to 1.10 (#2145)
  • Changed x-/y-axis order for PrecisionRecallCurve to be consistent with scikit-learn (#2183)

Deprecated

  • Deprecated metric._update_called (#2141)
  • Deprecated specicity_at_sensitivity in favour of specificity_at_sensitivity (#2199)

Fixed

  • Fixed support for half precision + CPU in metrics requiring topk operator (#2252)
  • Fixed warning incorrectly being raised in Running metrics (#2256)
  • Fixed integration with custom feature extractor in FID metric (#2277)

Full Changelog: v1.2.0...v1.3.0

Key Contributors

@Borda, @HoseinAkbarzadeh, @matsumotosan, @miskf...


Lazy imports

01 Dec 17:06

[1.2.1] - 2023-11-30

Added

  • Added error if NoTrainInceptionV3 is being initialized without torch-fidelity installed (#2143)
  • Added support for Pytorch v2.1 (#2142)

Changed

  • Changed default state of SpectralAngleMapper and UniversalImageQualityIndex to be tensors (#2089)
  • Use arange and repeat for deterministic bincount (#2184)

Removed

  • Removed unused lpips third-party package as dependency of LearnedPerceptualImagePatchSimilarity metric (#2230)

Fixed

  • Fixed numerical stability bug in LearnedPerceptualImagePatchSimilarity metric (#2144)
  • Fixed numerical stability issue in UniversalImageQualityIndex metric (#2222)
  • Fixed incompatibility for MeanAveragePrecision with pycocotools backend when too few max_detection_thresholds are provided (#2219)
  • Fixed support for half precision in Perplexity metric (#2235)
  • Fixed device and dtype for LearnedPerceptualImagePatchSimilarity functional metric (#2234)
  • Fixed bug in Metric._reduce_states(...) when using dist_sync_fn="cat" (#2226)
  • Fixed bug in CosineSimilarity where 2d is expected but 1d input was given (#2241)
  • Fixed bug in MetricCollection when using compute groups and compute is called more than once (#2211)

Full Changelog: v1.2.0...v1.2.1

Key Contributors

@Borda, @jankng, @kyle-dorman, @SkafteNicki, @tanguymagne

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Clustering metrics

22 Sep 13:17

Torchmetrics v1.2 is out now! The latest release includes 11 new metrics within a new subdomain: Clustering.
In this blog post, we briefly explain what clustering is, why it's a useful measure, and introduce the newly added metrics with code samples.

Clustering - what is it?

Clustering is an unsupervised learning technique. The term unsupervised refers to the fact that we do not have ground-truth targets as we do in classification. The primary goal of clustering is to discover hidden patterns or structures within data without prior knowledge about the meaning or importance of particular features. Clustering is thus a form of data exploration, in contrast to supervised learning, where the goal is "just" to predict which class a data point belongs to.

The key goal of clustering algorithms is to split data into clusters/sets where data points from the same cluster are more similar to each other than to points from the remaining clusters. Some of the most common and widely used clustering algorithms are K-Means, hierarchical clustering, and Gaussian Mixture Models (GMMs).

An objective quality evaluation/measure is required regardless of the clustering algorithm or internal optimization criterion used. In general, we can divide all clustering metrics into two categories: extrinsic metrics and intrinsic metrics.

Extrinsic metrics

Extrinsic metrics are characterized by requiring some ground-truth labeling, even though they are used for an unsupervised method. This may seem counter-intuitive at first, as by the definition of clustering we do not use such ground-truth labeling. However, most clustering algorithms are still developed on datasets with labels available, so these metrics take advantage of this fact.
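
As an illustrative sketch, an extrinsic metric such as RandScore compares predicted cluster assignments against ground-truth labels (see the changelog below for the full list of added metrics):

import torch
from torchmetrics.clustering import RandScore

# predicted cluster assignments and ground-truth class labels
preds = torch.tensor([0, 0, 1, 1, 2, 2])
target = torch.tensor([0, 0, 1, 1, 1, 2])
rand_score = RandScore()
rand_score(preds, target)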

Intrinsic metrics

In contrast, intrinsic metrics do not need any ground-truth information. These metrics estimate intra-cluster consistency (cohesion of all points assigned to a single cluster) compared to other clusters (separation). This is often done by comparing distances in the embedding space.
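
A corresponding sketch of an intrinsic metric, here CalinskiHarabaszScore, which only needs the embeddings and the predicted cluster assignments (the exact call signature is an assumption, so check the documentation):

import torch
from torchmetrics.clustering import CalinskiHarabaszScore

# embeddings of shape (num_samples, num_features) and predicted cluster labels
data = torch.randn(10, 4)
labels = torch.tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
chs = CalinskiHarabaszScore()
chs(data, labels)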

Update to Mean Average Precision

MeanAveragePrecision, the most widely used metric for object detection in computer vision, now supports two new arguments: average and backend.

  • The average argument controls averaging over multiple classes. The default, following the core definition, is macro averaging, where the metric is calculated for each class separately and then averaged. This will continue to be the default in TorchMetrics, but we now also support average="micro". Under this setting, every object is essentially considered to belong to the same class, and the returned value is therefore calculated over all objects at once.

  • The second argument, backend, indicates which computational backend is used for the internal computations. Since MeanAveragePrecision is not a simple metric to compute, and we value correctness, we rely on a third-party library for the internal computations. By default, we rely on users having the official pycocotools installed, but with the new argument we also support other backends (see the sketch below).
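
A minimal sketch of both new arguments; the backend value shown ("pycocotools") matches the default described above, and the example assumes pycocotools is installed:

import torch
from torchmetrics.detection import MeanAveragePrecision

# micro averaging pools all objects as if they belong to a single class
map_metric = MeanAveragePrecision(average="micro", backend="pycocotools")
preds = [{
    "boxes": torch.tensor([[0.0, 0.0, 10.0, 10.0]]),
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
target = [{
    "boxes": torch.tensor([[0.0, 0.0, 10.0, 10.0]]),
    "labels": torch.tensor([0]),
}]
map_metric.update(preds, target)
map_metric.compute()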

[1.2.0] - 2023-09-22

Added

  • Added metrics to the clustering package:
    • MutualInformationScore (#2008)
    • RandScore (#2025)
    • NormalizedMutualInfoScore (#2029)
    • AdjustedRandScore (#2032)
    • CalinskiHarabaszScore (#2036)
    • DunnIndex (#2049)
    • HomogeneityScore (#2053)
    • CompletenessScore (#2053)
    • VMeasureScore (#2053)
    • FowlkesMallowsIndex (#2066)
    • AdjustedMutualInfoScore (#2058)
    • DaviesBouldinScore (#2071)
  • Added backend argument to MeanAveragePrecision (#2034)

Full Changelog: v1.1.0...v1.2.0

New Contributors since v1.1.0

Key Contributors

@matsumotosan, @SkafteNicki

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Weekly patch release

11 Sep 13:14

[1.1.2] - 2023-09-11

Fixed

  • Fixed tie breaking in ndcg metric (#2031)
  • Fixed bug in BootStrapper when very few samples were evaluated that could lead to crash (#2052)
  • Fixed bug when creating multiple plots that lead to not all plots being shown (#2060)
  • Fixed performance issues in RecallAtFixedPrecision for large batch sizes (#2042)
  • Fixed bug related to MetricCollection used with custom metrics that have prefix/postfix attributes (#2070)

Contributors

@GlavitsBalazs, @SkafteNicki

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Weekly patch release

29 Aug 06:57

[1.1.1] - 2023-08-29

Added

  • Added average argument to MeanAveragePrecision (#2018)

Fixed

  • Fixed bug in PearsonCorrCoef when updated with single samples at a time (#2019)
  • Fixed support for pixel-wise MSE (#2017)
  • Fixed bug in MetricCollection when used with multiple metrics that return dicts with same keys (#2027)
  • Fixed bug in detection intersection metrics when class_metrics=True resulting in wrong values (#1924)
  • Fixed missing attributes higher_is_better, is_differentiable for some metrics (#2028)

Contributors

@adamjstewart, @SkafteNicki

If we forgot someone due to not matching commit email with GitHub account, let us know :]