Add new audio metrics for generative audio processing
Motivation
The evaluation of speech processing (denoising, dereverberation and enhancement in general) depends heavily on audio metrics. Generative AI is now widely used for speech/audio enhancement and has become the new SOTA. However, evaluating generative speech enhancement requires reference-free (target-free) metrics that correlate strongly with MOS (Mean Opinion Score). The currently implemented metrics do not allow a correct assessment of generative speech enhancement algorithms (e.g. those based on diffusion or GANs) because they rely heavily on reference/target audio.
Newer metrics such as DNSMOS, NISQA, CDPAM and WARPQ allow a well-founded assessment of such algorithms (they are either reference-free or designed for generative methods). In addition, they have been shown to outperform traditional metrics (PESQ, STOI...) in terms of MOS correlation.
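To make "MOS correlation" concrete, here is a minimal sketch of how one might compare two metrics by their Pearson correlation with listener scores. The `pearson` helper and all numbers are hypothetical and purely illustrative; they are not results from DNSMOS, NISQA, CDPAM, WARPQ, or any real listening test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Made-up values for illustration only: one MOS rating and one score per clip.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_a = [1.2, 2.1, 2.9, 4.2, 4.8]  # hypothetical metric that tracks MOS closely
metric_b = [3.0, 1.0, 4.0, 2.0, 5.0]  # hypothetical metric that tracks MOS loosely

r_a = pearson(mos, metric_a)
r_b = pearson(mos, metric_b)
print(f"metric_a vs MOS: r = {r_a:.3f}")
print(f"metric_b vs MOS: r = {r_b:.3f}")
```

A metric with a higher correlation is a better proxy for human ratings on that dataset; published comparisons often also report rank correlation, which this sketch does not cover.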
Pitch
It would be great to have these metrics included, as they are currently only available in scattered repositories: WARPQ, DNSMOS, CDPAM, NISQA.
Alternatives
I cannot think of any
Hi @d-caviedes,
Thanks for wanting to contribute to torchmetrics. Feel free to contribute any metric within the audio domain that you can :)
In general, we are looking to add any metric that is used by researchers or companies on a regular basis.
We welcome both partial and full implementations, and we will of course help you with specific implementation details to get the metric into the torchmetrics library.
Cool. Should I just work on my branch and open a pull request afterwards?
Yes, as soon as you feel you want to share your work or need some guidance, please open a draft PR :)
Borda changed the title from "Contribution: Add new audio/speech metrics for generative audio. (I can help!)" to "Contribution: Add new audio/speech metrics for generative audio" on Mar 28, 2024