Integrating Uncertainty Into Neural Network-Based Speech Enhancement

Huajian Fang, Timo Gerkmann, Stefan Wermter, Dennis Becker

doi:10.1109/taslp.2023.3265202

Abstract

Supervised masking approaches in the time-frequency domain aim to employ deep neural networks to estimate a multiplicative mask to extract clean speech. This leads to a single estimate for each input without any guarantees or measures of reliability. In this paper, we study the benefits of modeling uncertainty in clean speech estimation. Prediction uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former refers to inherent randomness in data, while the latter describes uncertainty in the model parameters. In this work, we propose a framework to jointly model aleatoric and epistemic uncertainties in neural network-based speech enhancement. The proposed approach captures aleatoric uncertainty by estimating the statistical moments of the speech posterior distribution and explicitly incorporates the uncertainty estimate to further improve clean speech estimation. For epistemic uncertainty, we investigate two Bayesian deep learning approaches, Monte Carlo dropout and deep ensembles, to quantify the uncertainty of the neural network parameters. Our analyses show that the proposed framework promotes capturing practical and reliable uncertainty, while combining different sources of uncertainty yields more reliable predictive uncertainty estimates. Furthermore, we demonstrate the benefits of modeling uncertainty on speech enhancement performance by evaluating the framework on different datasets, exhibiting notable improvement over comparable models that fail to account for uncertainty.
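As a rough illustration of how the two kinds of uncertainty can be combined, the sketch below applies the law of total variance to several stochastic predictions for a single time-frequency bin. It is a generic textbook decomposition, not necessarily the formulation used in the paper: each pass (one Monte Carlo dropout sample or one ensemble member) is assumed to output a predictive mean and variance, and the function name `combine_predictions` and all numbers are hypothetical.

```python
def combine_predictions(means, variances):
    """Combine M per-pass (mean, variance) predictions for one bin.

    Assumption: each pass is one MC dropout sample or one ensemble member
    that predicts a mean and a variance for the clean-speech estimate.
    """
    if not means or len(means) != len(variances):
        raise ValueError("means and variances must be non-empty and equally long")
    m = len(means)
    # Combined estimate: average of the per-pass means.
    mean = sum(means) / m
    # Aleatoric part: average of the variances each pass predicts.
    aleatoric = sum(variances) / m
    # Epistemic part: spread of the per-pass means around their average.
    epistemic = sum((mu - mean) ** 2 for mu in means) / m
    # Law of total variance: the two parts add up.
    return mean, aleatoric, epistemic, aleatoric + epistemic


# Hypothetical outputs of three passes for the same bin.
mean, aleatoric, epistemic, total = combine_predictions(
    [0.9, 1.1, 1.0], [0.04, 0.05, 0.03]
)
```

With a single pass the epistemic term is zero, so the total reduces to the aleatoric variance alone; disagreement between passes is what adds the epistemic contribution.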

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2023
Citations: 4
License type: CC BY 4.0

Similar Papers

Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models
Huajian Fang ... Timo Gerkmann
04 Jun 2023

Integrating Statistical Uncertainty into Neural Network-Based Speech Enhancement
Huajian Fang ... Stefan Wermter
23 May 2022

MetaDetect: Uncertainty Quantification and Prediction Quality Estimates for Object Detection
Marius Schubert ... Matthias Rottmann
18 Jul 2021

Do I know this? segmentation uncertainty under domain shift
Katharina V Hoebel ... Andreanne Lemay
04 Apr 2022
