Abstract

Speech corrupted by background noise (or noisy speech) can cause misinterpretation and listener fatigue during phone and conference calls, and for hearing aid users. Noisy speech can also severely degrade the performance of speech processing systems such as automatic speech recognition (ASR), automatic speaker verification (ASV), and automatic speaker identification (ASI) systems. Currently, deep learning approaches are employed in an end-to-end fashion to improve robustness: either the target speech (or clean speech) is used as the training target, or large noisy speech datasets are used to facilitate multi-condition training. In this dissertation, we propose competitive alternatives to these approaches by updating two classic robust speech processing techniques with deep learning: minimum mean-square error (MMSE) estimators and missing data approaches.

An MMSE estimator aims to improve the perceived quality and intelligibility of noisy speech by suppressing the background noise without distorting the speech. Prior to the introduction of deep learning, MMSE estimators were the standard speech enhancement approach. To attain a high level of speech enhancement performance, MMSE estimators require accurate estimation of the a priori signal-to-noise ratio (SNR). However, current methods produce a priori SNR estimates with a large tracking delay and considerable bias. Hence, we propose Deep Xi, a deep learning approach to a priori SNR estimation that is significantly more accurate than previous estimators. Through objective and subjective testing across multiple conditions, including real-world non-stationary and coloured noise sources at multiple SNR levels, we show that Deep Xi allows MMSE estimators to produce the highest quality enhanced speech amongst all clean speech magnitude spectrum estimators.

Missing data approaches improve robustness by performing inference only on the noisy speech features that reliably represent clean speech. In particular, the marginalisation method significantly increased the robustness of Gaussian mixture model (GMM)-based speech classification systems (e.g. GMM-based ASR, ASV, or ASI systems) in the early 2000s. However, the deep neural networks (DNNs) used in current speech classification systems are non-probabilistic, whereas marginalisation requires a probabilistic model. Hence, multi-condition training or noisy speech pre-processing is used to increase the robustness of DNN-based speech classification systems. Recently, sum-product networks (SPNs) were proposed: deep probabilistic graphical models that can perform the probabilistic queries required by missing data approaches. While available toolkits for SPNs are in their infancy, we show through an ASI task that SPNs using missing data approaches could be a strong alternative for robust speech processing in the future. This dissertation demonstrates that MMSE estimators and missing data approaches remain relevant to robust speech processing when assisted by deep learning.
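For reference, the a priori SNR central to Deep Xi is the standard short-time spectral quantity from the MMSE speech enhancement literature. In our notation (frequency-bin index $k$, frame indices omitted):

    \[
      \xi_k = \frac{\mathbb{E}\left[|X_k|^2\right]}{\mathbb{E}\left[|D_k|^2\right]}
      \quad \text{(a priori SNR)},
      \qquad
      \gamma_k = \frac{|Y_k|^2}{\mathbb{E}\left[|D_k|^2\right]}
      \quad \text{(a posteriori SNR)},
    \]

where $Y_k = X_k + D_k$ is the $k$-th short-time spectral component of the noisy speech, and $X_k$ and $D_k$ are the corresponding clean speech and noise components.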
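To make the role of the a priori SNR concrete, the following is a minimal sketch of gain-based enhancement driven by an external a priori SNR estimator. The estimator `estimate_xi` is a hypothetical stand-in (e.g. a trained network such as Deep Xi), and the Wiener gain is used for simplicity in place of the MMSE-STSA or MMSE-LSA gain functions:

    # Minimal sketch of a priori SNR-driven speech enhancement. `estimate_xi`
    # is a hypothetical stand-in for a trained a priori SNR estimator; the
    # Wiener gain is used here purely for illustration.
    import numpy as np

    def wiener_gain(xi):
        """Wiener gain function G(xi) = xi / (1 + xi)."""
        return xi / (1.0 + xi)

    def enhance(noisy_stft, estimate_xi):
        """Apply a gain derived from the estimated a priori SNR to each
        time-frequency bin of the noisy short-time spectrum."""
        xi = estimate_xi(np.abs(noisy_stft))  # a priori SNR per bin
        gain = wiener_gain(xi)
        # Scale the magnitude, keep the noisy phase.
        return gain * np.abs(noisy_stft) * np.exp(1j * np.angle(noisy_stft))

Retaining the noisy phase while scaling only the magnitude is the standard convention for clean speech magnitude spectrum estimators.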
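Similarly, marginalisation can be illustrated with a short sketch for a diagonal-covariance GMM, where integrating out the unreliable feature dimensions reduces to evaluating each Gaussian over the reliable dimensions only. The names below are ours, and the binary reliability mask is assumed to come from an external estimator:

    # Minimal sketch of missing-data marginalisation for a diagonal-covariance
    # GMM: unreliable dimensions (mask == False) are integrated out, which for
    # diagonal covariances amounts to dropping them from the Gaussian evaluation.
    import numpy as np

    def log_marginal_likelihood(x, mask, weights, means, variances):
        """Log-likelihood of feature vector `x` under a diagonal GMM, using
        only the dimensions where `mask` is True (reliable)."""
        xr = x[mask]            # reliable features, shape (R,)
        mr = means[:, mask]     # per-component means, shape (K, R)
        vr = variances[:, mask] # per-component variances, shape (K, R)
        # Per-component diagonal Gaussian log-densities over reliable dims.
        log_comp = -0.5 * np.sum(
            np.log(2.0 * np.pi * vr) + (xr - mr) ** 2 / vr, axis=1
        )
        # Log-sum-exp over the mixture components.
        return np.logaddexp.reduce(np.log(weights) + log_comp)

It is exactly this kind of marginal query, which standard discriminative DNNs cannot answer, that SPNs can compute efficiently, motivating their use in the ASI experiments above.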
