On-line Gaussian mixture modeling in the log-power domain for signal-to-noise ratio estimation and speech enhancement

Tran Huy Dat,Kazuya Takeda,Fumitada Itakura

doi:10.1016/j.specom.2006.06.009

Abstract

We present on-line Gaussian mixture modeling (GMM) in the log-power domain of actual noisy speech and its applications to segmental signal-to-noise ratio (SNR) estimation and speech enhancement. The basic idea is to fit a conventional two-component GMM in the log-power domain to estimate the distributions of the noise and noisy-speech subspaces in each speech segment of 0.5–2 s. Given the subspace distributions, statistical estimation is applied in both applications. For segmental SNR estimation, the average speech level is estimated from the noisy speech using a nonlinear moment of the modeled distributions. This method is suitable for real conditions, in which neither reference signals nor speech activity information is available, and is shown to be more robust and accurate than conventional methods, particularly at low SNR. The proposed GMM is then extended to multiband log-power domains for noise estimation. Long-term information, obtained by GMM modeling over each 0.5 s segment, is used to update the local distributions of noise and noisy-speech power at each time–frequency index. Cumulative distribution function equalization (CDFE) is then used to estimate the noise, which is subtracted from the noisy speech power. The advantage of CDFE for noise estimation is that the estimate is obtained in the logarithmic domain without any approximation. The proposed speech enhancement is tested on the AURORA-2J database and compared with conventional minimum-statistics and quantile-based noise estimation. The proposed method yields a higher speech recognition rate than the conventional methods in most noise environments and provides a very good compromise between speech enhancement and speech recognition performance.
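The two-component idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact estimator, so everything below is an assumption made for illustration: the function names `fit_two_component_gmm` and `segmental_snr_db` are hypothetical, a plain EM fit stands in for the on-line modeling, and the log-normal mean exp(mu + var/2) stands in for the "nonlinear moment" used to recover average power levels.

```python
import numpy as np

def fit_two_component_gmm(x, n_iter=200, tol=1e-8):
    """Fit a 1-D two-component Gaussian mixture by EM.

    Returns (weights, means, variances) sorted by ascending mean, so
    index 0 is the lower-level (assumed noise) component.
    """
    x = np.asarray(x, dtype=float)
    # Initialise the two means from the lower and upper quartiles.
    means = np.percentile(x, [25, 75]).astype(float)
    variances = np.full(2, x.var() + 1e-6)
    weights = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability.
        diff = x[:, None] - means[None, :]
        log_pdf = -0.5 * (np.log(2 * np.pi * variances) + diff**2 / variances)
        log_joint = np.log(weights) + log_pdf
        log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    order = np.argsort(means)
    return weights[order], means[order], variances[order]

def segmental_snr_db(frame_power):
    """Estimate the SNR (dB) of one segment from its per-frame powers.

    Assumption (not taken from the paper): average linear power of each
    component is the log-normal mean exp(mu + var/2), and speech power is
    the noisy-speech power minus the noise power.
    """
    log_p = np.log(np.asarray(frame_power, dtype=float) + 1e-12)
    _, mu, var = fit_two_component_gmm(log_p)
    noise_power = np.exp(mu[0] + var[0] / 2)
    noisy_power = np.exp(mu[1] + var[1] / 2)
    speech_power = max(noisy_power - noise_power, 1e-12)
    return 10 * np.log10(speech_power / noise_power)

# Synthetic segment: 100 noise-only frames and 100 noisy-speech frames.
rng = np.random.default_rng(0)
demo_log_power = np.concatenate(
    [rng.normal(0.0, 0.3, 100), rng.normal(3.0, 0.5, 100)]
)
demo_snr_db = segmental_snr_db(np.exp(demo_log_power))
```

Sorting the components by mean is a design choice of this sketch: it treats the lower-level component as noise, which only makes sense when the segment contains both noise-only and speech frames.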

Journal: Speech Communication	Publication Date: Jul 21, 2006
Citations: 15

Similar Papers

Speech Enhancement Based on the Decomposition of Speech Into Deterministic and Stochastic Components and Psychoacoustic Model
Seokhwan Jo ... Chang D Yoo
01 Jan 2007

SNR and Local Noise Power Estimations Based on Gaussian Mixture Modeling on the Log-Power Domain
K Takeda ... H Fujimura
18 Mar 2005

Deep Learning for Minimum Mean-Square Error and Missing Data Approaches to Robust Speech Processing
04 Dec 2020

Modeling Speech Structure to Improve T-F Masks for Speech Enhancement and Recognition
Suliang Bu ... Shaojun Wang
IEEE/ACM Transactions on Audio, Speech, and Language Processing | Vol. 30
01 Jan 2021
