Speech enhancement beyond minimum mean squared error with perceptual noise shaping.

Lae‐Hoon Kim, Mark Hasegawa‐Johnson, Kyung‐Tae Kim

doi:10.1121/1.3384190

Abstract

The residual error signal after speech enhancement through linear filtering can be decomposed into two disjoint portions: speech signal distortion and background noise suppression. Speech is known to follow a super‐Gaussian probability density function (PDF) such as the Laplacian, while background noise follows a Gaussian PDF. Minimum mean squared error estimation uses only second‐order statistics of both the noise and the speech. Therefore, higher‐order dependence of the observed speech on the original speech may cause speech information to leak into the error residual. This talk formulates an optimization problem that minimizes higher‐order statistics (HOS) as well as the energy of the signal distortion, constrained by a limit on the maximum audibility of the residual noise. Because speech is non‐stationary, the enhancement is performed in short overlapping frames. Minimizing the HOS of the speech distortion ensures that the speech distortion includes only noise terms, with minimum leakage from the speech signal. The constraint on the residual noise margin prevents over‐suppression, which may result in unwanted speech distortion.
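
The distinction between second-order and higher-order statistics, and the short overlapping framing, can be illustrated with a minimal sketch. It is not the authors' method: synthetic Laplacian and Gaussian samples stand in for speech and noise, excess kurtosis is used as one example of a higher-order statistic, and the function names, frame length (512), and hop size (256) are hypothetical choices made only for illustration.

```python
import numpy as np


def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0


def overlapping_frames(x, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples, advancing by hop."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])


rng = np.random.default_rng(0)
n = 200_000

# Assumption: synthetic stand-ins, not real speech or recorded noise.
speech_like = rng.laplace(scale=1.0, size=n)  # super-Gaussian (Laplacian)
noise_like = rng.normal(scale=1.0, size=n)    # Gaussian

# Theoretical excess kurtosis: 3 for a Laplacian, 0 for a Gaussian.
k_speech = excess_kurtosis(speech_like)
k_noise = excess_kurtosis(noise_like)

# Assumption: 512-sample frames with 50% overlap (hypothetical values).
frames = overlapping_frames(speech_like, frame_len=512, hop=256)
```

Variance alone cannot separate the two signals once they are scaled to equal power, whereas the fourth-order statistic does. This is the kind of information that an estimator based only on second-order statistics does not use.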
