A noise-type and level-dependent MPO-based speech enhancement architecture with variable frame analysis for noise-robust speech recognition

Vikramjit Mitra, Abeer Alwan, Carol Y Espy-Wilson, Bengt J Borgstrom

doi:10.21437/interspeech.2009-703

Abstract

In previous work, a speech enhancement algorithm based on phase opponency and a periodicity measure (MPO-APP) was developed for speech recognition. Axiomatic thresholds were used in the MPO-APP regardless of the signal-to-noise ratio (SNR) of the corrupted speech or any characterization of the noise. The current work develops an algorithm that adjusts the threshold in the MPO-APP based on the SNR and on whether the speech signal is clean, corrupted by aperiodic noise, or corrupted by noise with periodic components. In addition, variable frame rate (VFR) analysis is incorporated so that dynamic regions of the speech signal are sampled more heavily than steady-state regions. The result is a two-stage algorithm that outperforms the previous MPO-APP and several other state-of-the-art speech enhancement algorithms.

Index Terms: speech enhancement, robust speech recognition, SNR estimation, variable frame rate analysis, phase opponency.
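The idea behind variable frame rate analysis, sampling dynamic regions more densely than steady-state ones, can be illustrated with a minimal sketch. This is not the authors' VFR method, which the abstract does not specify. It assumes a generic accumulated-distance rule: feature vectors are first computed at a dense fixed rate, and a frame is kept whenever the summed Euclidean distance between consecutive frames reaches a threshold. The function name `select_frames`, the `threshold` parameter, and the toy one-dimensional features are all hypothetical.

```python
import math


def select_frames(features, threshold):
    """Return indices of frames kept by a simple accumulated-distance VFR rule.

    features:  list of equal-length feature vectors, one per densely sampled frame.
    threshold: hypothetical tuning parameter; smaller values keep more frames.
    """
    if not features:
        return []
    kept = [0]  # always keep the first frame
    accumulated = 0.0
    for i in range(1, len(features)):
        # Distance between consecutive frames measures local spectral change.
        accumulated += math.dist(features[i - 1], features[i])
        if accumulated >= threshold:
            kept.append(i)
            accumulated = 0.0  # restart accumulation after keeping a frame
    return kept


# Toy input: a steady region, a rapid transition, then another steady region.
toy_features = [[0.0], [0.0], [0.0], [1.0], [2.0], [3.0], [3.0], [3.0]]
kept_indices = select_frames(toy_features, threshold=1.0)
print(kept_indices)
```

In this toy input, the kept frames cluster around the transition, while the flat segments contribute no frames beyond the first one.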

Full Text