An improved MMSE estimator based modified group delay spectrum for Forensic Automatic Speaker Recognition

Salim Djeghiour, Mhania Guerti

doi:10.1007/s10772-021-09829-9

Abstract

This paper presents an improved Minimum Mean Square Error (MMSE) speech enhancement algorithm, based on the MODified Group Delay spectrum (MODGD), for Forensic Automatic Speaker Recognition (FASR) in noisy environments. The algorithm uses the MODGD, instead of the amplitude spectrum, to compute the power spectrum of the noise-corrupted signal. Because the MODGD retains most of the formant information, the proposed estimator enhances the noisy speech signal with high quality even at extremely low Signal-to-Noise Ratio (SNR) levels. The improved algorithm was evaluated in simulated FASR scenarios by adding noise extracted from the NOISEX-92 database, at different levels, to the clean NIST2000 traces. The results show that the proposed MMSE–MODGD estimator suppresses noise components in low-SNR regions more effectively than the conventional MMSE estimator. In addition, combining FASR techniques with the proposed MMSE–MODGD estimator rather than with the conventional estimator yields a marked reduction in Equal Proportion Probability (EPP): the improvements are 1.84% for babble noise and 1.25% for factory and white noise.
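To make the central quantity concrete, the following is a minimal sketch of the modified group delay function as it is commonly defined in the speech processing literature. It is not the authors' implementation, and the abstract does not specify how the MODGD is fed into the MMSE estimator, so that step is left out. The function names (`dft`, `idft`, `cepstral_smooth`, `modgd`) and the parameter values `alpha`, `gamma`, and `lifter` are illustrative assumptions, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def cepstral_smooth(mag, lifter):
    """Smooth a magnitude spectrum by keeping only low-quefrency cepstral terms."""
    n = len(mag)
    log_mag = [math.log(max(m, 1e-12)) for m in mag]  # floor avoids log(0)
    ceps = [c.real for c in idft(log_mag)]
    # Keep coefficients 0..lifter-1 and their mirror images so the result stays real.
    kept = [c if (q < lifter or q > n - lifter) else 0.0
            for q, c in enumerate(ceps)]
    return [math.exp(v.real) for v in dft(kept)]


def modgd(frame, alpha=0.4, gamma=0.9, lifter=8):
    """Modified group delay of one frame (alpha, gamma, lifter are assumed values).

    tau(k)   = (X_R * Y_R + X_I * Y_I) / S(k)^(2 * gamma)
    modgd(k) = sign(tau(k)) * |tau(k)|^alpha
    where X = DFT{x[n]}, Y = DFT{n * x[n]}, and S is the cepstrally
    smoothed version of |X|.
    """
    X = dft(frame)
    Y = dft([t * s for t, s in enumerate(frame)])
    S = cepstral_smooth([abs(v) for v in X], lifter)
    out = []
    for xk, yk, sk in zip(X, Y, S):
        tau = (xk.real * yk.real + xk.imag * yk.imag) / (sk ** (2 * gamma))
        out.append(math.copysign(abs(tau) ** alpha, tau))
    return out


# Sanity check: a unit impulse delayed by 3 samples has a flat group delay of 3.
frame = [0.0] * 16
frame[3] = 1.0
delays = modgd(frame, alpha=1.0, gamma=1.0)
```

With `alpha = gamma = 1` the function reduces to the ordinary group delay with a smoothed denominator, so for the delayed impulse every bin comes out at approximately 3 samples. Values of `alpha` and `gamma` below 1 compress the dynamic range, and replacing |X| with its smoothed version in the denominator is what keeps spectral zeros from producing spikes that would mask the formant peaks.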

Full Text