Evaluation of quality of speech enhanced by HMM and AR model-based systems

Anisa Yasmin, Li Deng, Paul Fieguth

doi:10.1121/1.427281

Abstract

Speech enhancement algorithms have demonstrated their application potential in a wide variety of speech communication contexts in which the quality or the intelligibility of speech has been degraded by the presence of background noise: hearing aids, cellular phones, public telephones, hands-free telephones, air-ground communications, etc. By far the two most popular choices for model-based speech enhancement are the Wiener-filter-based hidden Markov model (HMM) and the autoregressive (AR) model-based Kalman filter (KF). Although researchers have been studying such enhancement systems for some time, relatively little work has compared the quality of the enhanced speech produced by these two systems. This paper studies the Wiener/HMM and AR/KF models and conducts a comprehensive comparative study of the relative quality of enhanced speech, based on utterances from the TIMIT database contaminated by simulated and sampled empirical noises. The Wiener/HMM and AR/KF comparison includes both subjective and objective evaluations: subjective assessments are based on mean opinion scores (MOS) and the inspection of temporal and spectrogram plots; objective evaluations are based on average and segmental signal-to-noise ratios. HMM-enhanced speech has most of the noise removed, but with interruptions and discontinuities present due to the switched nature of the HMM. AR model-based enhancement leaves more audible background noise in the high-frequency region above 4 kHz; however, the speech is smoother, with fewer discontinuities.
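The two objective measures named above can be sketched as follows. This is a minimal illustration using the commonly cited textbook definitions of average (global) and segmental SNR, not necessarily the exact procedure used in the paper. The function names, the 256-sample frame length, and the clamping range of -10 to 35 dB are assumptions made for the example.

```python
import numpy as np

EPS = 1e-12  # assumption: small constant to avoid division by zero


def global_snr_db(clean, enhanced):
    """Average SNR over the whole utterance, in dB."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    error = clean - enhanced  # residual noise and distortion
    return 10.0 * np.log10((np.sum(clean**2) + EPS) / (np.sum(error**2) + EPS))


def segmental_snr_db(clean, enhanced, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs (dB) over non-overlapping frames.

    frame_len, floor_db and ceil_db are illustrative choices, not values
    taken from the paper. Trailing samples that do not fill a frame are dropped.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")

    frame_snrs = []
    for i in range(n_frames):
        start, stop = i * frame_len, (i + 1) * frame_len
        s = clean[start:stop]
        e = s - enhanced[start:stop]
        snr = 10.0 * np.log10((np.sum(s**2) + EPS) / (np.sum(e**2) + EPS))
        # Clamp so silent or near-perfect frames do not dominate the mean.
        frame_snrs.append(min(max(snr, floor_db), ceil_db))
    return float(np.mean(frame_snrs))


if __name__ == "__main__":
    fs = 16000  # TIMIT sampling rate
    t = np.arange(1024) / fs
    clean = np.sin(2 * np.pi * 440 * t)
    enhanced = 1.1 * clean  # error is 10% of the signal amplitude
    print(global_snr_db(clean, enhanced))
    print(segmental_snr_db(clean, enhanced))
```

Because the segmental measure averages in the log domain frame by frame, it weights low-energy speech segments more heavily than the global SNR does, which is one reason the two are often reported together.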

Full Text