On the relation between speech corruption models in the spectral and the cepstral domain

Ramon Fernandez Astudillo, Timo Gerkmann

doi:10.1109/icassp.2013.6639028

Abstract

The Gaussian distortion model in the short-time Fourier transform (STFT) domain is the basis of many modern speech enhancement algorithms. One reason is that additive sources and late reverberation can be analyzed and processed quite efficiently in this domain. The STFT domain is, however, not well related to acoustic quality, and it is also not well suited for learning models because of the high variability of speech in this domain. The cepstral domain, on the other hand, has proved to be very well suited for these last two purposes, but at the cost of losing the simple linear relation between the desired source and additive interferences. In this paper we explore the relation between the Gaussian distortion models in the STFT and the cepstral domain. We show how the assumption of a jointly Gaussian distortion model in the cepstral domain is fulfilled for well-known distortion models in the STFT domain. We provide closed-form solutions relating the joint distributions of corrupted and clean speech in the STFT and the cepstral domain. We also propose various ways in which this model can be used to enhance speech.
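
The loss of linearity mentioned above can be checked numerically. The following is a minimal sketch, not the method of the paper: it assumes a plain real cepstrum (inverse FFT of the log magnitude spectrum) rather than any specific cepstral feature the authors may use, and it replaces speech and interference with random stand-in signals. The names `cepstrum`, `speech`, and `noise` are illustrative only.

```python
import numpy as np


def cepstrum(spectrum):
    """Real cepstrum of one frame: inverse FFT of the log magnitude.

    Assumption: a plain real cepstrum, used here only for illustration.
    """
    magnitude = np.maximum(np.abs(spectrum), 1e-12)  # floor avoids log(0)
    return np.fft.irfft(np.log(magnitude))


rng = np.random.default_rng(0)
n = 512
window = np.hanning(n)

# Hypothetical stand-ins for one frame of clean speech and of interference.
speech = rng.standard_normal(n)
noise = 0.5 * rng.standard_normal(n)

# One windowed frame of each signal, transformed to the spectral domain.
X = np.fft.rfft(window * speech)
D = np.fft.rfft(window * noise)
Y = np.fft.rfft(window * (speech + noise))

# In the spectral domain the corrupted frame is the sum of its parts.
stft_additive = np.allclose(Y, X + D)

# In the cepstral domain the logarithm breaks this additive relation.
cepstrum_additive = np.allclose(cepstrum(Y), cepstrum(X) + cepstrum(D))

print("additive in STFT domain:    ", stft_additive)
print("additive in cepstral domain:", cepstrum_additive)
```

The first check holds because the Fourier transform is linear; the second fails because the logarithm of a sum is not the sum of the logarithms, which is why relating distributions across the two domains requires more than a linear mapping.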

Full Text