This paper proposes Discrete Cosine Transform (DCT) based speech enhancement algorithms. The algorithms use minimum mean square error (MMSE) estimators of the clean short-time spectral amplitude, with Gaussian, Laplacian, and Gamma probability density functions (PDFs), respectively, as speech priors. We assume that the noise process is additive and Gaussian. The proposed estimators have closed-form solutions, whereas the conventional Discrete Fourier Transform (DFT) based estimators derived under super-Gaussian speech priors do not. We also examine the estimators under Speech Presence Uncertainty (SPU), which models the presence or absence of speech probabilistically. Compared with alternative approaches, such as the MMSE-STSA estimators of Ephraim and Malah or of Erkelens et al., the proposed methods demonstrate superior performance in terms of Segmental SNR (SegSNR), Perceptual Evaluation of Speech Quality (PESQ), the short-time objective intelligibility measure (STOI), and mean subjective preference score, while exhibiting equal or lower complexity.
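As a rough illustration of the DCT-domain processing chain (transform, per-coefficient gain, inverse transform), the following is a minimal sketch and not the estimators proposed here, whose closed forms are not stated in this abstract. It substitutes the standard Wiener gain, which is the MMSE estimate of a real coefficient under a Gaussian prior with additive Gaussian noise, for the amplitude estimators. The function names, the `gain_floor` parameter, the power-subtraction estimate of the a priori SNR, and the assumption of a known noise variance are all illustrative choices.

```python
import math


def dct(frame):
    """Orthonormal DCT-II of a real-valued frame (direct O(n^2) form)."""
    n = len(frame)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, x in enumerate(frame))
        out.append(scale * acc)
    return out


def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(acc)
    return out


def wiener_gain(coeff, noise_var, gain_floor=0.05):
    """Wiener gain xi / (1 + xi) for one real DCT coefficient.

    Assumption: the a priori SNR xi is estimated by power subtraction,
    max(Y^2 / noise_var - 1, 0); gain_floor is a hypothetical lower
    bound on the gain, not a value taken from the paper.
    """
    xi = max(coeff * coeff / noise_var - 1.0, 0.0)
    return max(xi / (1.0 + xi), gain_floor)


def enhance_frame(noisy_frame, noise_var):
    """Apply the gain to each DCT coefficient and transform back."""
    coeffs = dct(noisy_frame)
    return idct([wiener_gain(c, noise_var) * c for c in coeffs])
```

Because the DCT coefficients are real, the gain acts on a signed scalar rather than on a complex magnitude and phase, as it would in a DFT-based scheme. With the orthonormal scaling used in this sketch, white noise of variance `noise_var` per sample has the same variance in every coefficient.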