Abstract
This paper considers techniques for single-channel speech enhancement based on the discrete Fourier transform (DFT). Specifically, we derive minimum mean-square error (MMSE) estimators of speech DFT coefficient magnitudes as well as of complex-valued DFT coefficients based on two classes of generalized gamma distributions, under an additive Gaussian noise assumption. The resulting generalized DFT magnitude estimator has as a special case the existing scheme based on a Rayleigh speech prior, while the complex DFT estimators generalize existing schemes based on Gaussian, Laplacian, and Gamma speech priors. Extensive simulation experiments with speech signals degraded by various additive noise sources verify that significant improvements are possible with the more recent estimators based on super-Gaussian priors. The increase in perceptual evaluation of speech quality (PESQ) over the noisy signals is about 0.5 points for street noise and about 1 point for white noise, nearly independent of input signal-to-noise ratio (SNR). The assumptions made for deriving the complex DFT estimators are less accurate than those for the magnitude estimators, leading to a higher maximum achievable speech quality with the magnitude estimators.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE Transactions on Audio, Speech and Language Processing
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.