Abstract
This paper proposes a new and effective speech enhancement algorithm based on Gaussian Mixture Modeling (GMM) and the Minimum Mean Square Error (MMSE) criterion. GMM mean vectors are used to model the space spanned by the power spectra of the input noisy speech frames. No assumption is made about the nature or stationarity of the noise, and neither Voice Activity Detection (VAD) nor any other means is used to estimate the input Signal to Noise Ratio (SNR). The mean vectors derived from mixture models of the Power Spectral Densities (PSDs) of speech and of different noise sources are used to form over-determined systems of equations, one per candidate noise source, whose solutions lead to the MMSE estimates of the speech and noise power spectra. These estimates are then used for noise suppression by Wiener filtering carried out on overlapping frames. The input SNR and the nature of the noise involved are obtained as by-products of the method. Results are compared with those of two variants of a method based on approximate but explicit MMSE Bayesian estimation, which give good results but suffer from long processing times. It is shown that, at the cost of a slightly lower improvement in SNR and PESQ score, the new algorithm reduces the computation time to 1/30 of that of the reference method, which makes it suitable for practical applications.
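The pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the equations, so the sketch assumes a simplified model in which each noisy-frame PSD is approximated as a non-negative combination of a single speech mean vector and a single noise mean vector per candidate source (rather than full mixtures). The function names `fit_gains` and `enhance_frame` and all numeric values are hypothetical.

```python
import math

def fit_gains(y, s, n):
    """Least-squares fit of y ~ a*s + b*n over all frequency bins.

    With len(y) equations and 2 unknowns the system is over-determined;
    it is solved here through the 2x2 normal equations (an assumption,
    not the paper's stated solver).
    """
    ss = sum(v * v for v in s)
    nn = sum(v * v for v in n)
    sn = sum(u * v for u, v in zip(s, n))
    sy = sum(u * v for u, v in zip(s, y))
    ny = sum(u * v for u, v in zip(n, y))
    det = ss * nn - sn * sn
    if abs(det) < 1e-12:
        raise ValueError("speech and noise vectors are (nearly) collinear")
    a = (sy * nn - ny * sn) / det
    b = (ny * ss - sy * sn) / det
    # Crude safeguard: power scales cannot be negative, so clip at zero.
    a, b = max(a, 0.0), max(b, 0.0)
    residual = sum((yk - a * sk - b * nk) ** 2 for yk, sk, nk in zip(y, s, n))
    return a, b, residual

def enhance_frame(y, speech_mean, noise_means):
    """Pick the best-fitting noise candidate and build a Wiener gain.

    Returns (noise_name, per-bin gain, estimated SNR in dB).
    """
    best = None
    for name, n in noise_means.items():
        a, b, r = fit_gains(y, speech_mean, n)
        if best is None or r < best[3]:
            best = (name, a, b, r, n)
    name, a, b, _, n = best
    speech_psd = [a * v for v in speech_mean]
    noise_psd = [b * v for v in n]
    # Wiener gain per bin: estimated speech power over total power.
    gain = [sp / (sp + no) if sp + no > 0 else 0.0
            for sp, no in zip(speech_psd, noise_psd)]
    noise_power = sum(noise_psd)
    snr_db = (10 * math.log10(sum(speech_psd) / noise_power)
              if noise_power > 0 else float("inf"))
    return name, gain, snr_db

# Hypothetical 4-bin example: the frame is 2*speech + 0.5*white noise.
speech = [4.0, 2.0, 1.0, 0.5]
noises = {"white": [1.0, 1.0, 1.0, 1.0], "low": [3.0, 1.0, 0.2, 0.1]}
frame = [2 * s + 0.5 * w for s, w in zip(speech, noises["white"])]
noise_name, gain, snr_db = enhance_frame(frame, speech, noises)
```

In this toy setting, the candidate with the smallest residual identifies the noise type, and the ratio of the fitted speech and noise powers gives the SNR estimate, which mirrors how the abstract describes both quantities as by-products. Applying `gain` to each overlapping frame's spectrum before resynthesis would be the Wiener filtering step.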