Abstract

We introduce single-channel supervised speech enhancement algorithms based on regularized non-negative matrix factorization (NMF). In the proposed framework, the log-likelihood functions (LLF) of the magnitude spectra of both the clean speech and the noise, modeled with Gaussian mixture models (GMM), are included as regularization terms in the NMF cost function. Using this regularization as a priori information in the enhancement stage lets us exploit the statistical properties of both the clean speech and the noise signals. To further improve the quality of the enhanced speech, we also incorporate a masking model of the human auditory system. Specifically, we construct a weighted Wiener filter (WWF) in which the power spectral densities (PSD) of the speech and noise are estimated by the regularized NMF algorithm. The weighting factor in the WWF is selected based on a masking threshold obtained from the estimated PSD of the enhanced speech. Experimental results in terms of perceptual evaluation of speech quality (PESQ), source-to-distortion ratio (SDR), and segmental signal-to-noise ratio (SNR) show that the proposed speech enhancement algorithms (i.e., regularized NMF with and without the masking model) outperform the benchmark algorithms.
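To make the pipeline described above more concrete, the following is a minimal sketch of supervised NMF enhancement followed by a weighted Wiener-style gain. It is not the authors' algorithm: the GMM-based LLF regularization terms are omitted, plain Kullback-Leibler multiplicative updates are swapped in for the NMF step, and the masking-threshold-driven weighting factor is replaced by a constant `mu`. The names `nmf_kl`, `enhance`, `mu`, and `EPS` are hypothetical and chosen only for illustration.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero (illustrative choice)


def nmf_kl(V, rank=None, W=None, n_iter=200, seed=0):
    """Unregularized KL-divergence NMF, V ~= W @ H, via multiplicative updates.

    If W is supplied it is held fixed and only H is estimated, which is
    the supervised (enhancement-stage) setting. The paper's GMM-based
    regularization is NOT included here.
    """
    rng = np.random.default_rng(seed)
    fixed = W is not None
    if not fixed:
        if rank is None:
            raise ValueError("rank is required when W is not supplied")
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.T @ ones + EPS)
        if not fixed:
            WH = W @ H + EPS
            W *= ((V / WH) @ H.T) / (ones @ H.T + EPS)
    return W, H


def enhance(V_noisy, W_speech, W_noise, mu=1.0, n_iter=200):
    """Estimate speech magnitudes with a weighted Wiener-style gain.

    mu is a constant stand-in (an assumption) for the weighting factor
    that the paper derives from an auditory masking threshold.
    """
    W = np.hstack([W_speech, W_noise])            # fixed, pre-trained bases
    _, H = nmf_kl(V_noisy, W=W, n_iter=n_iter)    # activations only
    k = W_speech.shape[1]
    psd_speech = (W_speech @ H[:k]) ** 2          # speech PSD estimate
    psd_noise = (W_noise @ H[k:]) ** 2            # noise PSD estimate
    gain = psd_speech / (psd_speech + mu * psd_noise + EPS)
    return gain * V_noisy, gain
```

In use, `W_speech` and `W_noise` would be learned by calling `nmf_kl` on magnitude spectrograms of clean speech and noise training data, and `enhance` would then be applied to the magnitude spectrogram of the noisy signal. Larger values of `mu` suppress more noise at the cost of more speech distortion, which is the trade-off the masking model is meant to steer.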
