Abstract

Environmental noise significantly reduces the intelligibility of speech in telephone conversations. Even when the device outputs clean speech, a listener in a noisy environment still has difficulty understanding it. This study focuses on intelligibility enhancement (IENH) of telephone speech in near-end background noise, based on normal-to-Lombard speech conversion. The proposed approach uses a long short-term memory (LSTM) network and a Bayesian Gaussian mixture model (BGMM) to build the speech mapping model. Compared with previous studies, we fully consider the short-term correlations of speech and implement feature mappings with higher-dimensional features and more feature types. Evaluations indicate that the proposed approach achieves better results in both objective and subjective evaluations.

