Abstract

Recently, machine-learning-based speech enhancement approaches have shown great promise for improving the intelligibility of noisy speech for both normal-hearing and hearing-impaired listeners. In this paper, we study the speech intelligibility potential of single-microphone speech enhancement based on deep neural networks (DNNs), a class of machine learning models. We show that a DNN-based speech enhancement approach, once trained specifically to handle many noise types and signal-to-noise ratios (SNRs), has the potential to attain large speech intelligibility improvements. The DNN models are trained to learn a mapping from noisy speech features to the coefficients of ratio time-frequency masks. The estimated masks are applied to the noisy speech magnitude spectra, and the masked magnitudes are combined with the phase of the noisy speech to obtain speech with enhanced intelligibility. Results across many noisy conditions, including exhibition hall, coffee shop, airport, car, and babble noise, at five SNRs (−10 dB, −5 dB, 0 dB, 5 dB, and 10 dB), show that the DNN-based ratio mask outperforms the competing methods, including nonnegative matrix factorization (NMF) and log minimum mean square error (LMMSE) estimation, in terms of the short-time objective intelligibility (STOI) and normalized subband envelope correlation (NSEC) objective speech intelligibility metrics.
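
The masking step described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names `ideal_ratio_mask` and `apply_ratio_mask` are hypothetical, and the square-root energy-ratio form of the mask is one common definition assumed here for illustration, since the abstract does not state the exact formula. In practice the mask would come from the trained DNN rather than from clean speech and noise.

```python
import numpy as np


def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Assumed ratio-mask definition: sqrt of speech energy over total energy.

    This is a common training target; the paper may use a different form.
    """
    speech_pow = speech_mag ** 2
    noise_pow = noise_mag ** 2
    return np.sqrt(speech_pow / (speech_pow + noise_pow + eps))


def apply_ratio_mask(noisy_spec, mask):
    """Scale the noisy magnitude spectrum by the mask and reuse the noisy phase.

    noisy_spec: complex STFT of the noisy speech (any shape).
    mask: real-valued mask of the same shape, clipped here to [0, 1].
    """
    magnitude = np.abs(noisy_spec)
    phase = np.angle(noisy_spec)
    mask = np.clip(mask, 0.0, 1.0)
    # Enhanced complex spectrum: masked magnitude, unmodified noisy phase.
    return mask * magnitude * np.exp(1j * phase)
```

The enhanced complex spectrum would then be passed through an inverse short-time Fourier transform to obtain the time-domain signal. Because the phase is left untouched, all of the enhancement comes from the per-bin magnitude scaling.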
