Abstract

This paper focuses on a regression-based deep neural network (DNN) approach to single-channel speech enhancement. While DNNs can improve speech quality compared to classical approaches, they suffer from high computational complexity in the training stage. The main contribution of this work is to reduce the DNN complexity by introducing a spectral feature mapping from noisy mel-frequency cepstral coefficients (MFCCs) to the enhanced short-time Fourier transform (STFT) spectrum. This approach requires far fewer input features and consequently leads to reduced DNN complexity. Because the mapping yields frequency-domain speech features directly, it also avoids the information loss incurred when reconstructing the time-domain speech signal from its MFCCs. Compared to the STFT-based DNN approach, the training-phase complexity of our approach is reduced by a factor of 4.75. Moreover, experimental results in terms of perceptual evaluation of speech quality (PESQ) and source-to-distortion ratio (SDR) show that the proposed approach outperforms the benchmark algorithms across various noise types and different SNR levels.
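
To illustrate why a smaller input feature vector lowers network complexity, the following minimal sketch counts the trainable parameters of a fully connected regression network fed with STFT inputs and of one fed with MFCC inputs, both producing an STFT spectrum at the output. All sizes here (257 STFT bins, 13 MFCCs, 11 context frames, three hidden layers of 2048 units) and the helper `mlp_param_count` are illustrative assumptions, not values taken from this work, so the resulting ratio is not expected to match the reported factor of 4.75.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    return sum(
        n_in * n_out + n_out  # weight matrix plus bias vector per layer
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical settings chosen only for illustration (not from the paper).
N_STFT_BINS = 257            # e.g. one-sided spectrum of a 512-point FFT
N_MFCC = 13                  # a common MFCC vector length
CONTEXT_FRAMES = 11          # neighbouring frames stacked at the input
HIDDEN = [2048, 2048, 2048]  # assumed hidden-layer widths

stft_input = N_STFT_BINS * CONTEXT_FRAMES
mfcc_input = N_MFCC * CONTEXT_FRAMES

# Both networks regress to one frame of the enhanced STFT spectrum.
stft_params = mlp_param_count([stft_input, *HIDDEN, N_STFT_BINS])
mfcc_params = mlp_param_count([mfcc_input, *HIDDEN, N_STFT_BINS])

print(f"STFT-input network: {stft_params} parameters")
print(f"MFCC-input network: {mfcc_params} parameters")
print(f"First-layer input size: {stft_input} vs {mfcc_input}")
```

Under these assumptions only the first layer shrinks, since the hidden and output layers are identical in both networks. The overall saving therefore depends on how large the input layer is relative to the rest of the architecture.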
