Abstract
In this paper, we propose a deep neural network (DNN) ensemble for reducing artificial noise in speech bandwidth extension (BWE). The proposed ensemble consists of three DNN models: one classification model and two regression models. When sequential DNNs estimate the sub-band energies of the high-frequency region in the frequency domain, over-estimation of those energies causes annoying artificial noise. To mitigate this noise, we design a DNN classification model that distinguishes over-estimation frames from normal frames. We then separately train two DNN regression models, one on half of the entire training set and the other on a limited training set built from over-estimation frames and some normal frames, to improve performance on over-estimation frames. Because the classification model outputs the probabilities that a frame is a normal frame or an over-estimation frame, the two regression models are combined adaptively using these probabilities as weights; the final output of the DNN ensemble is thus the weighted sum of the two estimated sub-band energies. As a result, artificial noise is significantly reduced, yielding improved speech quality. The proposed method is evaluated objectively and subjectively against conventional approaches.
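The probabilistic combination described in the abstract can be illustrated with a minimal sketch. It assumes that the classifier's two probabilities sum to one for each frame, that the regression model trained on the limited set is weighted by the over-estimation probability, and that the other model is weighted by the normal-frame probability. The function name, argument names, and toy values below are hypothetical and are not taken from the paper.

```python
import numpy as np

def combine_subband_energies(p_over, e_normal, e_over):
    """Blend two regression outputs frame by frame (illustrative only).

    p_over   : (frames,) assumed probability that each frame is an
               over-estimation frame, as given by the classification model
    e_normal : (frames, bands) sub-band energies from the regression model
               assumed to be weighted by the normal-frame probability
    e_over   : (frames, bands) sub-band energies from the regression model
               assumed to be weighted by the over-estimation probability
    """
    p_over = np.asarray(p_over, dtype=float)[:, None]  # (frames, 1) for broadcasting
    p_normal = 1.0 - p_over                            # assumes the two probabilities sum to 1
    e_normal = np.asarray(e_normal, dtype=float)
    e_over = np.asarray(e_over, dtype=float)
    return p_normal * e_normal + p_over * e_over

# Toy values: three frames, two sub-bands.
p_over = [0.0, 0.5, 1.0]
e_normal = np.array([[4.0, 2.0], [4.0, 2.0], [4.0, 2.0]])
e_over = np.array([[2.0, 1.0], [2.0, 1.0], [2.0, 1.0]])
combined = combine_subband_energies(p_over, e_normal, e_over)
```

In this toy case, a frame judged certainly normal keeps the first model's estimate, a frame judged certainly over-estimated takes the second model's estimate, and an uncertain frame receives an intermediate value, which is the soft switching behavior the weighted sum is meant to provide.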