Recurrent neural network for spectral mapping in speech bandwidth extension

Yingxue Wang, Shenghui Zhao, Jingming Kuang, Qiang Zhu, Jianxin Li

doi:10.1109/globalsip.2016.7905840

Abstract

We present a recurrent neural network (RNN) based speech bandwidth extension (BWE) method. Conventional Gaussian mixture model (GMM) based BWE methods perform stably and effectively. However, they suffer from two fundamental problems: 1) the GMM is inadequate for modeling the non-linear relationship between the low-frequency (LF) and high-frequency (HF) components; 2) temporal correlations across speech frames are ignored, resulting in a loss of spectral detail in the reconstructed speech. To cope with these problems, an RNN is employed to capture temporal information and construct deep non-linear relationships between the spectral envelope features of LF and HF. The proposed RNN is trained layer by layer from a cascade of two recurrent temporal restricted Boltzmann machines (RTRBMs) and a feedforward neural network (NN). The proposed method takes advantage of the strong ability of RTRBMs to discover the temporal correlation between adjacent frames and to model deep non-linear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms the conventional GMM based methods and other NN based methods.
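To make the idea of frame-by-frame spectral mapping with temporal context concrete, the following is a minimal sketch of a plain recurrent forward pass that maps a sequence of LF envelope feature vectors to HF envelope feature vectors. It is not the architecture or training procedure of the paper: the RTRBM-based layer-wise pretraining is omitted, and the feature dimensions, hidden size, tanh non-linearity, random untrained weights, and the names `init_params` and `map_lf_to_hf` are all assumptions chosen for illustration.

```python
import numpy as np


def init_params(lf_dim, hidden_dim, hf_dim, seed=0):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    rng = np.random.default_rng(seed)
    scale = 0.1
    return {
        "W_in": rng.normal(0.0, scale, (hidden_dim, lf_dim)),       # LF input -> hidden
        "W_rec": rng.normal(0.0, scale, (hidden_dim, hidden_dim)),  # hidden(t-1) -> hidden(t)
        "b_h": np.zeros(hidden_dim),
        "W_out": rng.normal(0.0, scale, (hf_dim, hidden_dim)),      # hidden -> HF output
        "b_out": np.zeros(hf_dim),
    }


def map_lf_to_hf(lf_frames, params):
    """Map a (num_frames, lf_dim) array to a (num_frames, hf_dim) array.

    The hidden state carries information from earlier frames, so each
    HF estimate depends on the current LF frame and on its history.
    """
    hidden = np.zeros_like(params["b_h"])
    hf_frames = []
    for x in lf_frames:
        hidden = np.tanh(
            params["W_in"] @ x + params["W_rec"] @ hidden + params["b_h"]
        )
        hf_frames.append(params["W_out"] @ hidden + params["b_out"])
    return np.stack(hf_frames)


# Usage with synthetic data: 50 frames of 12-dimensional LF features (assumed sizes).
params = init_params(lf_dim=12, hidden_dim=32, hf_dim=8)
lf = np.random.default_rng(1).normal(size=(50, 12))
hf = map_lf_to_hf(lf, params)
```

Because the hidden state is passed from frame to frame, perturbing one LF frame changes the HF estimates of later frames as well. A frame-independent mapping such as a per-frame GMM regression has no such dependence, which is the limitation the abstract points to.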

Full Text