Abstract

We present a recurrent neural network (RNN)-based speech bandwidth extension (BWE) method. Conventional Gaussian mixture model (GMM)-based BWE methods perform stably and effectively. However, GMM-based methods suffer from two fundamental problems: 1) a GMM is inadequate for modeling the non-linear relationship between the low-frequency (LF) and high-frequency (HF) components, and 2) temporal correlations across speech frames are ignored, which results in loss of spectral detail in the speech reconstructed by BWE. To cope with these problems, an RNN is employed to capture temporal information and to construct deep non-linear relationships between the spectral envelope features of the LF and HF components. The proposed RNN is trained layer-by-layer from a cascade of two recurrent temporal restricted Boltzmann machines (RTRBMs) and a feedforward neural network (NN). The proposed method takes advantage of the strong ability of RTRBMs to discover the temporal correlation between adjacent frames and to model deep non-linear relationships between input and output. Both objective and subjective evaluations indicate that the proposed method outperforms conventional GMM-based methods and other NN-based methods.
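The abstract describes the LF-to-HF mapping network only at a high level. The following is a rough, non-authoritative sketch of what an inference-time forward pass through such a cascade might look like. It makes several assumptions that go beyond the abstract:

- hidden units are sigmoid;
- each RTRBM layer is run with its deterministic mean-field hidden-state recursion, r_1 = σ(W v_1 + b_init) and r_t = σ(W v_t + W' r_{t-1} + b_h);
- a single linear layer stands in for the feedforward NN.

The class names (`RTRBMLayer`, `BWENetwork`), layer sizes, and random weights are hypothetical placeholders. In the described method, the weights would come from the layer-by-layer training, which is not shown here.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    # Element-wise sum of equally sized vectors.
    return [sum(xs) for xs in zip(*vs)]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    # Placeholder weights; a trained model would load pre-trained values instead.
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class RTRBMLayer:
    """Hypothetical RTRBM layer used as a recurrent feature extractor:
    r_1 = sigmoid(W v_1 + b_init), r_t = sigmoid(W v_t + W_rec r_{t-1} + b_h)."""

    def __init__(self, n_vis: int, n_hid: int, rng: random.Random) -> None:
        self.W = rand_matrix(n_hid, n_vis, rng)
        self.W_rec = rand_matrix(n_hid, n_hid, rng)
        self.b_h = [0.0] * n_hid
        self.b_init = [0.0] * n_hid

    def forward(self, frames: List[Vector]) -> List[Vector]:
        states: List[Vector] = []
        prev: Optional[Vector] = None
        for v in frames:
            if prev is None:
                # First frame: no previous hidden state, use the initial bias.
                pre = add(matvec(self.W, v), self.b_init)
            else:
                # Later frames: the previous hidden state carries temporal context.
                pre = add(matvec(self.W, v), matvec(self.W_rec, prev), self.b_h)
            prev = [sigmoid(x) for x in pre]
            states.append(prev)
        return states


class LinearOutput:
    """Assumed linear output layer standing in for the feedforward NN."""

    def __init__(self, n_in: int, n_out: int, rng: random.Random) -> None:
        self.W = rand_matrix(n_out, n_in, rng)
        self.b = [0.0] * n_out

    def forward(self, frames: List[Vector]) -> List[Vector]:
        return [add(matvec(self.W, h), self.b) for h in frames]


class BWENetwork:
    """Cascade: RTRBM layer -> RTRBM layer -> output layer (LF features -> HF features)."""

    def __init__(self, n_lf: int, n_hid1: int, n_hid2: int, n_hf: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layer1 = RTRBMLayer(n_lf, n_hid1, rng)
        self.layer2 = RTRBMLayer(n_hid1, n_hid2, rng)
        self.out = LinearOutput(n_hid2, n_hf, rng)

    def predict(self, lf_frames: List[Vector]) -> List[Vector]:
        h1 = self.layer1.forward(lf_frames)
        h2 = self.layer2.forward(h1)
        return self.out.forward(h2)


if __name__ == "__main__":
    rng = random.Random(1)
    # Made-up input: 5 frames of 10-dim LF spectral envelope features.
    lf = [[rng.random() for _ in range(10)] for _ in range(5)]
    net = BWENetwork(n_lf=10, n_hid1=16, n_hid2=16, n_hf=8)
    hf = net.predict(lf)
    print(len(hf), len(hf[0]))  # 5 8
```

Each hidden state depends on the previous one, so the HF features predicted for a frame depend on the preceding LF frames as well as the current one. A frame-by-frame GMM mapping lacks exactly this dependence.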
