Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

Zhen-Hua Ling, Yu Gu, Li-Rong Dai, Yang Ai

doi:10.1109/taslp.2018.2798811

Abstract

This paper presents a waveform modeling and generation method for speech bandwidth extension (BWE) based on hierarchical recurrent neural networks (HRNN). Unlike conventional BWE methods, which predict spectral parameters for reconstructing wideband speech waveforms, this method models and predicts waveform samples directly, without using vocoders. Inspired by SampleRNN, an unconditional neural audio generator, the HRNN model represents the distribution of each wideband or high-frequency waveform sample conditioned on the input narrowband waveform samples, using a neural network composed of long short-term memory (LSTM) layers and feed-forward (FF) layers. The LSTM layers form a hierarchical structure in which each layer operates at a specific temporal resolution, so that long-span dependencies between temporal sequences are captured efficiently. In addition, auxiliary conditions, such as bottleneck (BN) features derived from narrowband speech by a deep neural network (DNN)-based state classifier, are used as extra input to further improve the quality of the generated wideband speech. Experiments comparing several waveform modeling methods show that the HRNN-based method achieves better speech quality and run-time efficiency than both the dilated convolutional neural network (DCNN)-based method and the plain sample-level recurrent neural network (SRNN)-based method. The proposed method also outperforms the conventional vocoder-based BWE method using LSTM-RNNs in terms of the subjective quality of the reconstructed wideband speech.
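The idea that each LSTM layer operates at its own temporal resolution can be illustrated with a small framing sketch. This is not the authors' implementation: the function names and the frame sizes (16, 4, 1) are hypothetical choices made only for illustration. The sketch shows how the same waveform might be presented to several tiers, with slower tiers taking fewer recurrent steps over longer frames.

```python
from typing import Dict, List, Sequence, Tuple


def frame_signal(samples: Sequence[float], frame_size: int) -> List[List[float]]:
    """Split samples into non-overlapping frames, dropping any incomplete tail."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    n_frames = len(samples) // frame_size
    return [list(samples[i * frame_size:(i + 1) * frame_size])
            for i in range(n_frames)]


def tier_views(samples: Sequence[float],
               frame_sizes: Tuple[int, ...] = (16, 4, 1)) -> Dict[int, List[List[float]]]:
    """Return one framed view of the waveform per tier.

    The frame sizes are assumed values, not taken from the paper. A tier with
    frame size 16 sees one input per 16 samples, so it takes 16 times fewer
    recurrent steps than a sample-level tier (frame size 1).
    """
    return {size: frame_signal(samples, size) for size in frame_sizes}


# Toy "narrowband waveform" of 32 samples.
waveform = [float(n) for n in range(32)]
views = tier_views(waveform)
steps = {size: len(frames) for size, frames in views.items()}
```

Here `steps` records how many recurrent steps each tier would take over the toy input, which is where the efficiency of a hierarchical structure over a plain sample-level recurrent network would come from under these assumptions.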

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: May 1, 2018
Citations: 50

Similar Papers

Conversational evaluation of artificial bandwidth extension of telephone speech using a mobile handset
Hannu Pulakka ... Paavo Alku
01 Mar 2012

Sentiment Analysis in the Light of LSTM Recurrent Neural Networks
Subarno Pal ... Amitava Nag
International Journal of Synthetic Emotions | VOL. 9
01 Jan 2018

ConvLSNet: A lightweight architecture based on ConvLSTM model for the classification of pulmonary conditions using multichannel lung sound recordings
Faezeh Majzoobi ... Sobhan Goudarzi
Artificial Intelligence In Medicine | VOL. 154
22 Jun 2024

Active Noise Reduction with Filtered Least-Mean-Square Algorithm Improved by Long Short-Term Memory Models for Radiation Noise of Diesel Engine
Semin Kwon ... Bo-Seung Kim
Applied Sciences | VOL. 12
12 Oct 2022
