Abstract

Speech super-resolution, also called speech bandwidth expansion, aims to upsample a given speech signal by generating the missing high-frequency content. In this paper, we propose a deep neural network approach that exploits adversarial training ideas that have proven effective in image super-resolution. Specifically, our proposed network follows the generative adversarial network (GAN) setup, where the generator uses a convolutional autoencoder architecture with one-dimensional convolution kernels to generate high-frequency log-power spectra from the low-frequency log-power spectra of the input speech. We train with both a reconstruction loss and an adversarial loss, and we employ a recent regularization method that penalizes the gradient norms of the discriminator to stabilize training. We compare our proposed approach with two state-of-the-art neural network baselines and evaluate all methods with objective speech quality measures and with subjective perceptual and intelligibility tests. Results show that our proposed method outperforms both baselines in both objective and subjective evaluations. To gain insight into the network architecture, we analyze key parameters of the proposed network, including the number of layers, the number of convolution kernels, and the relative weight of the reconstruction and adversarial losses. We also analyze the computational complexity of our method and the baselines, and discuss approaches to phase estimation. We further develop a noise-resilient version of the proposed approach by training the network with noisy speech inputs. Objective evaluation confirms this noise resilience on unseen noise types.
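
As a rough illustration of the training objective described above, the following sketch combines a reconstruction term with a weighted adversarial term, and computes a gradient-norm penalty for the discriminator. The abstract does not give the exact loss forms, so everything specific here is an assumption: mean squared error for reconstruction, the non-saturating `-log D(G(x))` form for the adversarial term, the squared L2 norm for the penalty, and the hypothetical names `generator_loss`, `adv_weight`, and `gradient_norm_penalty`. The discriminator outputs and gradients are passed in as plain arrays rather than computed by a network.

```python
import numpy as np

def reconstruction_loss(pred_hf, true_hf):
    """Mean squared error between predicted and reference
    high-frequency log-power spectra (assumed loss form)."""
    return float(np.mean((pred_hf - true_hf) ** 2))

def adversarial_loss(d_fake_prob, eps=1e-12):
    """Non-saturating generator loss, -log D(G(x)), averaged over the batch
    (assumed form; d_fake_prob holds discriminator outputs in (0, 1])."""
    return float(-np.mean(np.log(d_fake_prob + eps)))

def generator_loss(pred_hf, true_hf, d_fake_prob, adv_weight=0.01):
    """Weighted sum of the two terms; adv_weight is a hypothetical value
    standing in for the relative weight analyzed in the paper."""
    return (reconstruction_loss(pred_hf, true_hf)
            + adv_weight * adversarial_loss(d_fake_prob))

def gradient_norm_penalty(grad_d_wrt_input):
    """Mean over the batch of the squared L2 norm of the discriminator's
    gradient with respect to its input (shape: batch x features)."""
    return float(np.mean(np.sum(grad_d_wrt_input ** 2, axis=1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_hf = rng.normal(size=(4, 128))            # reference high-band frames
    pred_hf = true_hf + 0.1 * rng.normal(size=(4, 128))
    d_fake = np.full(4, 0.4)                       # placeholder D outputs
    print(generator_loss(pred_hf, true_hf, d_fake))
```

In a real training loop the penalty would be added to the discriminator loss with its own weight, and the gradients would come from automatic differentiation rather than being supplied by hand.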
