Abstract

The bandwidth of speech signals is often limited in speech communication due to the specifications of standardized codecs or insufficient bitrates. We therefore present a bandwidth extension (BWE) framework for speech signals coded with the enhanced voice services (EVS) codec. Previous convolutional neural network (CNN) approaches to speech bandwidth extension suffer from redundant channel and spatial feature information. This work proposes an end-to-end architecture that combines novel channel and spatial reconstruction modules. Specifically, the spatial reconstruction module uses a mask to separate high-frequency from low-frequency features; group convolutions are then employed to enhance the high-frequency features and suppress the low-frequency ones. The channel reconstruction module is introduced to reduce unnecessary feature information. Additionally, we introduce a novel time-frequency loss function, combining a time-domain loss with frequency-domain losses based on the Mel spectrum and a multi-resolution representation, to optimize the network. To assess the performance of the model and the loss function, we conducted experiments on sub-datasets encoded at 6.6 kbps, 7.2 kbps, and 8 kbps. The experimental results show that the proposed model surpasses baseline models in terms of LSD, SNR, PESQ, and MOS scores.
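The abstract describes the time-frequency loss only at a high level. The following is a minimal PyTorch-style sketch of one way such a combined time-domain, Mel-spectrum, and multi-resolution frequency-domain loss can be assembled; the class name, loss weights, FFT resolutions, and Mel settings are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
import torchaudio

def stft_mag(x, n_fft, hop, win):
    """Magnitude spectrogram of a (batch, samples) waveform at one resolution."""
    window = torch.hann_window(win, device=x.device)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop, win_length=win,
                      window=window, return_complex=True)
    return spec.abs().clamp(min=1e-7)

def multi_resolution_stft_loss(pred, target,
                               resolutions=((512, 128, 512),
                                            (1024, 256, 1024),
                                            (2048, 512, 2048))):
    """L1 log-magnitude loss averaged over several STFT resolutions (assumed values)."""
    loss = 0.0
    for n_fft, hop, win in resolutions:
        p = stft_mag(pred, n_fft, hop, win)
        t = stft_mag(target, n_fft, hop, win)
        loss = loss + F.l1_loss(torch.log(p), torch.log(t))
    return loss / len(resolutions)

class TimeFrequencyLoss(torch.nn.Module):
    """Hypothetical combination: time-domain L1 + log-Mel L1 + multi-resolution STFT loss."""
    def __init__(self, sample_rate=16000, w_time=1.0, w_mel=1.0, w_mr=1.0):
        super().__init__()
        self.mel = torchaudio.transforms.MelSpectrogram(
            sample_rate=sample_rate, n_fft=1024, hop_length=256, n_mels=80)
        self.w_time, self.w_mel, self.w_mr = w_time, w_mel, w_mr

    def forward(self, pred, target):
        l_time = F.l1_loss(pred, target)
        l_mel = F.l1_loss(torch.log(self.mel(pred).clamp(min=1e-7)),
                          torch.log(self.mel(target).clamp(min=1e-7)))
        l_mr = multi_resolution_stft_loss(pred, target)
        return self.w_time * l_time + self.w_mel * l_mel + self.w_mr * l_mr
```

In practice the three weights would be tuned so that no single term dominates; the paper's actual weighting and resolutions may differ.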
