Abstract
Noise and reverberation can severely degrade speech quality and intelligibility, and many deep neural network-based methods have been proposed for noisy-reverberant speech enhancement. The classic approaches among them are spectral masking and spectral mapping. Each has advantages and disadvantages in different noise environments, and the two are complementary. This paper proposes a dual-branch deep interactive UNet (DBDIUNet) for monaural speech enhancement that combines the advantages of spectral mapping and spectral masking. DBDIUNet uses a classical encoder-decoder architecture with one shared encoder and two decoders. One decoder outputs the complex ideal ratio mask (cIRM), and the other outputs the enhanced complex spectrum. The two outputs are coupled by coherent averaging to obtain the enhanced speech signal. A novel deep interaction structure is proposed to exchange information between the two decoders, which yields a substantial performance improvement with minimal additional computation and hyperparameters. On the Interspeech 2020 Deep Noise Suppression Challenge blind test set, DBDIUNet improves WB-PESQ, NB-PESQ, STOI, and SI-SDR over the unprocessed noisy speech by 1.575, 0.955, 7.9%, and 8.67, respectively. In the noisy-reverberant speech enhancement test, DBDIUNet improves WB-PESQ, STOI, SI-SDR, DNSMOS, and SRMR by 0.98, 10.24%, 5.43, 1.51, and 3.43, respectively, outperforming the state-of-the-art model.
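The coupling of the two decoder outputs can be illustrated with a minimal NumPy sketch. It assumes that "coherent averaging" means an equal-weight mean of the two complex spectra, which the abstract does not state explicitly; the function names, the toy spectra, and the uncompressed form of the cIRM are likewise assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def apply_complex_mask(noisy_spec, mask):
    """Masking branch: multiply the noisy complex spectrum by a complex mask."""
    return mask * noisy_spec


def coherent_average(masked_spec, mapped_spec):
    """Assumed coupling: equal-weight mean of the two complex spectra."""
    return 0.5 * (masked_spec + mapped_spec)


# Toy complex spectra of shape (frequency bins, frames); hypothetical data.
rng = np.random.default_rng(0)
n_freq, n_frames = 4, 3
shape = (n_freq, n_frames)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
noise = 0.1 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noisy = clean + noise

# Oracle (uncompressed) complex ratio mask, standing in for the mask decoder.
ideal_mask = clean / noisy
masked = apply_complex_mask(noisy, ideal_mask)

# Stand-in for the mapping decoder: the clean spectrum plus a small error.
mapped = clean + 0.01 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

enhanced = coherent_average(masked, mapped)
```

Because the average is taken over complex values, both magnitude and phase of the two estimates contribute to the result, and uncorrelated errors in the two branches partially cancel.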