Adversarial Networks-Based Speech Enhancement with Deep Regret Loss

Hilman Pardede,Endang Suryawati,Renni Kusumowardani,Ade Ramdan,Asri R. Yuliani,Vicky Zilvan

doi:10.1109/niss55057.2022.10085296

Abstract

Speech enhancement is often applied for speech-based systems due to the proneness of speech signals to additive background noise. While speech processing-based methods are traditionally used for speech enhancement, with advancements in deep learning technologies, many efforts have been made to implement them for speech enhancement. Using deep learning, the networks learn mapping functions from noisy data to clean ones and then learn to reconstruct the clean speech signals. As a consequence, deep learning methods can reduce what is so-called musical noise that is often found in traditional speech enhancement methods. Currently, one popular deep learning architecture for speech enhancement is generative adversarial networks (GAN). However, the cross-entropy loss that is employed in GAN often causes the training to be unstable. So, in many implementations of GAN, the cross-entropy loss is replaced with the least-square loss. In this paper, to improve the training stability of GAN using cross-entropy loss, we propose to use deep regret analytic generative adversarial networks (Dragan) for speech enhancements. It is based on applying a gradient penalty on cross-entropy loss. We also employ relativistic rules to stabilize the training of GAN. Then, we applied it to the least square and Dragan losses. Our experiments suggest that the proposed method improve the quality of speech better than the least-square loss on several objective quality metrics.

Full Text