Abstract

Generative adversarial networks (GANs) have been increasingly used as feature-mapping functions in speech enhancement, in which noisy speech features are transformed into clean ones through the generators. This article proposes a novel speech enhancement model based on a cycle-consistent relativistic GAN with dilated residual networks and a multi-attention mechanism. Using the adversarial loss, improved cycle-consistency losses, and an identity-mapping loss, a noisy-to-clean generator G and an inverse clean-to-noisy generator F simultaneously learn the forward and backward mappings between the source and target domains. To stabilize the training process, we replace the vanilla GAN loss with a relativistic average GAN loss and apply spectral normalization in the discriminators so that they satisfy Lipschitz continuity. Furthermore, we employ two attention-based components as a multi-attention mechanism to reduce the signal distortion introduced by enhancement: attention U-net gates and dilated residual self-attention blocks. With these components, the proposed generators can capture long-term inner dependencies between elements of the speech features and better preserve linguistic information. Experimental results on a public dataset indicate that the proposed model achieves state-of-the-art speech enhancement performance, especially in reducing speech distortion and improving overall signal quality. Compared with representative GAN-based approaches, the proposed method achieves the best performance in terms of the STOI, CSIG, COVL, and CBAK objective metrics. Moreover, we demonstrate the contribution of each proposed component, including the relativistic average loss, attention U-net gates, self-attention layers, spectral normalization, and the dilation operation, through ten comparison systems.
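For illustration only, the following PyTorch sketch shows one way the two training-stability ingredients mentioned above could look: a relativistic average least-squares adversarial loss and a discriminator whose convolutions are wrapped in spectral normalization to enforce a Lipschitz bound. The layer sizes, patch-level output, 0.5 scaling, and assumed spectrogram input shape are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch (not the authors' code): relativistic average least-squares
# GAN losses and a spectrally normalized discriminator, assuming input
# features shaped (batch, 1, freq, time), e.g., log-magnitude spectrograms.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNDiscriminator(nn.Module):
    """Toy convolutional discriminator; spectral_norm bounds each layer's Lipschitz constant."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            spectral_norm(nn.Conv2d(in_channels, 32, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(32, 64, 4, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
            spectral_norm(nn.Conv2d(64, 1, 3, padding=1)),  # patch-level scores
        )

    def forward(self, x):
        return self.net(x)

def rals_d_loss(d, real, fake):
    """Relativistic average LSGAN loss for the discriminator."""
    c_r, c_f = d(real), d(fake.detach())
    return 0.5 * (((c_r - c_f.mean() - 1.0) ** 2).mean()
                  + ((c_f - c_r.mean() + 1.0) ** 2).mean())

def rals_g_loss(d, real, fake):
    """Relativistic average LSGAN loss for the generator."""
    c_r, c_f = d(real), d(fake)
    return 0.5 * (((c_f - c_r.mean() - 1.0) ** 2).mean()
                  + ((c_r - c_f.mean() + 1.0) ** 2).mean())
```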

Highlights

  • Speech enhancement removes additive noise interference from the noisy speech signal while preserving the intelligibility of the original clean speech

  • By employing the relativistic loss, multi-attention, and dilated residual networks, our method outperforms all the baselines in terms of the CSIG, COVL, and STOI measures, showing that it reduces speech distortion while improving speech intelligibility and overall signal quality (a minimal sketch of a dilated residual self-attention block follows this list)

  • Compared with the deep feature losses model (DFL-SE), the CSIG and COVL values of our enhanced speech increase by 8.8% and 8.1%, respectively, while the CBAK scores remain comparable, showing that our method focuses on reducing speech distortion rather than on suppressing background noise
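As a rough illustration of the components named in the highlights above, the sketch below combines a dilated residual block with a simple dot-product self-attention layer over time frames; the channel width, kernel size, and dilation factor are assumptions for the example, not the paper's architecture.

```python
# Hedged sketch (not the authors' architecture): a dilated residual block and
# a time-axis self-attention layer, assuming 1-D convolutions over feature
# maps of shape (batch, channels, time).
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    def __init__(self, channels=64, dilation=2):
        super().__init__()
        pad = dilation  # keeps the time length unchanged for kernel size 3
        self.conv = nn.Sequential(
            nn.Conv1d(channels, channels, 3, padding=pad, dilation=dilation),
            nn.ReLU(),
            nn.Conv1d(channels, channels, 3, padding=pad, dilation=dilation),
        )

    def forward(self, x):
        return torch.relu(x + self.conv(x))  # residual connection

class TimeSelfAttention(nn.Module):
    """Dot-product self-attention across time frames for long-term dependencies."""
    def __init__(self, channels=64):
        super().__init__()
        self.query = nn.Conv1d(channels, channels // 8, 1)
        self.key = nn.Conv1d(channels, channels // 8, 1)
        self.value = nn.Conv1d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        q = self.query(x).transpose(1, 2)        # (B, T, C//8)
        k = self.key(x)                          # (B, C//8, T)
        attn = torch.softmax(q @ k, dim=-1)      # (B, T, T) attention over frames
        v = self.value(x)                        # (B, C, T)
        out = v @ attn.transpose(1, 2)           # weighted sum over frames
        return x + self.gamma * out
```

Stacking several such blocks with increasing dilation widens the receptive field over time, while the attention layer links frames that are far apart.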


Summary

INTRODUCTION

Speech enhancement removes additive noise interference from the noisy speech signal while preserving the intelligibility of the original clean speech. Conventional methods typically rely on assumptions about the source speech and on the noise being strictly additive, which are uncommon in real environments. To address these limitations, some approaches (e.g., denoising auto-encoders [11]) train a direct mapping network from the noisy input features to the enhanced ones. Inspired by recent studies on CycleGAN-based approaches for speech processing [22]–[24], we propose a relativistic-loss cycle-consistent GAN with multi-attention and a dilated residual network (DRN) for single-channel speech enhancement. This model contains a noisy-to-clean generator G and an inverse clean-to-noisy generator F, which transform the noisy features into enhanced ones and vice versa. We impose two relativistic average adversarial losses, L^RLS_adv(D_X) and L^RLS_adv(F_{Y→X}), for the inverse clean-to-noisy mapping.
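Under the same naming, the fragment below is a minimal sketch of how the forward mapping G, the backward mapping F, the adversarial terms, and the cycle/identity terms could be combined into one generator objective. The plain L1 cycle and identity losses and the weights lambda_cyc and lambda_id are placeholders for the paper's improved cycle-consistency and identity-mapping losses; adv_loss_fn could be, for example, the rals_g_loss function from the earlier sketch.

```python
# Hedged sketch (not the authors' training code) of a combined generator
# objective for a CycleGAN-style enhancer: G maps noisy -> clean and
# F maps clean -> noisy.
import torch.nn.functional as functional

def generator_objective(G, F, D_clean, D_noisy, noisy, clean, adv_loss_fn,
                        lambda_cyc=10.0, lambda_id=5.0):
    fake_clean = G(noisy)   # forward mapping: noisy -> enhanced
    fake_noisy = F(clean)   # backward mapping: clean -> synthetic noisy

    # Adversarial terms for both mappings (e.g., a relativistic average loss).
    adv = adv_loss_fn(D_clean, clean, fake_clean) \
        + adv_loss_fn(D_noisy, noisy, fake_noisy)

    # Cycle-consistency: mapping forward and back should recover the input.
    # (The paper uses improved cycle-consistency losses; plain L1 is a stand-in.)
    cyc = functional.l1_loss(F(fake_clean), noisy) \
        + functional.l1_loss(G(fake_noisy), clean)

    # Identity mapping: a target-domain input should pass through unchanged.
    idt = functional.l1_loss(G(clean), clean) + functional.l1_loss(F(noisy), noisy)

    return adv + lambda_cyc * cyc + lambda_id * idt
```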

IMPROVED CYCLE-CONSISTENCY LOSS
IMPROVED TECHNIQUES FOR OUR MODEL
OBJECTIVE
Findings
CONCLUSION AND FUTURE WORK