Abstract
Generative adversarial networks (GANs) have shown their superiority for speech enhancement. Nevertheless, most previous attempts use convolutional layers as the backbone, which may obscure long-range dependencies across an input sequence because of the convolution operator's local receptive field. One popular solution is to substitute recurrent neural networks (RNNs) for convolutional neural networks, but RNNs are computationally inefficient because their temporal iterations cannot be parallelized. To circumvent this limitation, we propose an end-to-end system for speech enhancement that applies the self-attention mechanism to GANs. We aim for a system that is flexible in modeling both long-range and local interactions while remaining computationally efficient. Our work proceeds in three phases: first, we apply the stand-alone self-attention layer in speech enhancement GANs. Second, we employ locality modeling on the stand-alone self-attention layer. Last, we investigate the functionality of self-attention-augmented convolutional speech enhancement GANs. Systematic experimental results indicate that, equipped with the stand-alone self-attention layer, the system outperforms the baseline systems across classic evaluation criteria with up to 95% fewer parameters. Moreover, locality modeling can serve as a parameter-free approach for further performance improvement, and self-attention augmentation also surpasses all baseline systems with an acceptable increase in parameters.
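As a concrete illustration of the parameter-free locality modeling mentioned above, the following minimal sketch restricts each time step's attention to a fixed local window by applying a band mask to the attention scores before the softmax. The function name local_attention, the window size, and the (batch, time, channels) tensor layout are illustrative assumptions rather than the paper's exact formulation.

import torch
import torch.nn.functional as F

def local_attention(q, k, v, window=16):
    # q, k, v: (batch, time, channels); window: half-width of the local band (assumed value)
    scores = torch.bmm(q, k.transpose(1, 2))          # pairwise similarities, (batch, time, time)
    t = scores.size(1)
    idx = torch.arange(t, device=scores.device)
    # mask out positions farther than `window` steps away; no trainable parameters are added
    mask = (idx[None, :] - idx[:, None]).abs() > window
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)               # each row sums to 1 inside its local band
    return torch.bmm(weights, v)                      # locally aggregated values, (batch, time, channels)

Because the mask is fixed rather than learned, this variant adds no parameters on top of the attention layer it modifies, which matches the "parameter-free" framing above.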
Highlights
Speech enhancement aims to improve speech intelligibility and quality in adverse environments by transforming interfered speech into its original clean version [1]
This paper presents a series of speech enhancement generative adversarial networks (SEGANs) equipped with a self-attention mechanism in three ways: first, we deploy the stand-alone self-attention layer in a SEGAN
A maximum of three layers of the SEGAN are equipped with the self-attention mechanism each time: one convolutional layer of the encoder, one deconvolutional layer of the decoder, and one convolutional layer of the discriminator. While prior work experimented with the performance of SASEGAN-all, i.e., coupling self-attention layers to all convolutional layers, we ask whether there are better-optimized coupling combinations. For example, can coupling the self-attention mechanism to the 10th and 11th convolutional layers outperform SASEGAN-all with even fewer parameters? In addition, inspired by [32, 33], we explore the feasibility of substituting stand-alone self-attention layers for convolutional layers entirely, namely a SEGAN with stand-alone self-attention layers, as sketched below
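The self-attention layer referred to above can be pictured as a SAGAN-style non-local block adapted to 1-D feature maps and attached to the output of a (de)convolutional layer. The sketch below is one plausible minimal implementation under that assumption; the class name SelfAttention1d, the channel-reduction factor, and the learnable gamma gate are illustrative choices, not the authors' exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention1d(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        # 1x1 convolutions project the feature map into query/key/value spaces
        self.query = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.key = nn.Conv1d(channels, channels // reduction, kernel_size=1)
        self.value = nn.Conv1d(channels, channels, kernel_size=1)
        # learnable gate initialized to zero so the block starts as an identity mapping
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):                                  # x: (batch, channels, time)
        q = self.query(x).transpose(1, 2)                  # (batch, time, channels // reduction)
        k = self.key(x)                                    # (batch, channels // reduction, time)
        attn = F.softmax(torch.bmm(q, k), dim=-1)          # attention over all time steps
        v = self.value(x)                                  # (batch, channels, time)
        out = torch.bmm(v, attn.transpose(1, 2))           # aggregate values across the sequence
        return self.gamma * out + x                        # residual connection to the conv output

In a SEGAN-like encoder or decoder, such a block could simply be inserted after a chosen (de)convolutional layer, e.g. features = SelfAttention1d(channels)(conv_out), which is one plausible reading of "coupling" a self-attention layer to a convolutional layer.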
Summary
Speech enhancement aims to improve speech intelligibility and quality in adverse environments by transforming interfered speech into its original clean version [1]. Speech enhancement can serve as a front end for downstream speech-related tasks, e.g., speech recognition [2], speaker identification [3], and speech emotion recognition [4]. It is applied successfully in communication systems, e.g., hearing aids [5] and cochlear implants [6]. Most previous attempts use convolutional layers as the backbone, which limits the network's ability to capture long-range dependencies due to the convolution operator's local receptive field. To remedy this issue, one popular solution is to substitute RNNs for CNNs, but RNNs are computationally inefficient because their temporal iterations cannot be parallelized