A Multi-Resolution Approach to GAN-Based Speech Enhancement

Hyung Yong Kim, Nam Soo Kim, Sung Jun Cheon, Ji Won Yoon, Woo Hyun Kang

doi:10.3390/app11020721

Abstract

Recently, generative adversarial networks (GANs) have been successfully applied to speech enhancement. However, there still remain two issues that need to be addressed: (1) GAN-based training is typically unstable due to its non-convex property, and (2) most of the conventional methods do not fully take advantage of the speech characteristics, which could result in a sub-optimal solution. In order to deal with these problems, we propose a progressive generator that can handle the speech in a multi-resolution fashion. Additionally, we propose a multi-scale discriminator that discriminates the real and generated speech at various sampling rates to stabilize GAN training. The proposed structure was compared with the conventional GAN-based speech enhancement algorithms using the VoiceBank-DEMAND dataset. Experimental results showed that the proposed approach can make the training faster and more stable, which improves the performance on various metrics for speech enhancement.
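The idea of a generator that handles speech in a multi-resolution fashion can be illustrated with a little sampling-rate arithmetic. The sketch below is not the authors' implementation: the function name `progressive_schedule`, the starting rate of 4 kHz, the target rate of 16 kHz, and the up-sampling factor of 2 are all assumptions chosen for illustration. It only shows how each up-sampling stage widens the representable frequency band, since a signal sampled at rate r can carry content up to the Nyquist frequency r/2.

```python
from typing import List, Tuple


def progressive_schedule(
    start_rate_hz: int, target_rate_hz: int, up_factor: int = 2
) -> List[Tuple[int, float]]:
    """Return (sampling_rate_hz, nyquist_hz) for each stage of a
    hypothetical progressive generator.

    All rates and the up-sampling factor are illustrative assumptions,
    not values taken from the paper.
    """
    if start_rate_hz <= 0 or up_factor < 2:
        raise ValueError("start_rate_hz must be > 0 and up_factor >= 2")

    stages = []
    rate = start_rate_hz
    while rate < target_rate_hz:
        # Band available at this stage is limited by the Nyquist frequency.
        stages.append((rate, rate / 2))
        rate *= up_factor

    if rate != target_rate_hz:
        raise ValueError("target rate is not reachable with this up_factor")

    stages.append((rate, rate / 2))
    return stages


if __name__ == "__main__":
    for rate, nyquist in progressive_schedule(4000, 16000):
        print(f"stage at {rate} Hz covers 0 to {nyquist:.0f} Hz")
```

Under these assumed settings the schedule has three stages (4 kHz, 8 kHz, 16 kHz), and the covered band doubles at each stage.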

Highlights

Speech enhancement is essential for various speech applications such as robust speech recognition, hearing aids, and mobile communications [1,2,3,4]
Motivated by progressive generative adversarial networks (GANs), which start by generating low-resolution images and progressively increase the resolution [30,31], we propose a novel generator that can incrementally widen the frequency band of the speech by applying an up-sampling block to the decoder Gdec
SERGAN and the proposed method were evaluated in terms of the real-time factor (RTF), defined as the ratio of the time taken to enhance the speech to the duration of the speech, to verify real-time feasibility
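The RTF definition in the last highlight can be written out as a short sketch. The helper names `real_time_factor` and `measure_rtf`, the 16 kHz sampling rate, and the placeholder `enhance` function are assumptions for illustration only; they stand in for whatever enhancement model is being timed and are not part of the paper.

```python
import time
from typing import Callable, List, Sequence


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time taken to enhance the speech / duration of the speech."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(
    enhance: Callable[[Sequence[float]], Sequence[float]],
    samples: Sequence[float],
    sample_rate_hz: int,
) -> float:
    """Time one call to `enhance` and convert it to an RTF value."""
    start = time.perf_counter()
    enhance(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate_hz)


if __name__ == "__main__":
    # Hypothetical stand-in for an enhancement model: returns a copy.
    def enhance(x: Sequence[float]) -> List[float]:
        return list(x)

    one_second = [0.0] * 16000  # assumes 16 kHz audio
    print(f"RTF: {measure_rtf(enhance, one_second, 16000):.6f}")
```

An RTF below 1 means the enhancement runs faster than the audio plays back, which is the usual condition for real-time use.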

Summary

Introduction

Speech enhancement is essential for various speech applications such as robust speech recognition, hearing aids, and mobile communications [1,2,3,4]. A number of efforts have been devoted to stabilizing GAN training in image processing, by modifying the loss function [28] or the generator and discriminator structures [30,31]. We propose novel generator and discriminator structures for GAN-based speech enhancement which reflect the speech characteristics while ensuring stable training. The conventional generator is trained to find a mapping function from the noisy speech to the clean speech by using sequential convolution layers, which is considered an ineffective approach, especially for speech data. Empirical results showed that the proposed generator and discriminator were successful in stabilizing GAN training and outperformed the conventional GAN-based speech enhancement techniques. The experimental results showed that the multi-scale structure is an effective solution for both deterministic and GAN-based models, outperforming the conventional GAN-based speech enhancement techniques.
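The multi-scale structure mentioned above relies on looking at the same utterance at several sampling rates. The sketch below shows one way such views could be prepared; it is a simplification, not the authors' method. The function names, the 16 kHz base rate, the down-sampling factors (1, 2, 4), and the use of plain block averaging in place of a proper anti-aliasing filter are all assumptions made to keep the example short and dependency-free.

```python
from typing import Dict, List, Sequence


def downsample_by_averaging(signal: Sequence[float], factor: int) -> List[float]:
    """Crude decimation: average each non-overlapping block of `factor` samples.

    Block averaging is an assumed stand-in for a real anti-aliasing
    filter followed by decimation. Trailing samples that do not fill
    a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(signal) - len(signal) % factor
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]


def multi_scale_views(
    signal: Sequence[float],
    base_rate_hz: int,
    factors: Sequence[int] = (1, 2, 4),
) -> Dict[int, List[float]]:
    """Map each sampling rate to the waveform seen at that rate.

    In a multi-scale setup, each view would be scored by a
    discriminator operating at that rate (not modeled here).
    """
    return {
        base_rate_hz // f: downsample_by_averaging(signal, f) for f in factors
    }


if __name__ == "__main__":
    wave = [float(i % 8) for i in range(16000)]  # one second at an assumed 16 kHz
    for rate, samples in multi_scale_views(wave, 16000).items():
        print(f"{rate} Hz view: {len(samples)} samples")
```

Real and generated speech would both be passed through the same set of views, so each scale compares like with like.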

GAN-Based Speech Enhancement

Multi-Resolution Approach for Speech Enhancement

Progressive Generator

Multi-Scale Discriminator

Dataset

Network Structure

Objective Evaluation

Subjective Evaluation

Experiments and Results

Objective Results

Subjective Results

Real-Time Feasibility

Analysis and Comparison of Spectrograms

Fast and Stable Training of Proposed Model

Comparison with Conventional GAN-Based Speech Enhancement Techniques

Conclusions


Journal: Applied Sciences
Publication Date: Jan 13, 2021
Citations: 17
License type: CC BY 4.0


