Adversarial training for data-driven speech enhancement without parallel corpus

Takuya Higuchi, Tomohiro Nakatani, Keisuke Kinoshita, Marc Delcroix

doi:10.1109/asru.2017.8268914

Abstract

This paper describes a way of performing data-driven speech enhancement for noise-robust automatic speech recognition (ASR), where we train a model for speech enhancement without a parallel corpus. Data-driven speech enhancement with deep models has recently been investigated and proven to be a promising approach for ASR. However, model training requires a parallel corpus consisting of noisy speech signals and the corresponding clean speech signals for supervision. Therefore, a deep model can be trained only with a simulated dataset, and we cannot take advantage of the large number of noisy recordings that have no corresponding clean speech signals. As a first step towards model training without supervision, this paper proposes a novel approach that introduces adversarial training for a time-frequency mask estimator. Our cost function for model training is defined by discriminators instead of by the distance between the model outputs and the supervision. The discriminators distinguish between true signals and enhanced signals obtained with the time-frequency masks estimated by a mask estimator. The mask estimator is trained to fool the discriminators, which enables it to estimate appropriate time-frequency masks without a parallel corpus. The enhanced signal is finally obtained with masking-based beamforming. Experimental results show that, even without exploiting parallel data, our speech enhancement approach improves ASR performance compared with unprocessed signals and achieves ASR performance comparable to that of a model trained on a parallel corpus with a minimum mean squared error (MMSE) criterion.
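The abstract describes the training signal only at a high level. As a hedged illustration of the central idea, that the mask estimator's cost comes from a discriminator rather than from a distance to clean supervision, the NumPy sketch below trains a toy one-layer mask estimator against a logistic-regression discriminator on unpaired log-magnitude frames. Everything here is an assumption made for illustration: the function names (`estimate_mask`, `enhance`, `train`, and so on), the linear models, the use of a single discriminator (the paper uses discriminators, plural), the non-saturating GAN loss, and the toy data. It is not the authors' architecture, and it omits the masking-based beamforming step.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def log_sigmoid(a):
    # Numerically stable log(sigmoid(a)) = -log(1 + exp(-a)).
    return -np.logaddexp(0.0, -a)

def estimate_mask(log_noisy, W, b):
    """Hypothetical mask estimator: one linear layer + sigmoid per frame.
    log_noisy: (N, F) log-magnitude frames. Returns (N, F) masks in (0, 1)."""
    return sigmoid(log_noisy @ W + b)

def enhance(log_noisy, W, b):
    # Log-magnitude of the masked signal: log(m * |Y|) = log(m) + log|Y|.
    return log_sigmoid(log_noisy @ W + b) + log_noisy

def discriminator_loss_and_grads(log_clean, log_enh, v, c):
    """Hypothetical discriminator D(z) = sigmoid(z @ v + c).
    Loss: -mean log D(clean) - mean log(1 - D(enhanced)).
    The two batches are unpaired and may have different lengths."""
    s_r = log_clean @ v + c
    s_f = log_enh @ v + c
    # log(1 - sigmoid(s)) == log_sigmoid(-s)
    loss = np.mean(-log_sigmoid(s_r)) + np.mean(-log_sigmoid(-s_f))
    d_r, d_f = sigmoid(s_r), sigmoid(s_f)
    grad_v = -log_clean.T @ (1.0 - d_r) / len(s_r) + log_enh.T @ d_f / len(s_f)
    grad_c = -np.mean(1.0 - d_r) + np.mean(d_f)
    return loss, grad_v, grad_c

def generator_loss_and_grads(log_noisy, W, b, v, c):
    """Mask-estimator loss defined only by the discriminator (no clean target):
    -mean log D(log(m * |Y|)), i.e. the non-saturating GAN generator loss."""
    a = log_noisy @ W + b
    m = sigmoid(a)
    s = (log_sigmoid(a) + log_noisy) @ v + c
    loss = np.mean(-log_sigmoid(s))
    # Chain rule: dL/dz = -(1 - D) v / N and dz/da = 1 - m (elementwise).
    grad_a = -(1.0 - sigmoid(s))[:, None] * v[None, :] * (1.0 - m) / len(s)
    return loss, log_noisy.T @ grad_a, grad_a.sum(axis=0)

def train(log_noisy, log_clean, steps=100, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    F = log_noisy.shape[1]
    W, b = 0.01 * rng.standard_normal((F, F)), np.zeros(F)
    v, c = 0.01 * rng.standard_normal(F), 0.0
    for _ in range(steps):
        # 1) Update the discriminator on unpaired clean vs. enhanced frames.
        d_loss, g_v, g_c = discriminator_loss_and_grads(
            log_clean, enhance(log_noisy, W, b), v, c)
        v, c = v - lr * g_v, c - lr * g_c
        # 2) Update the mask estimator to fool the updated discriminator.
        g_loss, g_W, g_b = generator_loss_and_grads(log_noisy, W, b, v, c)
        W, b = W - lr * g_W, b - lr * g_b
    return W, b, v, c, d_loss, g_loss

# Toy, unpaired data (assumption): noisy frames are louder than clean frames.
rng = np.random.default_rng(1)
log_noisy = rng.normal(1.0, 1.0, size=(64, 8))
log_clean = rng.normal(0.0, 1.0, size=(80, 8))
W, b, v, c, d_loss, g_loss = train(log_noisy, log_clean)
masks = estimate_mask(log_noisy, W, b)
```

Neither loss ever compares an enhanced frame with its own clean counterpart: `log_clean` and `log_noisy` have different lengths and no pairing, which is the property the abstract emphasizes. A realistic setup would presumably replace the linear models with deep networks trained by automatic differentiation, and would feed the estimated masks into a beamformer.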
