A Conditional Generative Model for Speech Enhancement

Zeng-Xi Li, Yan Song, Ian Mcloughlin, Li-Rong Dai

doi:10.1007/s00034-018-0798-4

Abstract

Deep learning-based speech enhancement approaches such as deep neural networks (DNN) and Long Short-Term Memory (LSTM) have already demonstrated results superior to those of classical methods. However, these methods do not take full advantage of temporal context information. While DNN and LSTM consider temporal context in the noisy source speech, they do not do so for the estimated clean speech. Both DNN and LSTM also tend to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture to address both issues, which we term a conditional generative model (CGM). By applying an adversarial training scheme to a generator built from deep dilated convolutional layers, CGM is designed to model the joint and symmetric conditions of both noisy and estimated clean spectra. We evaluate CGM against both DNN and LSTM in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise in a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism lead to better PESQ and STOI in all tested noise conditions. In addition to yielding significant improvements in PESQ and STOI, CGM and adversarial training both mitigate over-smoothing.
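
The abstract attributes CGM's use of temporal context to a generator of deep dilated convolutional layers. As a rough illustration of why dilation helps, the sketch below computes how many spectral frames one output frame can see through a stack of stride-1, one-dimensional dilated convolutions. The function name `receptive_field`, the kernel size of 3, and the dilation rates are all assumptions chosen for illustration; the abstract does not state the actual CGM configuration.

```python
def receptive_field(kernel_size, dilations):
    """Frames of input context seen by one output frame of a stack of
    stride-1, 1-D dilated convolutions (one layer per dilation rate)."""
    field = 1
    for d in dilations:
        # Each layer widens the context by (kernel_size - 1) * dilation frames.
        field += (kernel_size - 1) * d
    return field


# Hypothetical settings, not taken from the paper.
dilated = receptive_field(3, [1, 2, 4, 8])    # exponentially growing dilation
undilated = receptive_field(3, [1, 1, 1, 1])  # same depth, no dilation

print(dilated)    # 31
print(undilated)  # 9
```

With the same four layers and the same number of weights, the dilated stack in this example covers 31 frames instead of 9, which is the general reason dilated convolutions are a common choice for capturing long temporal context.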

Journal: Circuits, Systems, and Signal Processing
Publication Date: Mar 13, 2018
Citations: 10

Similar Papers

Multi-Task Learning U-Net for Single-Channel Speech Enhancement and Mask-Based Voice Activity Detection
Geon Woo Lee ... Hong Kook Kim
Applied Sciences | VOL. 10
06 May 2020

Deep causal speech enhancement and recognition using efficient long-short term memory Recurrent Neural Network.
Zhenqing Li ... Amil Daraz
PloS one | VOL. 19
03 Jan 2024

A Hybrid Approach for Deep Noise Suppression Using Deep Neural Networks
Mohit Bansal ... Arnold Sachith A Hans
01 Jan 2021

New research on monaural speech segregation based on quality assessment
Xiaoping Xie ... Fei Ding
Computer Speech & Language | VOL. 85
05 Dec 2023
