A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement

Yan-Hui Tu, Jun Du, Tian Gao, Chin-Hui Lee

doi:10.1109/taslp.2020.2996503

Abstract

We propose a multi-target, signal-to-noise-ratio (SNR)-progressive learning (SNR-PL) framework for regression-based speech enhancement (SE). At low SNR levels, the complicated regression required in SE is often difficult to learn directly. We therefore decompose the original SE problem of mapping noisy to clean speech features, which spans a large SNR gap, into a series of sub-problems, each with a small SNR increment and presumably easier to learn. In our configurations, each hidden layer of the proposed regression neural network is guided to explicitly learn an intermediate target with a specified but small SNR gain. Tested on both deep neural network (DNN) and long short-term memory (LSTM) architectures, SNR-PL consistently outperforms the conventional “black box” DNN framework in terms of both objective measures and model compactness. With the best configured LSTM-based SNR-PL model, however, we often observe that performance saturates or even degrades as the number of intermediate targets increases, because useful information is lost in dimension reduction when more target layers are involved. To address this information loss, we explore densely connected networks on top of the LSTM structure, in which the input and the preceding intermediate targets are concatenated to learn the next target. Finally, to fully utilize the rich and complementary information in the intermediate targets, a simple post-processing strategy is adopted to further improve performance. On simulated speech data, experimental results in unseen-noise cases demonstrate that the proposed approach consistently outperforms the conventional LSTM approach on objective measures of speech intelligibility and quality. Furthermore, when evaluated on real data provided by the CHiME-4 Challenge for automatic speech recognition (ASR) of noisy microphone array speech, the proposed approach with intermediate outputs directly improves ASR performance, whereas the conventional LSTM approach increases the word error rate.
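The idea of splitting one large SNR gap into several small steps can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes raw waveform samples rather than the speech features used in the paper, and the function names, the input SNR of -5 dB, the 10 dB step, and the equal loss weights are all hypothetical choices made for illustration. Each intermediate target is the same clean signal mixed with the same noise at a progressively higher SNR, and the training loss is taken here to be a weighted sum of per-target mean squared errors.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    gain = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """Measure the SNR of `noisy` against the known clean reference."""
    residual = [y - c for y, c in zip(noisy, clean)]
    return 10 * math.log10(power(clean) / power(residual))


def progressive_targets(clean, noise, input_snr_db, gain_db, num_intermediate):
    """Build the noisy input and a target list ending in the clean signal.

    Intermediate target k sits (k + 1) * gain_db above the input SNR.
    """
    noisy_input = mix_at_snr(clean, noise, input_snr_db)
    intermediates = [
        mix_at_snr(clean, noise, input_snr_db + gain_db * (k + 1))
        for k in range(num_intermediate)
    ]
    return noisy_input, intermediates + [clean]


def multi_target_loss(outputs, targets, weights):
    """Weighted sum of per-target MSEs (the weighting scheme is an assumption)."""
    total = 0.0
    for out, tgt, w in zip(outputs, targets, weights):
        mse = sum((o - t) ** 2 for o, t in zip(out, tgt)) / len(tgt)
        total += w * mse
    return total


# Hypothetical data: a sine wave as "speech" and Gaussian noise.
random.seed(0)
clean = [math.sin(2 * math.pi * 0.01 * t) for t in range(1000)]
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Input at -5 dB, intermediate targets at +5 dB and +15 dB, then clean speech.
noisy, targets = progressive_targets(
    clean, noise, input_snr_db=-5.0, gain_db=10.0, num_intermediate=2
)
```

In a network trained this way, the output of each designated layer would be compared against the matching entry of `targets`, so no single layer has to bridge the whole gap from the noisy input to clean speech.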

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2020
Citations: 64

Similar Papers

Densely Connected Progressive Learning for LSTM-Based Speech Enhancement
Tian Gao ... Jun Du
01 Apr 2018

Speech Enhancement System for Automatic Speech Recognition in Automotive Environment
Gokul G Nair ... C Santhosh Kumar
06 Jul 2021

Mariana
Yongqiang Zou ... Bin Xiao
Proceedings of the VLDB Endowment | VOL. 7
01 Aug 2014

Bottleneck and Embedding Representation of Speech for DNN-based Language and Speaker Recognition
Alicia Lozano-Diez ... Joaquin Gonzalez-Rodriguez
21 Nov 2018
