A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech

Yan-Hui Tu, Jun Du, Chin-Hui Lee

doi:10.1007/s11265-017-1295-x

Abstract

We propose a novel speaker-dependent (SD) multi-condition (MC) training approach to jointly learning deep neural network (DNN) based acoustic models and an explicit speech separation structure for recognition of multi-talker mixed speech in a single-channel setting. First, an MC acoustic modeling framework is established to train an SD-DNN model in multi-talker scenarios. Such a recognizer significantly reduces the decoding complexity and improves the recognition accuracy over recognizers that use speaker-independent DNN models with a complicated joint decoding structure, assuming the speaker identities in mixed speech are known. In addition, an SD regression DNN that maps the acoustic features of mixed speech to the speech features of a target speaker is jointly trained with the SD-DNN based acoustic models. Experimental results on the Speech Separation Challenge (SSC) small-vocabulary recognition task show that the proposed approach under multi-condition training achieves an average word error rate (WER) of 3.8%, a relative WER reduction of 65.1% over the top-performing, DNN-based, pre-processing-only approach that we proposed earlier under clean-condition training (Tu et al. 2016). Furthermore, the proposed joint training DNN framework yields a relative WER reduction of 13.2% over state-of-the-art systems under multi-condition training. Finally, the effectiveness of the proposed approach is also verified on the Wall Street Journal (WSJ0) task with medium-vocabulary continuous speech recognition in a simulated multi-talker setting.
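The relative WER reductions quoted in the abstract follow the usual definition, (baseline WER - new WER) / baseline WER. The sketch below is illustrative only: the function names are hypothetical, and the baseline it derives is back-computed from the rounded figures in the abstract (3.8% and 65.1%), so it is an approximation rather than a number reported by the authors.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction, in percent, of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


def implied_baseline_wer(new_wer: float, relative_reduction_pct: float) -> float:
    """Return the baseline WER implied by a new WER and a relative reduction in percent."""
    if not 0.0 <= relative_reduction_pct < 100.0:
        raise ValueError("relative_reduction_pct must be in [0, 100)")
    return new_wer / (1.0 - relative_reduction_pct / 100.0)


# Assumption: the 65.1% reduction is measured against the 3.8% average WER,
# as the abstract's wording suggests. Both inputs are rounded values.
baseline = implied_baseline_wer(3.8, 65.1)
print(f"Implied clean-condition baseline WER: about {baseline:.1f}%")
print(f"Round-trip reduction: {relative_wer_reduction(baseline, 3.8):.1f}%")
```

Under these assumptions the earlier clean-condition system would sit at roughly 10.9% average WER; the exact value is given in the paper itself, not here.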

Journal: Journal of Signal Processing Systems
Publication Date: Oct 4, 2017
Citations: 6

Similar Papers

A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition
Yan-Hui Tu ... Li-Rong Dai
01 Oct 2016

A unified approach to transfer learning of deep neural networks with applications to speaker adaptation in automatic speech recognition
Zhen Huang ... Chin-Hui Lee
Neurocomputing | VOL. 218
14 Sep 2016

Automatic acoustic siren detection in traffic noise by part-based models
Jens Schroder ... Stefan Goetze
01 May 2013

Improving Robustness of Deep Neural Network Acoustic Models via Speech Separation and Joint Adaptive Training.
Arun Narayanan ... Deliang Wang
IEEE/ACM transactions on audio, speech, and language processing | VOL. 23
01 Jan 2014
