Abstract

We present an information fusion approach to the robust recognition of multi-microphone speech. It is based on a deep learning framework with a large deep neural network (DNN) consisting of subnets designed from different perspectives. Multiple knowledge sources are integrated via an early fusion: normalized noisy features produced with multiple beamforming techniques, enhanced speech features, speaker-related features, and other auxiliary features are concatenated as the input to each subnet to compensate for imperfect front-end processing. Furthermore, a late fusion strategy is utilized to leverage the complementary natures of the different subnets by combining the outputs of all subnets to produce a single output set. Testing on the CHiME-3 task of recognizing microphone array speech, we demonstrate in our empirical study that the different information sources complement each other and that both early and late fusion provide significant performance gains, with an overall word error rate (WER) of 10.55% when combining 12 systems. Furthermore, by utilizing an improved beamforming technique and a powerful recurrent neural network (RNN)-based language model for rescoring, a WER of 9.08% can be achieved by the best single DNN system with one-pass decoding among all of the systems submitted to the CHiME-3 challenge.
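The two fusion stages described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the feature streams, their dimensions, and the number of subnets are all hypothetical placeholders, and random arrays stand in for real acoustic features and subnet outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame feature streams (dimensions are illustrative,
# not taken from the paper): beamformed noisy features, enhanced
# speech features, and speaker-related auxiliary features.
beamformed = rng.standard_normal((100, 40))  # 100 frames x 40 dims
enhanced = rng.standard_normal((100, 40))
speaker = rng.standard_normal((100, 10))     # e.g. speaker-related features

# Early fusion: concatenate all streams frame by frame to form the
# input presented to each subnet.
early_fused = np.concatenate([beamformed, enhanced, speaker], axis=1)
print(early_fused.shape)  # (100, 90)

def softmax(x):
    """Row-wise softmax, numerically stabilized."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Late fusion: combine the output posteriors of several subnets
# (random logits stand in for trained subnets) into one output set
# by simple averaging.
n_subnets, n_classes = 3, 500
subnet_posteriors = [
    softmax(rng.standard_normal((100, n_classes))) for _ in range(n_subnets)
]
late_fused = np.mean(subnet_posteriors, axis=0)

# The averaged posteriors still sum to one per frame.
print(np.allclose(late_fused.sum(axis=1), 1.0))  # True
```

Averaging posteriors is only one possible combination rule; weighted combination or log-domain averaging are common alternatives when subnet reliabilities differ.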

