Learning Dynamic Stream Weights For Coupled-HMM-based Audio-visual Speech Recognition

Ahmed Hussen Abdelaziz, Steffen Zeiler, Dorothea Kolossa

doi:10.1109/taslp.2015.2409785

Abstract

With the increasing use of multimedia data in communication technologies, the idea of employing visual information in automatic speech recognition (ASR) has recently gathered momentum. In conjunction with the acoustical information, the visual data enhances the recognition performance and improves the robustness of ASR systems in noisy and reverberant environments. In audio-visual systems, dynamic weighting of audio and video streams according to their instantaneous confidence is essential for reliably and systematically achieving high performance. In this paper, we present a complete framework that allows blind estimation of dynamic stream weights for audio-visual speech recognition based on coupled hidden Markov models (CHMMs). As a stream weight estimator, we consider using multilayer perceptrons and logistic functions to map multidimensional reliability measure features to audio-visual stream weights. Training the parameters of the stream weight estimator requires numerous input-output tuples of reliability measure features and their corresponding stream weights. We estimate these stream weights based on oracle knowledge using an expectation-maximization algorithm. We define 31-dimensional feature vectors that combine model-based and signal-based reliability measures as inputs to the stream weight estimator. During decoding, the trained stream weight estimator is used to blindly estimate stream weights. The entire framework is evaluated using the Grid audio-visual corpus and compared to state-of-the-art stream weight estimation strategies. The proposed framework significantly enhances the performance of the audio-visual ASR system in all examined test conditions.
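The abstract does not state the fusion rule or the estimator's parameters, so the following is only a minimal sketch under stated assumptions. It assumes the convention common in multistream ASR, where a frame-level weight lambda in (0, 1) scales the audio log-likelihood and 1 - lambda scales the video log-likelihood, and it uses a logistic function of a linear combination of reliability features as the estimator. The function names, the 3-dimensional feature vector, and all coefficient values are hypothetical; the paper itself uses 31-dimensional features and trained parameters.

```python
import math


def logistic_stream_weight(features, coeffs, bias):
    """Map reliability features to an audio stream weight in (0, 1).

    Assumed form: lambda = sigmoid(bias + sum_i coeffs[i] * features[i]).
    """
    z = bias + sum(w * x for w, x in zip(coeffs, features))
    # Numerically stable sigmoid: avoid overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def combine_log_likelihoods(log_b_audio, log_b_video, lam):
    """Weighted log-likelihood fusion (assumed convention):
    lam * log b_audio + (1 - lam) * log b_video.
    """
    return lam * log_b_audio + (1.0 - lam) * log_b_video


# Hypothetical parameters and features, for illustration only.
coeffs = [1.5, -0.7, 0.3]
bias = 0.0
neutral_features = [0.0, 0.0, 0.0]   # gives z = 0, so lambda = 0.5
clean_audio_features = [4.0, 0.0, 0.0]  # gives z = 6, so lambda is close to 1

lam_neutral = logistic_stream_weight(neutral_features, coeffs, bias)
lam_clean = logistic_stream_weight(clean_audio_features, coeffs, bias)

# With lambda = 0.5 both streams contribute equally to the fused score.
fused = combine_log_likelihoods(-4.0, -8.0, lam_neutral)
```

In this sketch, features that indicate reliable audio push the weight towards 1, so the audio stream dominates the fused score; features that indicate unreliable audio push it towards 0, shifting trust to the video stream.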

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publication Date: Jan 1, 2015
Citations: 51

Similar Papers

Stream weight estimation for multistream audio–visual speech recognition in a multispeaker environment
Xu Shao ... Jon Barker
Speech Communication | VOL. 50
19 Nov 2007

A new EM estimation of dynamic stream weights for coupled-HMM-based audio-visual ASR
Ahmed Hussen Abdelaziz ... Dorothea Kolossa
01 May 2014

Unsupervised Stream-Weights Computation in Classification and Recognition Tasks
Eduardo Sanchez-Soto ... Khalid Daoudi
IEEE Transactions on Audio, Speech, and Language Processing | VOL. 17
01 Mar 2009

Audio-visual speech recognition using minimum classification error training
C Miyajima ... K Tokuda
11 Dec 2000
