Abstract
Among the various methods proposed to improve the accuracy and robustness of automatic speech recognition (ASR), the use of additional knowledge sources has proved successful. In particular, a recent approach supplements the acoustic information with visual data, mostly derived from the speaker's lip shape. Perceptual studies support this approach by emphasising the importance of visual information for speech recognition in humans. This paper describes a method we have developed for the adaptive integration of acoustic and visual information in ASR. Each modality contributes to the recognition process with its own weight, which is dynamically adapted during recognition, mainly according to the signal-to-noise ratio provided as a contextual input. We tested this method on continuous hidden Markov model-based systems developed according to direct identification (DI), separate identification (SI) and hybrid identification (DI + SI) strategies. Experiments performed under various noise-level conditions show that the DI + SI-based system is the most promising one when compared to both the DI- and SI-based systems for a speaker-dependent continuous French letter-spelling recognition task. They also confirm that using adaptive modality weights instead of fixed weights improves performance, and that weight estimation could benefit from using visemes as decision units for the visual recogniser in the SI- and DI + SI-based systems.
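As a rough illustration of the adaptive weighting idea summarised above, the sketch below shows a typical separate-identification-style (late) fusion in which per-modality HMM log-likelihoods are combined with an SNR-dependent weight. It is not the paper's exact formulation: the linear mapping from SNR to weight, the clipping range, and all function names are illustrative assumptions.

```python
import math


def visual_weight(snr_db, snr_clean=30.0, snr_noisy=0.0):
    """Map the current signal-to-noise ratio (dB) to a visual-modality weight.

    Illustrative assumption: the weight moves linearly from 0 (clean audio,
    trust the acoustic stream) to 1 (very noisy audio, trust the visual
    stream), clipped to the range [0, 1].
    """
    w = (snr_clean - snr_db) / (snr_clean - snr_noisy)
    return min(1.0, max(0.0, w))


def fused_log_likelihood(log_p_acoustic, log_p_visual, snr_db):
    """Weighted combination of acoustic and visual log-likelihoods
    for one candidate unit (e.g. a letter hypothesis)."""
    gamma = visual_weight(snr_db)
    return (1.0 - gamma) * log_p_acoustic + gamma * log_p_visual


# Example: with moderately noisy audio (10 dB SNR) the visual stream
# contributes about two thirds of the combined score.
print(fused_log_likelihood(math.log(0.2), math.log(0.4), snr_db=10.0))
```

In this reading, the DI strategy would instead fuse the acoustic and visual feature vectors before a single recogniser, while DI + SI combines both levels; the snippet only illustrates the decision-level weighting that the adaptive SNR-driven weights act on.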