Abstract

It is well known that frontal video of the speaker’s mouth region contains significant speech information that, when combined with the acoustic signal, can improve the accuracy and noise robustness of automatic speech recognition (ASR) systems. However, extracting such visual speech information from full-face videos is computationally expensive, as it requires tracking faces and facial features. In addition, robust face detection remains challenging in practical human–computer interaction (HCI), where the subject’s posture and environment (lighting, background) are difficult to control and, consequently, to compensate for. In this paper, in order to bypass these hindrances to practical bimodal ASR, we consider the use of a specially designed, wearable audio-visual headset, a feasible solution in certain HCI scenarios. Such a headset can consistently focus on the speaker’s mouth region, thus eliminating the need for face tracking altogether. In addition, it employs infrared illumination to provide robustness against severe lighting variations. We study the appropriateness of this novel device for audio-visual ASR by conducting both small- and large-vocabulary recognition experiments on data recorded with it under various lighting conditions. We benchmark the resulting ASR performance against bimodal data containing frontal, full-face videos collected in an ideal, studio-like environment under uniform lighting. The experiments demonstrate that the infrared headset video contains speech information comparable to that of the studio, full-face video data, and is therefore a viable sensory device for audio-visual ASR.
