Abstract

Audiovisual voice activity detection (VAD) is a necessary stage in several applications, such as advanced teleconferencing, speech recognition, and human-computer interaction. Lip motion and audio analysis provide a large amount of complementary information that can be integrated to produce more robust audiovisual VAD schemes, as we discuss in this paper. Lip motion is very useful for detecting the active speaker, and we introduce a new approach for lip detection and visual VAD. First, the algorithm performs skin segmentation to reduce the search area for lip extraction, and the most likely lip and non-lip regions are detected within the delimited area using a Bayesian approach. Lip motion is then detected using Hidden Markov Models (HMMs) that estimate the likelihood of active speech within a temporal window. Audio information is captured by a microphone array, and the sound-based VAD amounts to finding spatio-temporally coherent sound sources through another set of HMMs. To increase the robustness of the proposed system, a late fusion approach combines the results of the two modalities (audio and video). Our experimental results indicate that the proposed audiovisual approach outperforms existing VAD algorithms.
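To make the late fusion step concrete, the sketch below shows one common way to combine per-modality scores: a weighted sum of the speech/non-speech log-likelihood ratios produced by the audio and video HMMs, thresholded to yield a per-window decision. The abstract does not specify the fusion rule, weights, or threshold, so the function name, the weights w_audio and w_video, and the zero threshold here are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def late_fusion_vad(audio_loglik, video_loglik, w_audio=0.6, w_video=0.4, threshold=0.0):
    """Late fusion of modality scores for voice activity detection.

    audio_loglik, video_loglik: per-window log-likelihood ratios
    log P(obs | speech) - log P(obs | non-speech), one from each
    modality's HMM over a temporal window.
    Returns a boolean array: True where speech is declared active.
    """
    fused = w_audio * np.asarray(audio_loglik) + w_video * np.asarray(video_loglik)
    return fused > threshold

# Toy usage: scores for five temporal windows from each modality's HMM.
audio_scores = np.array([1.2, 0.4, -0.8, 2.1, -1.5])
video_scores = np.array([0.9, -0.2, -1.1, 1.7, -0.6])
print(late_fusion_vad(audio_scores, video_scores))
# -> [ True  True False  True False]
```

Because fusion happens at the decision-score level rather than at the feature level, either modality can be reweighted or dropped (e.g., when the face is occluded) without retraining the other model.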
