Abstract

In this paper, a vision-based voice activity detection (VVAD) algorithm is proposed using chaos theory. In the conventional VVAD algorithm, the movement of the lip region is measured by applying an optical flow algorithm, and visual speech frames are detected using a motion-based energy feature set. However, since motion-based features are unstable under illumination changes, a more robust feature set is desirable. It is proposed that contextual changes, such as the lip opening and closing motion during speech utterances under illumination variation, behave in a chaos-like manner and produce complex fractal trajectories in phase space. The fractality is measured in phase space from two sequential video input frames, and visual speech frames are then robustly detected. Representative experiments are performed on image sequences containing a driver scene undergoing illumination fluctuations in a moving-vehicle environment. Experimental results indicate a substantial improvement over the conventional method in terms of a significantly lower false alarm rate.
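To illustrate the kind of measurement the abstract describes, the following is a minimal sketch, assuming a time-delay embedding of the frame-difference signal and a box-counting estimate of fractality; the paper does not specify these details, and all function names, parameters, and the embedding choice here are illustrative assumptions rather than the authors' method.

```python
import numpy as np

def delay_embed(signal, dim=2, tau=1):
    """Time-delay embedding of a 1-D signal into a dim-dimensional phase space (assumed embedding)."""
    n = len(signal) - (dim - 1) * tau
    return np.stack([signal[i * tau : i * tau + n] for i in range(dim)], axis=1)

def box_counting_dimension(points, scales=(2, 4, 8, 16, 32)):
    """Estimate the fractal (box-counting) dimension of a point cloud in phase space."""
    # Normalize points into the unit hypercube.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    unit = (points - mins) / span
    counts = []
    for s in scales:
        # Count the boxes occupied by trajectory points at grid resolution s.
        boxes = np.unique(np.floor(unit * s).astype(int), axis=0)
        counts.append(len(boxes))
    # The slope of log(count) vs. log(scale) approximates the fractal dimension.
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return slope

def lip_fractality(frame_prev, frame_curr):
    """Fractality measure from two consecutive grayscale lip-region frames (illustrative)."""
    diff = (frame_curr.astype(float) - frame_prev.astype(float)).ravel()
    trajectory = delay_embed(diff, dim=2, tau=1)
    return box_counting_dimension(trajectory)

# Example with random stand-in frames; real input would be cropped lip regions.
rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, size=(32, 48))
f1 = rng.integers(0, 256, size=(32, 48))
print(lip_fractality(f0, f1))
```

In a detection setting such as the one described, the resulting fractality score for each frame pair could be thresholded to flag visual speech frames; the threshold and any temporal smoothing are likewise left unspecified in the abstract.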
