Driver Voice Activity Detection by Integrating Sound and Images

Yoshiki Ninomiya, Chiyomi Miyajima, Daisuke Negi, Kensaku Mori, Toshiki Maeno, Yasuhito Suenaga, Takayuki Kitasaka, Yoshihide Ban

doi:10.3169/itej.62.435

Abstract

Voice activity detection (VAD) is an important part of developing speech functions for on-board car navigation and assistance systems. In a vehicle, where a wide variety of sounds and noises are present, it is difficult to detect voice activity from sound information alone. We propose suitable image features and an integration method for building a robust bimodal VAD system that uses both the driver's voice and facial images. As image features for VAD, we select the normalized correlation value between consecutive mouth images and the number of low-intensity pixels in the mouth image. In the proposed system, the discrimination function consists of a weighted sum of single-feature discrimination functions together with combinations of their logical addition (OR) and logical multiplication (AND). Experimental results show that the proposed sound and image features can be useful and that, at a false alarm rate of about 12%, the proposed integration method achieves a 97% hit rate, 9 points higher than the previous integration method.
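The abstract names the two image features and the general shape of the fused discrimination function but gives no formulas. The Python sketch below shows one way these pieces could fit together. It assumes grayscale mouth images that are already cropped and flattened to equal-size lists, uses frame log-energy as the sound feature (the abstract does not say which sound feature is used), and writes each single-feature discrimination function as "feature minus threshold" so that a positive value means speech. The constants, helper names, and the particular OR/AND terms are hypothetical illustrations, not values or definitions taken from the paper.

```python
import math
from typing import Sequence

# Hypothetical placeholder thresholds and weights (not from the paper).
DARK_INTENSITY = 50.0       # 8-bit gray level below which a pixel counts as "dark"
SOUND_THRESHOLD_DB = 40.0   # frame log-energy threshold (assumed sound feature)
CORR_THRESHOLD = 0.9        # correlation below this suggests the mouth is moving
DARK_PIXEL_THRESHOLD = 1    # dark-pixel count above this suggests an open mouth
WEIGHTS = (0.02, 2.0, 0.5)  # weights for the (sound, motion, dark) scores
W_OR, W_AND = 0.1, 1.0      # weights for the logical OR and AND terms


def normalized_correlation(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Normalized correlation between two flattened mouth images of equal size."""
    if len(prev) != len(curr) or not prev:
        raise ValueError("images must be non-empty and equally sized")
    mean_p = sum(prev) / len(prev)
    mean_c = sum(curr) / len(curr)
    num = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    den = math.sqrt(sum((p - mean_p) ** 2 for p in prev)
                    * sum((c - mean_c) ** 2 for c in curr))
    # A constant image has zero variance; treat the pair as unchanged (correlation 1).
    return num / den if den > 0 else 1.0


def count_dark_pixels(image: Sequence[float], level: float = DARK_INTENSITY) -> int:
    """Number of pixels darker than `level`, e.g. the open mouth cavity."""
    return sum(1 for v in image if v < level)


def fused_score(scores: Sequence[float]) -> float:
    """Weighted sum of single-feature scores plus weighted logical OR / AND terms."""
    decisions = [1 if s > 0 else 0 for s in scores]  # per-feature 0/1 decisions
    linear = sum(w * s for w, s in zip(WEIGHTS, scores))
    logical_or = max(decisions)          # logical addition: 1 if any decision is 1
    logical_and = math.prod(decisions)   # logical multiplication: 1 only if all are 1
    return linear + W_OR * logical_or + W_AND * logical_and


def is_speech(energy_db: float, prev_mouth: Sequence[float],
              curr_mouth: Sequence[float]) -> bool:
    """Classify one frame; each g_* score is positive when it indicates speech."""
    g_sound = energy_db - SOUND_THRESHOLD_DB
    g_motion = CORR_THRESHOLD - normalized_correlation(prev_mouth, curr_mouth)
    g_dark = count_dark_pixels(curr_mouth) - DARK_PIXEL_THRESHOLD
    return fused_score([g_sound, g_motion, g_dark]) > 0


if __name__ == "__main__":
    closed = [120, 130, 125, 135]  # toy 2x2 mouth image, flattened
    opened = [120, 30, 20, 130]    # same region with a dark open-mouth area
    print(is_speech(30.0, closed, closed))  # quiet, still mouth -> False
    print(is_speech(60.0, closed, opened))  # loud, mouth opening -> True
    print(is_speech(60.0, closed, closed))  # loud noise, still mouth -> False
```

With these placeholder weights, a loud frame with a motionless mouth is rejected because the negative image scores outweigh the sound score, while the AND term adds a bonus when all three single-feature decisions agree. In practice the weights and thresholds would have to be tuned on labeled in-vehicle data.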
