Abstract

Voice activity detection (VAD) is an important component of speech interfaces for on-board car navigation and assistance systems. In a vehicle, where a wide variety of sounds and noise is present, detecting voice activity from sound information alone is difficult. We propose suitable image features and an integration method for building a robust bimodal VAD system that uses the driver's voice and facial images. As the image features for VAD, we select the normalized correlation between sequential mouth images and the number of low-intensity pixels in the mouth image. In the proposed system, the discrimination function consists of the weighted sum of single-feature discrimination functions together with combinations of the logical addition (OR) and logical multiplication (AND) of those single-feature discrimination functions. The experimental results show that the proposed sound and image features are useful and that, at a false alarm rate of about 12%, the proposed integration method achieves a 97% hit rate, 9 percentage points higher than the previous integration method.
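The two image features and the integration rule described in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: mouth images are assumed to be equal-size grayscale arrays, "normalized correlation" is read as zero-mean normalized cross-correlation, low correlation between consecutive frames is assumed to indicate mouth motion, and the dark-pixel threshold, all weights, and the helper names (`normalized_correlation`, `dark_pixel_count`, `fused_score`) are hypothetical. The logical terms are one possible reading of "logical addition and multiplication": OR fires if any single-feature decision votes for speech, AND fires only if all of them do.

```python
import math
from typing import List, Sequence

# A grayscale mouth-region image as rows of pixel intensities (0 = black).
Image = Sequence[Sequence[float]]


def normalized_correlation(a: Image, b: Image) -> float:
    """Zero-mean normalized cross-correlation of two equal-size images, in [-1, 1].

    Assumption: a low value between consecutive frames suggests the mouth moved.
    """
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("images must be non-empty and the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # Assumption: a flat (constant) image gives no motion evidence, so report 1.0.
    return num / den if den > 0 else 1.0


def dark_pixel_count(img: Image, threshold: float = 50.0) -> int:
    """Count pixels darker than `threshold` (an open mouth exposes a dark cavity).

    The default threshold is a placeholder, not a value from the paper.
    """
    return sum(1 for row in img for p in row if p < threshold)


def fused_score(scores: List[float], weights: List[float],
                or_weight: float, and_weight: float, bias: float = 0.0) -> float:
    """Combine single-feature discrimination values; a result > 0 means "speech".

    Each entry of `scores` is one single-feature discrimination value, positive
    when that feature alone votes for speech. The weighted sum is augmented with
    a logical-addition (OR) term and a logical-multiplication (AND) term computed
    over the per-feature binary decisions. All weights are hypothetical.
    """
    if len(scores) != len(weights):
        raise ValueError("need one weight per feature")
    decisions = [s > 0 for s in scores]
    weighted = sum(w * s for w, s in zip(weights, scores))
    logical_or = 1.0 if any(decisions) else 0.0   # logical addition
    logical_and = 1.0 if all(decisions) else 0.0  # logical multiplication
    return weighted + or_weight * logical_or + and_weight * logical_and - bias


if __name__ == "__main__":
    closed = [[120, 130], [125, 135]]  # previous frame: bright, closed mouth
    opened = [[120, 20], [15, 135]]    # current frame: two dark cavity pixels
    corr = normalized_correlation(closed, opened)
    dark = dark_pixel_count(opened)
    # Hypothetical single-feature functions: a correlation threshold minus the
    # correlation, and the dark-pixel count minus a minimum count of 1.
    scores = [0.9 - corr, float(dark - 1)]
    g = fused_score(scores, weights=[1.0, 1.0], or_weight=0.5, and_weight=1.0)
    print("speech" if g > 0 else "non-speech", round(g, 3))
```

A sound-based score, for example frame energy relative to an estimated noise floor, would enter as one more element of `scores`; the abstract does not say which sound feature the authors used, so it is left out here.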
