Abstract

Bimodal speech recognition is a novel extension of acoustic speech recognition in which both acoustic and visual speech information are used to improve recognition accuracy in noisy environments. Although various bimodal speech systems have been developed, previous work has not given a rigorous and detailed comparison of the geometric visual features that can be extracted from speakers' faces. In this paper, these geometric visual features are compared and analyzed rigorously for their importance in bimodal speech recognition. The relevance of each single visual feature is used to determine the best combination of geometric visual features for both visual-only and bimodal speech recognition. Of the geometric visual features analyzed, lip vertical aperture is the most relevant, and the set formed by the vertical and horizontal lip apertures and the first-order derivative of the lip corner angle gives the best results among the reduced sets of geometric features that were analyzed. The effect of the modelling parameters of hidden Markov models (HMMs) on the recognition accuracy of each single geometric lip feature is also analyzed. Finally, the accuracies of acoustic-only, visual-only, and bimodal speech recognition are experimentally determined and compared using the optimized HMMs and geometric visual features. Compared to acoustic-only and visual-only speech recognition, the bimodal scheme achieves much higher recognition accuracy using the geometric visual features, especially in the presence of noise. The results show that a set of as few as three labial geometric features is sufficient to improve the recognition rate by as much as 20 percentage points (from 62% with acoustic-only information to 82% with audio-visual information, at a signal-to-noise ratio (SNR) of 0 dB).
