Abstract

Visual speech representation for bimodal speech recognition poses particular challenges in modeling the inner lip texture that reflects different pronunciations, such as the appearance of teeth and tongue. This paper proposes and analyzes several candidate statistical inner lip texture descriptors to determine an effective and discriminant feature. Simply using grayscale without fully specifying the underlying color model tends to lose significant discriminative information, so a thorough exploration of color space component selection for computing the local inner lip texture is a primary goal of this research. The L channel of the Lab color space is ultimately chosen as the basis for the inner lip texture model. Through feature-level fusion, the final classification of visual speech is performed based on the proposed inner lip texture descriptor and standard geometric features. Together with audio speech, this work further develops a CHMM-based bimodal Chinese character pronunciation recognition system. The experimental results show that local inner texture descriptors, such as the color moments combined with geometric features, outperform holistic inner texture descriptors, such as the statistical histogram, in representing visual speech, offering comparable discriminability at lower dimensionality.
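To make the descriptor concrete, the sketch below illustrates one plausible way to compute first-order color moments of the L channel (Lab color space) over an inner lip region and concatenate them with geometric lip features. This is an assumed pipeline for illustration only, not the authors' implementation; the function name, the use of OpenCV, and the `inner_lip_mask` and `geometric_features` inputs are hypothetical.

```python
# Sketch: color-moment texture descriptor of the inner lip region,
# computed on the L channel of the Lab color space.
import cv2
import numpy as np

def inner_lip_color_moments(mouth_roi_bgr: np.ndarray,
                            inner_lip_mask: np.ndarray) -> np.ndarray:
    """Return the first three color moments (mean, std, skewness) of the
    L channel inside the inner lip mask. `mouth_roi_bgr` is an 8-bit BGR
    mouth image; `inner_lip_mask` is a boolean mask of the inner lip area."""
    lab = cv2.cvtColor(mouth_roi_bgr, cv2.COLOR_BGR2LAB)
    l_values = lab[:, :, 0][inner_lip_mask].astype(np.float64)

    mean = l_values.mean()
    std = l_values.std()
    # Third-order moment (skewness) reflects asymmetry in the lightness
    # distribution, e.g. bright teeth versus dark tongue/oral cavity pixels.
    skew = np.cbrt(((l_values - mean) ** 3).mean())

    return np.array([mean, std, skew])

# Feature-level fusion with geometric lip features (e.g. mouth width,
# height, opening area) -- `geometric_features` is a hypothetical vector:
# visual_feature = np.concatenate(
#     [inner_lip_color_moments(roi, mask), geometric_features])
```

The resulting low-dimensional visual feature vector would then be paired with acoustic features as the observation streams of a coupled HMM (CHMM) for bimodal recognition.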
