Abstract

This paper presents a dynamic visual feature extraction scheme to capture important lip motion information for visual speech recognition. Discriminative projections based on a priori chosen speech classes, phonemes and visemes, are applied to the concatenation of pre-extracted static visual features. First- and second-order temporal derivatives are subsequently extracted to further represent the dynamic differences. Experiments on a connected-digits task demonstrate that the proposed highly discriminative dynamic features, when appended to the static features, yield superior recognition performance. Compared to the commonly used delta and acceleration features, the proposed dynamic features lead to an 8% absolute improvement in word accuracy for the considered recognition task.
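The abstract outlines a three-stage pipeline: stack the pre-extracted static lip features over a temporal context window, project the stacked vector with a class-discriminative transform trained on phoneme or viseme labels, and append first- and second-order temporal derivatives of the projection. The sketch below illustrates that flow under stated assumptions; it uses LDA as the discriminative projection and arbitrary window/dimension settings, and all function names are illustrative rather than the authors' exact configuration.

```python
# Hedged sketch of the described dynamic-feature pipeline (assumptions:
# LDA as the discriminative projection, a +/-3 frame context window,
# per-frame phoneme/viseme labels available for training).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def stack_frames(static_feats: np.ndarray, context: int = 3) -> np.ndarray:
    """Concatenate each frame with its +/- `context` neighbours (edges repeated)."""
    T, _ = static_feats.shape
    padded = np.pad(static_feats, ((context, context), (0, 0)), mode="edge")
    return np.hstack([padded[i:i + T] for i in range(2 * context + 1)])


def dynamic_features(static_feats: np.ndarray,
                     frame_labels: np.ndarray,
                     context: int = 3,
                     n_dims: int = 10) -> np.ndarray:
    """Augment static features with discriminatively projected dynamic features.

    `n_dims` must be at most (number of classes - 1) for LDA.
    """
    stacked = stack_frames(static_feats, context)
    lda = LinearDiscriminantAnalysis(n_components=n_dims)
    projected = lda.fit_transform(stacked, frame_labels)   # discriminative dynamic features
    delta = np.gradient(projected, axis=0)                 # first-order temporal derivative
    accel = np.gradient(delta, axis=0)                     # second-order temporal derivative
    return np.hstack([static_feats, projected, delta, accel])
```

In this reading, the projected features plus their derivatives play the role that conventional delta and acceleration coefficients play in a standard front end, which is the comparison the reported 8% absolute gain refers to.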
