Abstract

Automatic lip-reading (ALR) is a challenging task, and a significant amount of research has been devoted to it in recent years. However, continuous Russian speech recognition remains a poorly investigated area. In this paper, we present the results of a Russian visual speech recognition (VSR) system using pixel-based and advanced geometry-based features. The HAVRUS video database, comprising 4000 utterances of continuous Russian speech collected from 20 speakers, is used in this study. Pixel-based features (based on principal component analysis, PCA) and geometry-based features (based on an active appearance model, AAM) were used for feature representation, and Gaussian mixture model hidden Markov models (GMM-HMMs) were used for classification. Our evaluation experiments show a significant improvement in recognition accuracy (up to 9%) when using the proposed geometry-based features instead of pixel-based PCA features. In future work, we plan to combine VSR with audio-based automatic speech recognition to augment its performance in human–robot interfaces.
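As a minimal sketch of the pixel-based baseline, assume each video frame has already been cropped to a fixed-size grayscale mouth region. PCA features can then be obtained by flattening the frames, centring them, and projecting them onto the leading principal axes. The function names, the 32×48 crop size and the choice of 10 components are hypothetical illustrations, not the authors' settings. The resulting per-frame vectors are the kind of observation sequence a GMM-HMM classifier would consume; the HMM stage is not shown.

```python
import numpy as np

def fit_pca(frames: np.ndarray, n_components: int):
    """Fit PCA on mouth-region frames of shape (n_frames, height, width).

    Hypothetical helper: returns the mean image vector and the top
    `n_components` principal axes, each of length height * width.
    """
    X = frames.reshape(len(frames), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of vt are principal axes,
    # ordered by decreasing singular value (i.e. explained variance).
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_features(frames: np.ndarray, mean: np.ndarray, components: np.ndarray):
    """Project flattened, centred frames onto the principal axes."""
    X = frames.reshape(len(frames), -1).astype(np.float64)
    return (X - mean) @ components.T

# Usage with random stand-in data (assumed 50 frames of 32x48 pixels).
rng = np.random.default_rng(0)
frames = rng.random((50, 32, 48))
mean, comps = fit_pca(frames, n_components=10)
feats = pca_features(frames, mean, comps)  # one 10-dim vector per frame
```

Using the SVD of the centred data avoids forming the 1536×1536 pixel covariance matrix explicitly, which matters as the crop size grows.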
