Abstract

The lip region provides the most direct visual information in multisensory speech perception, and this information is used in speech recognition and lip reading. In this paper, we extract eight lip features during the articulation of the basic Standard Chinese vowels [a], [e], [i], [u], and [ü]. Drawing on articulatory phonetics, we then analyze how effectively these features distinguish the five vowels. We use a Dense Convolutional Network (DenseNet) to process two-dimensional lip images and fuse the resulting image features with the lip features to recognize Chinese consonants. The results show that lip-shape features behave consistently in Chinese vowel recognition and in Chinese consonant lip reading. Fusing lip features with two-dimensional lip images effectively improves the recognition rate in lip reading.
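The abstract does not specify how the DenseNet image features and the eight lip features are combined. The sketch below shows one common option, feature-level (early) fusion. It assumes that each source is L2-normalized and that the two are then concatenated into a single vector for a downstream classifier. The function names, the `lip_weight` parameter, and the example values are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    return [v / norm for v in vec]


def fuse_features(
    image_embedding: Sequence[float],
    lip_features: Sequence[float],
    lip_weight: float = 1.0,
) -> List[float]:
    """Hypothetical early fusion: normalize each source, then concatenate.

    image_embedding: assumed pooled output of a DenseNet backbone for one lip image.
    lip_features: hand-crafted lip-shape measurements (the paper uses eight).
    lip_weight: assumed scalar that balances the lip features against the image features.
    """
    # Normalizing separately keeps the larger image vector from dominating by scale alone.
    img = l2_normalize(image_embedding)
    lip = [lip_weight * v for v in l2_normalize(lip_features)]
    # Concatenation: the fused vector has len(image_embedding) + len(lip_features) entries.
    return img + lip


if __name__ == "__main__":
    embedding = [0.5, 1.5, 0.0, 2.0]        # stand-in for a (much longer) DenseNet embedding
    lip = [3.0, 4.0] + [0.0] * 6            # eight placeholder lip-shape values
    fused = fuse_features(embedding, lip)
    print(len(fused))  # 12
```

Concatenation is the simplest fusion choice. A real system might instead learn the weighting or fuse at a later stage, for example by averaging classifier scores.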
