Abstract
The lip region provides the most direct visual information in multi-sensory speech perception and is widely used in speech recognition and lip reading. In this paper, we extract eight lip features from the articulation of the basic vowels [a], [e], [i], [u], [ü] in standard Chinese and, drawing on articulatory phonetics, analyze how effectively these features distinguish the five vowels. We use a Dense Convolutional Network (DenseNet) to process two-dimensional lip images and fuse them with the lip features for lip reading of Chinese consonants. The results show that lip shape features behave consistently across Chinese vowel recognition and Chinese consonant lip reading. Fusing lip features with two-dimensional lip images effectively improves the recognition rate in lip reading.