Abstract

This paper describes a neural approach intended to improve the performance of an automatic speech recognition system for unrestricted speakers by using not only voice sound features but also image features of the mouth shape. In particular, we used the natural sample voice signals and mouth shape images that were acquired in the general environment, neither in the sound isolation room nor under specific lighting conditions. The FFT power spectrum of acoustic speech was used as the voice feature. In addition, the gray level image, binary image and geometrical shape features of the mouth were used as the compensatory information, and compared which kinds of image features are effective. For unrestricted speakers, a vowel recognition rate of about 80% was obtained using only voice features, but this increased to some 92% when voice features plus binary images were used. This method can be applied not only to the improvement of voice recognition, but also to aid the communication of hearing-impaired people.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.