Abstract

Lip reading is a field of study whose importance has grown significantly in recent years, particularly with the widespread use of deep learning techniques. Lip reading is essential for speech recognition in noisy environments and for people with hearing impairments. It refers to recognizing spoken sentences from visual information acquired from lip movements. The lip area also poses several problems, especially for males, whose mustache and beard may cover part of the lips. This paper proposes an automatic lip-reading system that uses deep learning networks to recognize and classify short English sentences spoken by speakers. Frames are extracted from the input video, and each frame is passed to the Viola-Jones detector to locate the face area. Then 68 facial landmarks are determined, and the landmarks from 48 to 68, which represent the lip area, are used to build a binary mask that extracts the lips. Next, contrast adjustment is applied to improve the quality of the lip image. Finally, sentences are classified using two deep learning models: AlexNet and VGG-16 Net. The database consists of 39 participants (32 males and 7 females), each of whom repeats the short sentences five times. The results show that AlexNet achieves an accuracy of 90.00%, whereas VGG-16 Net achieves 82.34%. We conclude that AlexNet performs better than VGG-16 Net for classifying short sentences.
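
The lip-extraction and contrast-enhancement steps can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the face has already been detected (for example with an OpenCV Haar cascade, which implements Viola-Jones) and that 68 landmarks have been located by some external detector. It also assumes the common iBUG/dlib 0-based layout, in which points 48-59 trace the outer lip and 60-67 the inner lip, so the outer contour alone encloses the whole mouth. The function names and the 1st/99th-percentile linear stretch are hypothetical choices standing in for the paper's "contrast adjustment".

```python
import numpy as np

# Assumption: 0-based iBUG/dlib 68-point layout; 48-59 = outer lip contour.
OUTER_LIP = slice(48, 60)


def polygon_mask(shape, polygon):
    """Boolean mask of pixels inside `polygon` (even-odd ray casting).

    `polygon` is an (N, 2) array of (x, y) vertices in pixel coordinates.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    inside = np.zeros((h, w), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Pixels whose horizontal ray spans this edge vertically.
        crosses = (y1 > ys) != (y2 > ys)
        # Horizontal edges divide by zero; they never "cross", so ignore warnings.
        with np.errstate(divide="ignore", invalid="ignore"):
            x_edge = x1 + (ys - y1) * (x2 - x1) / (y2 - y1)
        # Toggle parity for pixels lying left of the crossing point.
        inside ^= crosses & (xs < x_edge)
    return inside


def stretch_contrast(pixels, low_pct=1.0, high_pct=99.0):
    """Linearly map the [low_pct, high_pct] percentile range to [0, 255]."""
    lo, hi = np.percentile(pixels, [low_pct, high_pct])
    if hi <= lo:  # flat region: nothing to stretch
        return pixels.astype(np.uint8)
    scaled = (pixels.astype(np.float64) - lo) / (hi - lo)
    return np.round(np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)


def extract_lip(gray, landmarks):
    """Crop the lip region of a grayscale frame and enhance its contrast.

    `landmarks` is a (68, 2) array of (x, y) points. Pixels outside the
    outer-lip polygon are set to 0; returns (crop, crop_mask).
    """
    outer = np.asarray(landmarks, dtype=np.float64)[OUTER_LIP]
    mask = polygon_mask(gray.shape, outer)
    h, w = gray.shape
    # Bounding box of the lip polygon, clipped to the image.
    x0, y0 = np.maximum(np.floor(outer.min(axis=0)).astype(int), 0)
    x1 = min(int(np.ceil(outer[:, 0].max())), w - 1)
    y1 = min(int(np.ceil(outer[:, 1].max())), h - 1)
    out = np.zeros_like(gray, dtype=np.uint8)
    if mask.any():
        # Percentiles are computed over lip pixels only, not the black background.
        out[mask] = stretch_contrast(gray[mask])
    return out[y0:y1 + 1, x0:x1 + 1], mask[y0:y1 + 1, x0:x1 + 1]
```

In a full pipeline, each enhanced crop would then be resized to the input size expected by the chosen CNN (AlexNet or VGG-16) before classification.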
