Abstract

With the rapid development of artificial intelligence and the increasing popularity of smart devices, human-computer interaction technology has become a multimedia and multimode technology from being computer-focused to people-centered. Among all ways of human-computer interactions, using language to interact with machines is the most convenient and efficient one. However, the performance of audio speech recognition systems is not satisfied in a noisy environment. Thus, more and more researchers focus their works on visual lip reading technology. By extracting lip movement features of speakers rather than audio features, visual lip reading systems can get superior results when noises and interferences exist. Lip segmentation plays an important role in a visual lip reading system, since the segmentation result is crucial to the final recognition accuracy. In this paper, we propose a localized active contour model-based method using two initial contours in a combined color space. We apply illumination equalization to original RGB images to decrease the interference of uneven illumination. A combined color space consists of the U component in CIE-LUV color space and the sum of C2 and C3 components of the image after discrete Hartley transform. We select a rhombus as the initial contour of a closed mouth, because it has a similar shape to a closed lip. For an open mouth, we utilize a combined semi-ellipse as the initial contours of both outer and inner lip boundaries. After attaining the results of each color component separately, we merge them together to obtain the final segmentation result. From the experiment, we can conclude that this method can get better segmentation results compared with the method using a circle as the initial contour to segment gray images and images in combined color space, especially for open mouth. An extremely obvious advantage of this method is the results of open mouth excluding internal information of mouth such as teeth, black holes, and tongue, because of the introduction of the inner initial contour.

Highlights

  • Visual lip reading is a technology which combines machine vision and language perception

  • In order to improve and perfect the lip segmentation, we propose the inner initial contour according to the shape of outer contour

  • We apply the initial contours to each component of combined color space and merge the results of each component to gain the final convergence result

Read more

Summary

Introduction

Visual lip reading is a technology which combines machine vision and language perception. Visual lip reading systems identify face region from images or videos by machine vision, extract the mouth variation features of speakers and determine the pronunciations of these features by recognition model, thereby recognizing the speech contents. This system receives more and more attention in the field of human-computer interaction (HCI), pattern recognition (PR), and artificial intelligence (AI) in recent years. ACM can obtain subpixel accuracy of object boundaries [3, 4] This model can be developed within the framework of the energy minimization principle.

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call