Abstract

In this paper, we propose high-speed, size- and orientation-invariant lip detection of a talking person in an active scene using template matching and genetic algorithms (GA). We also aim to obtain numerical parameters that represent the lips. This information is important for many applications that require high performance, such as audio-visual speech recognition, speaker identification systems, robot perception, and interfaces for personal mobile devices. Lip detection is difficult mainly because the lips deform and change geometrically during speech, and because free camera motion makes the scene active. To improve both speed and accuracy, we first optimize performance on a single still image, which is the basis of video processing. Our proposed system is based on template matching using GA, and only one template is prepared per experiment: the subject's closed mouth, since the target application is personal devices. In our previous study, the main problem was the trade-off between search accuracy and search speed. To overcome it, we use two methods, a scaling window and dynamic search domain control (SD-Control), and we focus on the GA population size because it directly affects both search accuracy and search speed. The effectiveness of the proposed system is demonstrated through computer simulations. We achieved a lip detection accuracy of 91.33% at an average processing time of 33.70 milliseconds per frame.
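To make the core idea concrete, the following is a minimal sketch of GA-based template matching, assuming a generic real-coded GA rather than the authors' implementation. Each individual encodes a hypothetical gene vector (x, y, scale, angle), so the search covers position, size, and orientation; fitness is the normalized cross-correlation (NCC) between the template and the scaled, rotated image window. The function names (`warped_window`, `ncc`, `ga_match`), the operators (truncation selection, uniform crossover, annealed Gaussian mutation), the search bounds, and the default parameters are all illustrative assumptions. The scaling window and SD-Control are omitted because the abstract does not describe how they work.

```python
import math
import random


def warped_window(image, genes, h, w):
    """Sample an h x w window from `image`, centred at (x, y), scaled and rotated.

    Nearest-neighbour sampling; pixels falling outside the image read as 0.0.
    """
    x, y, scale, angle = genes
    c, s = math.cos(angle), math.sin(angle)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = []
    for r in range(h):
        for q in range(w):
            u, v = q - cx, r - cy
            px = round(x + scale * (u * c - v * s))
            py = round(y + scale * (u * s + v * c))
            inside = 0 <= py < len(image) and 0 <= px < len(image[0])
            out.append(image[py][px] if inside else 0.0)
    return out


def ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((p - ma) * (t - mb) for p, t in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((t - mb) ** 2 for t in b))
    return num / den if den > 0 else -1.0  # flat window: treat as worst match


def ga_match(image, template, pop_size=30, generations=40, mutation=0.1, seed=0):
    """Search (x, y, scale, angle) maximising NCC between template and image.

    Assumed generic GA, not the paper's method; requires pop_size >= 4.
    """
    rng = random.Random(seed)
    h, w = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    # Hypothetical search ranges: whole image, 0.5x-2x scale, +/-0.5 rad rotation.
    bounds = [(0, len(image[0]) - 1), (0, len(image) - 1), (0.5, 2.0), (-0.5, 0.5)]

    def fitness(genes):
        return ncc(flat_t, warped_window(image, genes, h, w))

    def clip(genes):
        return [min(max(g, lo), hi) for g, (lo, hi) in zip(genes, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for gen in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        # Mutation step shrinks linearly so late generations refine the best match.
        step = mutation * (1 - gen / generations) + 0.01
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            child = [g + rng.gauss(0, step * (hi - lo)) for g, (lo, hi) in zip(child, bounds)]
            children.append(clip(child))
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)
```

In this sketch `pop_size` is the knob the abstract singles out: every generation costs `pop_size` fitness evaluations, so a larger population typically finds the match more reliably but takes proportionally longer per frame. For reference, the reported 33.70 milliseconds per frame corresponds to roughly 29.7 frames per second.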
