Abstract

Two of the communication channels that convey the most information in human-to-human interaction are the face and speech. A robust interpretation of the information a person expresses can be obtained by jointly analyzing both sources: the short-term evolution of facial features (face) and the speech signal. Combined face and speech analysis is therefore the basis of a large number of human-computer interfaces and services. Regardless of the final application of such interfaces, two aspects are commonly required: detection of human faces and fusion of the two sources of information. In the first section of the chapter, we review the state of the art in face and facial feature detection. The methods are analyzed according to the models they use to represent images and patterns: pixel-based, block-based, transform-coefficient-based, and region-based techniques. In the second section, we present two examples of multimodal signal processing applications. The first localizes the speaker's mouth in a video sequence, using both the audio signal and the motion extracted from the video. The second recognizes the spoken words in a video sequence using both the audio and the images of the moving lips.
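
To make the detection survey concrete, here is a minimal sketch of a block/feature-based face detector of the kind such reviews cover, using OpenCV's pretrained Viola-Jones Haar cascade. This is not the chapter's own method; the input filename frame.png and the scaleFactor/minNeighbors settings are illustrative assumptions.

```python
import cv2

# Pretrained frontal-face Haar cascade shipped with the opencv-python package.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("frame.png")  # hypothetical input frame
if img is None:
    raise FileNotFoundError("frame.png not found")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Multi-scale sliding-window detection; parameters are illustrative defaults.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.png", img)
```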
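For the first application, the sketch below illustrates one plausible way to localize a speaker's mouth from audio and motion: correlate the motion energy of each image block with the frame-rate audio energy and pick the block that best tracks the audio. The block size, the frame-difference motion measure, and the Pearson-style correlation score are assumptions for illustration, not the chapter's actual algorithm.

```python
import numpy as np

def localize_mouth(frames, audio_energy, block=16):
    """Return the (row, col) of the image block whose motion energy
    correlates best with the audio energy.

    frames: (T, H, W) grayscale video; audio_energy: (T,) one energy
    value per video frame. Block size and score are assumptions.
    """
    T, H, W = frames.shape
    # Frame-difference magnitude as a crude motion measure: (T-1, H, W).
    motion = np.abs(np.diff(frames.astype(float), axis=0))
    a = audio_energy[1:] - audio_energy[1:].mean()  # align with motion frames
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            m = motion[:, r:r + block, c:c + block].sum(axis=(1, 2))
            m -= m.mean()
            denom = np.linalg.norm(a) * np.linalg.norm(m)
            score = (a @ m) / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos
```

On a talking-head sequence, the winning block tends to sit over the mouth, since lip motion co-varies with speech energy while background motion does not.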
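For the second application, the two streams must be combined before recognition; one common scheme is early (feature-level) fusion. The sketch below is a hedged illustration of that scheme, where the feature names (mfcc, lip_feats) and per-stream normalization are assumptions rather than the chapter's specific recipe; the fused vectors would then feed a sequence classifier such as an HMM.

```python
import numpy as np

def fuse_features(mfcc, lip_feats):
    """Concatenate acoustic and visual features at a common frame rate.

    mfcc: (T, Da) acoustic features; lip_feats: (T, Dv) visual lip
    features, assumed already resampled to the same T. Illustrative only.
    """
    # Per-stream z-normalization so neither modality dominates the classifier.
    a = (mfcc - mfcc.mean(axis=0)) / (mfcc.std(axis=0) + 1e-8)
    v = (lip_feats - lip_feats.mean(axis=0)) / (lip_feats.std(axis=0) + 1e-8)
    return np.concatenate([a, v], axis=1)  # (T, Da + Dv)
```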
