Abstract

Most verbal communication uses cues from both the visual and acoustic modalities to convey messages. During speech production, the visible information provided by the external articulatory organs can influence understanding of the language, as the listener integrates the combined information into meaningful linguistic expressions. Automated systems that integrate speech and image data can emulate this bimodal human interaction. Such systems have a wide range of applications, for example in videophone systems, where the interdependencies between image and speech signals can be exploited for data compression and for solving the long-standing problem of lip synchronization. The objective of this work is therefore to investigate and quantify this relationship, so that the knowledge gained will assist longer-term multimedia and videophone research. © 1999 SPIE and IS&T. (S1017-9909(99)00703-5)
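
To make the stated objective concrete, the following is a minimal sketch of one way an audio-visual relationship of this kind could be quantified: computing the Pearson correlation between a per-frame acoustic feature and a per-frame visual feature. The abstract does not specify the paper's actual features or method, so the function, the choice of features, and the synthetic signals below are hypothetical illustrations only.

```python
import numpy as np

def audio_visual_correlation(audio_energy, lip_opening):
    """Pearson correlation between a per-frame acoustic feature
    (e.g., short-time energy) and a per-frame visual feature
    (e.g., vertical lip opening). Both arrays are assumed to be
    sampled at the same frame rate and aligned in time.
    """
    a = np.asarray(audio_energy, dtype=float)
    v = np.asarray(lip_opening, dtype=float)
    # Standardize each signal, then average the product: this is
    # exactly the Pearson correlation coefficient.
    a = (a - a.mean()) / a.std()
    v = (v - v.mean()) / v.std()
    return float(np.mean(a * v))

# Hypothetical usage with synthetic, loosely coupled signals
# standing in for real acoustic and lip-tracking measurements.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 200)
audio = np.sin(t) + 0.3 * rng.standard_normal(200)  # stand-in acoustic envelope
video = np.sin(t) + 0.5 * rng.standard_normal(200)  # stand-in lip-opening track
print(f"audio-visual correlation: {audio_visual_correlation(audio, video):.2f}")
```

A single correlation coefficient is only the simplest such measure; work in this area typically also considers time-lagged or mutual-information measures, since visual articulation can lead or lag the acoustic signal.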
