Abstract

Demand has been growing for Automatic Speech Recognition (ASR) systems that operate robustly in acoustically noisy environments. This paper proposes a method for effectively integrating audio and visual information in audio-visual (bi-modal) ASR systems. Two issues are central to such integration: (1) synchronizing the audio and visual information, and (2) optimizing the system for its environment. Regarding (1), the audio and lip-movement features are correlated but exhibit a time lag relative to each other; to address this, we introduce an integration method based on HMM composition. Regarding (2), we examine whether the GPD algorithm can adaptively estimate the stream weights. Evaluation experiments show that the proposed method improves recognition accuracy for noisy speech.
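As a minimal sketch of the stream-weight idea the abstract refers to (not the paper's exact formulation), bi-modal HMM systems commonly combine the audio and visual state log-likelihoods with weights that trade off the two streams; the function name and default weight below are hypothetical.

```python
import numpy as np

def combined_log_likelihood(log_b_audio, log_b_visual, lambda_audio=0.7):
    """Stream-weighted combination of audio and visual HMM state
    log-likelihoods, as commonly used in bi-modal ASR.

    log_b_audio, log_b_visual: per-state log output probabilities.
    lambda_audio: audio stream weight; the visual weight is set so the
    two sum to 1 (a common convention, assumed here). In the paper's
    setting, such weights would be estimated adaptively, e.g. by GPD.
    """
    lambda_visual = 1.0 - lambda_audio
    return (lambda_audio * np.asarray(log_b_audio)
            + lambda_visual * np.asarray(log_b_visual))

# Example: combine per-state log-likelihoods for a 3-state HMM.
print(combined_log_likelihood([-2.0, -5.1, -3.3], [-1.2, -4.0, -6.5]))
```

In noisy conditions a lower audio weight lets the (noise-robust) visual stream dominate, which is the intuition behind tuning these weights to the environment.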
