Abstract

Wark, T., and Sridharan, S., Adaptive Fusion of Speech and Lip Information for Robust Speaker Identification, Digital Signal Processing 11 (2001) 169–186.

This paper compares techniques for asynchronous fusion of speech and lip information for robust speaker identification. In any fusion system, the ultimate challenge is to determine the optimal way to combine all information sources under varying conditions. We propose a new method for estimating confidence levels to allow intelligent fusion of the audio and visual data. We describe a secondary classification system, where secondary classifiers are used to approximate the estimation errors of the output likelihoods from primary classifiers. The error estimates are combined with a dispersion measure technique, allowing an adaptive fusion strategy based on the level of data degradation at the time of testing. We compare the performance of this fusion system with two other approaches to linear fusion and show that the use of secondary classifiers is an effective technique for improving classification performance. Identification experiments are performed on the M2VTS multimodal database, with encouraging results.
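To illustrate the general idea of adaptive linear fusion driven by a confidence measure, the sketch below weights audio and visual per-speaker log-likelihoods by a simple score-dispersion statistic. This is not the paper's method (which additionally uses secondary classifiers to estimate likelihood errors); the dispersion function, weighting scheme, and all names here are illustrative assumptions.

```python
# Minimal sketch of confidence-weighted linear fusion (assumed form,
# not the authors' exact algorithm). A modality whose scores are more
# widely dispersed is treated as more reliable and given more weight.
import numpy as np

def dispersion(log_likelihoods):
    """Gap between the best score and the mean of the rest;
    used here as a stand-in confidence measure (an assumption)."""
    sorted_ll = np.sort(log_likelihoods)[::-1]
    return sorted_ll[0] - np.mean(sorted_ll[1:])

def fuse(audio_ll, video_ll):
    """Fuse the two modalities with an adaptive linear weight."""
    d_a, d_v = dispersion(audio_ll), dispersion(video_ll)
    alpha = d_a / (d_a + d_v)              # adaptive weight for audio
    return alpha * audio_ll + (1.0 - alpha) * video_ll

# Dummy per-speaker log-likelihoods from the two primary classifiers.
audio_ll = np.array([-12.1, -15.4, -14.9, -16.2])
video_ll = np.array([-9.8, -10.1, -13.5, -11.0])
fused = fuse(audio_ll, video_ll)
print("identified speaker index:", int(np.argmax(fused)))
```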
