Abstract

An adaptive speaker identification system that employs both audio and visual cues is proposed in this work for movie content analysis. Specifically, a likelihood-based approach is first applied to speaker identification using pure speech data, while face detection/recognition and mouth tracking are applied to talking-face recognition using pure visual data. These two information cues are then integrated under a probabilistic framework to achieve more robust results. Moreover, to account for variations in speakers' voices over time, we update their acoustic models on the fly by adapting to their incoming speech data. An improved system performance (80% identification accuracy) has been observed on two test movies.
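The abstract's probabilistic integration of the audio and visual cues can be illustrated with a minimal sketch. This is not the paper's actual fusion rule, which the abstract does not specify; the weighted log-likelihood combination, the weight `alpha`, and the speaker names below are all illustrative assumptions.

```python
import math

def fuse_scores(audio_loglik, visual_loglik, alpha=0.6):
    """Combine per-speaker audio and visual log-likelihoods.

    alpha is an illustrative audio weight; the paper's real fusion
    scheme and weighting are not given in the abstract. Speakers
    missing a visual score receive a small floor likelihood.
    """
    floor = math.log(1e-9)
    fused = {}
    for spk, a in audio_loglik.items():
        v = visual_loglik.get(spk, floor)
        fused[spk] = alpha * a + (1 - alpha) * v
    # identify the speaker with the highest fused score
    best = max(fused, key=fused.get)
    return best, fused

# Toy scores for three hypothetical candidate speakers:
audio = {"alice": -12.0, "bob": -10.5, "carol": -15.0}
visual = {"alice": -8.0, "bob": -9.5, "carol": -7.5}
best, fused = fuse_scores(audio, visual)
```

In this toy example the audio evidence alone favors "bob" while the visual evidence favors "carol"; the weighted combination resolves the conflict according to `alpha`.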
