Abstract

We propose a framework for estimation and analysis of temporal facial expression patterns of a speaker. The goal of this framework is to learn the personalized elementary dynamic facial expression patterns for a particular speaker. We track the lips, eyebrows, and eyelids of the speaker in 3D across a head-and-shoulder stereo video sequence. We use MPEG-4 facial definition parameters (FDPs) to create the feature set, and MPEG-4 facial animation parameters (FAPs) to represent the temporal facial expression patterns. Hidden Markov model (HMM) based unsupervised temporal segmentation of upper and lower facial expression features is performed separately to determine recurrent elementary facial expression patterns for the particular speaker. These facial expression patterns, which are coded by FAP sequences and need not be tied to prespecified emotions, can be used for personalized emotion estimation and synthesis for a speaker. Experimental results are presented.
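
The sketch below illustrates the general idea of HMM-based unsupervised temporal segmentation of facial feature trajectories, as described in the abstract. It is not the authors' implementation: the use of the hmmlearn library, the feature dimensionality, the number of hidden states, and all other parameter choices are assumptions made only for illustration.

```python
# Minimal sketch (assumptions: hmmlearn, synthetic features, 5 hidden states)
import numpy as np
from hmmlearn import hmm

# Toy stand-in for lower-face FAP-like features: T frames x D dimensions
# (e.g., lip-corner and lip-height displacements derived from tracked FDPs).
T, D = 500, 4
rng = np.random.default_rng(0)
features = rng.normal(size=(T, D))

# Fit a Gaussian HMM; each hidden state is interpreted as one recurrent
# elementary expression pattern for this speaker.
model = hmm.GaussianHMM(n_components=5, covariance_type="diag",
                        n_iter=100, random_state=0)
model.fit(features)

# Viterbi decoding yields the temporal segmentation: one state label per frame.
states = model.predict(features)

# Contiguous runs of the same state delimit candidate elementary patterns,
# which could then be represented as FAP sequences for synthesis.
boundaries = np.flatnonzero(np.diff(states)) + 1
segments = np.split(states, boundaries)
print(f"{len(segments)} segments; first 20 frame labels: {states[:20]}")
```

In such a setup, upper- and lower-face features would be segmented with separate HMMs, mirroring the separate treatment described in the abstract.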
