Abstract

This paper proposes synthesizing expressive head and facial gestures on a talking avatar using the three-dimensional pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness (PAD) descriptors of semantic expressivity. The PAD model is adopted to bridge the gap between text semantics and visual motion features. Based on a correlation analysis between PAD annotations and motion patterns derived from a head and facial motion database, we build an explicit mapping from PAD descriptors to facial animation parameters, using linear regression for head motion and neural networks for facial expression. A PAD-driven talking avatar is implemented in a text-to-visual-speech system by generating expressive head motions at the prosodic word level based on the (P, A) descriptors of lexical appraisal, and facial expressions at the sentence level according to the PAD descriptors of emotional information. A series of PAD reverse evaluations and comparative perceptual experiments shows that head and facial gestures synthesized with the PAD model can significantly enhance the visual expressivity of the talking avatar.
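To make the head-motion half of the mapping concrete, the following is a minimal sketch of a linear regression from (P, A) descriptors to head motion parameters. It is an illustration under stated assumptions, not the authors' implementation: the function names `fit_linear_map` and `predict_motion`, the choice of two output parameters, and the synthetic training data are all hypothetical, and the abstract does not specify the actual feature set or fitting procedure.

```python
import numpy as np

def fit_linear_map(descriptors, motion):
    """Least-squares fit of motion ~ [descriptors, 1] @ W (bias in last row)."""
    X = np.hstack([descriptors, np.ones((descriptors.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, motion, rcond=None)
    return W

def predict_motion(W, descriptors):
    """Apply the fitted linear map to new descriptor rows."""
    X = np.hstack([descriptors, np.ones((descriptors.shape[0], 1))])
    return X @ W

# Synthetic, noise-free data (assumption): 50 prosodic words, each with a
# (P, A) pair in [-1, 1] and two hypothetical head motion parameters.
rng = np.random.default_rng(0)
pa = rng.uniform(-1.0, 1.0, size=(50, 2))
true_W = np.array([[2.0, -1.0],   # weights on P
                   [0.5, 3.0],    # weights on A
                   [0.1, -0.2]])  # bias
motion = np.hstack([pa, np.ones((50, 1))]) @ true_W

W = fit_linear_map(pa, motion)
neutral = predict_motion(W, np.array([[0.0, 0.0]]))  # motion for P = A = 0
```

Because the synthetic data contain no noise, the fitted `W` recovers `true_W` up to floating-point error. With real annotations the fit would only approximate the underlying relation, and the sentence-level facial expression mapping would require a nonlinear model such as the neural network the abstract mentions.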

