Abstract

Emotional human facial animation has become an indispensable technique in many multimedia systems. The technique first generates phoneme and emotion sequences from speech; viseme/expression sequences are then calculated accordingly and converted into a coherent facial animation video. In this work, a fully automatic system is designed by selecting acoustic features that are discriminative for both emotion and phoneme tags. More specifically, acoustic features highly representative of both emotion and phoneme tags are selected under a multi-task learning framework, and the speech phoneme and emotion sequences are computed from them. An active learning algorithm is then developed to discover key facial frames representative of both the phoneme and emotion tags. Finally, each phoneme + emotion tuple is associated with a key facial frame, and a popular morphing algorithm is employed to fit these frames into a coherent animation video. Experimental results demonstrate that the generated facial animation is natural, coherent, and highly synchronized with the input speech.
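
To make the pipeline concrete, the sketch below is a rough illustration only, not the authors' implementation: mutual-information ranking stands in for the paper's multi-task feature selection, a lookup table associates each phoneme + emotion tuple with a key facial frame, and a linear cross-dissolve stands in for the morphing algorithm. All function names, parameters, and toy data here are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_shared_features(X, phoneme_labels, emotion_labels, n_keep=20):
    """Rank acoustic features by joint relevance to phoneme and emotion labels
    (a simplified stand-in for the multi-task feature selection).
    X: (n_frames, n_features) acoustic feature matrix."""
    score_ph = mutual_info_classif(X, phoneme_labels, random_state=0)
    score_em = mutual_info_classif(X, emotion_labels, random_state=0)
    joint_score = score_ph + score_em  # reward features useful for both tasks
    return np.argsort(joint_score)[::-1][:n_keep]

def build_keyframe_table(tuples, keyframes):
    """Associate each (phoneme, emotion) tuple with one key facial frame."""
    return dict(zip(tuples, keyframes))

def morph_sequence(keyframe_table, tuple_sequence, steps_per_transition=10):
    """Interpolate between consecutive key frames (linear cross-dissolve as a
    placeholder for the morphing step) to obtain a frame sequence."""
    keys = [keyframe_table[t] for t in tuple_sequence]
    frames = []
    for a, b in zip(keys[:-1], keys[1:]):
        for alpha in np.linspace(0.0, 1.0, steps_per_transition, endpoint=False):
            frames.append((1.0 - alpha) * a + alpha * b)
    frames.append(keys[-1])
    return np.stack(frames)

if __name__ == "__main__":
    # Toy data: 200 frames of 40-dim acoustic features, 10 phonemes, 4 emotions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 40))
    phonemes = rng.integers(0, 10, size=200)
    emotions = rng.integers(0, 4, size=200)
    print("selected features:", select_shared_features(X, phonemes, emotions, n_keep=5))

    # Toy key frames: 64x64 grayscale images for two (phoneme, emotion) tuples.
    table = build_keyframe_table([("aa", "happy"), ("ow", "sad")],
                                 [rng.random((64, 64)), rng.random((64, 64))])
    video = morph_sequence(table, [("aa", "happy"), ("ow", "sad")])
    print("video shape:", video.shape)
```

In the actual system, the feature ranking, key-frame discovery, and morphing would be replaced by the multi-task learning, active learning, and morphing algorithms described in the abstract; the sketch only shows how the three stages fit together.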
