Abstract

Emotional human facial animation has become an indispensable technique in many multimedia systems. The technique first generates phoneme and emotion sequences from speech; viseme/expression sequences are then calculated accordingly and converted into a coherent facial animation video. In this work, a fully automatic system is designed by selecting acoustic features that are discriminative for both emotion and phoneme tags. More specifically, acoustic features highly representative of both emotion and phoneme tags are selected under a multi-task learning framework, and the speech phoneme and emotion sequences are computed from them. An active learning algorithm is then developed to discover key facial frames representative of both the phoneme and emotion tags. Finally, each phoneme + emotion tuple is associated with a key facial frame, and a popular morphing algorithm is employed to fit these frames into a coherent animation video. Experimental results demonstrate that the generated facial animation is natural, coherent, and highly synchronized with the input speech.
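
To make the pipeline concrete, the sketch below is a rough illustration only, not the authors' implementation: mutual-information ranking stands in for the paper's multi-task feature selection, a lookup table associates each phoneme + emotion tuple with a key facial frame, and a linear cross-dissolve stands in for the morphing algorithm. All function names, parameters, and toy data here are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_shared_features(X, phoneme_labels, emotion_labels, n_keep=20):
    """Rank acoustic features by joint relevance to phoneme and emotion labels
    (a simplified stand-in for the multi-task feature selection).
    X: (n_frames, n_features) acoustic feature matrix."""
    score_ph = mutual_info_classif(X, phoneme_labels, random_state=0)
    score_em = mutual_info_classif(X, emotion_labels, random_state=0)
    joint_score = score_ph + score_em  # reward features useful for both tasks
    return np.argsort(joint_score)[::-1][:n_keep]

def build_keyframe_table(tuples, keyframes):
    """Associate each (phoneme, emotion) tuple with one key facial frame."""
    return dict(zip(tuples, keyframes))

def morph_sequence(keyframe_table, tuple_sequence, steps_per_transition=10):
    """Interpolate between consecutive key frames (linear cross-dissolve as a
    placeholder for the morphing step) to obtain a frame sequence."""
    keys = [keyframe_table[t] for t in tuple_sequence]
    frames = []
    for a, b in zip(keys[:-1], keys[1:]):
        for alpha in np.linspace(0.0, 1.0, steps_per_transition, endpoint=False):
            frames.append((1.0 - alpha) * a + alpha * b)
    frames.append(keys[-1])
    return np.stack(frames)

if __name__ == "__main__":
    # Toy data: 200 frames of 40-dim acoustic features, 10 phonemes, 4 emotions.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 40))
    phonemes = rng.integers(0, 10, size=200)
    emotions = rng.integers(0, 4, size=200)
    print("selected features:", select_shared_features(X, phonemes, emotions, n_keep=5))

    # Toy key frames: 64x64 grayscale images for two (phoneme, emotion) tuples.
    table = build_keyframe_table([("aa", "happy"), ("ow", "sad")],
                                 [rng.random((64, 64)), rng.random((64, 64))])
    video = morph_sequence(table, [("aa", "happy"), ("ow", "sad")])
    print("video shape:", video.shape)
```

In the actual system, the feature ranking, key-frame discovery, and morphing would be replaced by the multi-task learning, active learning, and morphing algorithms described in the abstract; the sketch only shows how the three stages fit together.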
