Abstract

This paper is concerned with the problem of synthesizing an animated face driven by a new audio sequence that is not present in the previously recorded database. Because both future and past video frames influence the dynamics of the current frame, the dynamics of speech and facial expression must be learned to model an efficient speech-driven facial animation. We have incorporated the features of future and past frames along with the current frame's features to derive a complex current-frame feature. In the testing phase, a k-nearest-neighbor algorithm is used to recover the simple current frame from the complex current frame. Since the inertia of the facial muscles differs from that of the vocal organs, speech features change at a different rate than the video-frame features corresponding to the speech. We have therefore incorporated an inter-frame distance vector as a feature of both speech and video, and used an audio-video hidden Markov model to map the speech features into video features.
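The abstract describes two concrete feature-engineering steps: stacking past and future frame features around the current frame to form a "complex" frame, then recovering the "simple" frame at test time via nearest-neighbor lookup, and computing an inter-frame distance vector for both audio and video streams. The following is a minimal sketch of one plausible reading of those steps; the paper does not specify window size, distance metric, or feature dimensions, so `make_context_features`, `interframe_distance`, the window width `k`, and the placeholder data are all illustrative assumptions, not the authors' implementation (numpy and scikit-learn are used purely for convenience).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def make_context_features(frames, k=2):
    """Stack each frame with its k past and k future neighbors to form
    the 'complex' current-frame feature. Edge frames are padded by
    repeating the first/last frame. (Window size k is an assumption.)"""
    n, _ = frames.shape
    padded = np.vstack([np.repeat(frames[:1], k, axis=0),
                        frames,
                        np.repeat(frames[-1:], k, axis=0)])
    return np.hstack([padded[i:i + n] for i in range(2 * k + 1)])

def interframe_distance(features):
    """One reading of the 'inter-frame distance vector': the Euclidean
    distance between consecutive feature vectors, computable for both
    the audio and the video stream."""
    return np.linalg.norm(np.diff(features, axis=0), axis=1)

# Training: pair each complex feature with its simple (single) frame.
train_frames = np.random.rand(500, 40)       # placeholder video features
complex_train = make_context_features(train_frames)
knn = NearestNeighbors(n_neighbors=1).fit(complex_train)

# Testing: given a predicted complex frame, recover the simple frame
# as the center frame of its nearest training neighbor.
query = complex_train[123] + 0.01 * np.random.randn(complex_train.shape[1])
_, idx = knn.kneighbors(query[None, :])
simple_frame = train_frames[idx[0, 0]]
```

Under this reading, the audio-to-video mapping itself would be handled by the audio-video HMM mentioned in the abstract, with the KNN step serving only to project an estimated complex frame back onto a realizable single frame from the training database.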
