Abstract

In audiovisual speech communication, the lower part of the face (mainly the lips and jaw) actively participates in speech production. Accurately modeling lip motion and deformation in audiovisual speech synthesis is important for achieving realism and effective communication. This is especially valuable for challenged populations such as hard-of-hearing people or new language learners. In this context, we propose a technique that allows a human face to be animated with realistic lips using a limited number of control points. We have used an articulograph that provides high temporal and spatial precision, allowing the positions of small electromagnetic sensors to be tracked even when they are occluded, as is often the case when tracking lip movement. In our work, the control point data are first acquired and then fitted to a 3D face model of a human speaker, i.e., each control point is associated with a region of the face by minimizing the distance between the control points and the surface of the face model. Finally, we apply an interpolation scheme to the displacement field between the control points. This displacement field describes the deformation of a surface. In the case of the face, this method is well adapted to animating the region of the face that is highly correlated with speech, specifically the lips and the lower part of the face, even with a very limited number of control points.
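The abstract does not specify the fitting or interpolation scheme; the sketch below is an illustration only, assuming nearest-vertex association for the fitting step and Gaussian radial basis function (RBF) interpolation for the displacement field, one common choice for scattered-data deformation. All names, parameters, and data shapes here are hypothetical, not the authors' implementation.

```python
import numpy as np

def fit_control_points(control_pts, vertices):
    """Associate each control point with the nearest mesh vertex,
    a vertex-level approximation of minimizing the distance between
    control points and the face surface."""
    d = np.linalg.norm(control_pts[:, None, :] - vertices[None, :, :], axis=-1)
    return d.argmin(axis=1)  # index of the closest vertex per control point

def rbf_displacement_field(vertices, centers, displacements, eps=10.0):
    """Interpolate sparse control-point displacements over all mesh
    vertices with a Gaussian RBF kernel (assumed kernel choice)."""
    def kernel(a, b):
        d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
        return np.exp(-(eps * d) ** 2)
    # Solve for weights so the field reproduces the measured displacements
    # exactly at the control points.
    weights = np.linalg.solve(kernel(centers, centers), displacements)
    return kernel(vertices, centers) @ weights  # per-vertex 3D displacement

# Hypothetical data: 8 EMA sensors driving a 5000-vertex face mesh.
rng = np.random.default_rng(0)
vertices = rng.random((5000, 3))         # rest-pose mesh vertices
control_pts = rng.random((8, 3))         # sensor rest positions
anchors = fit_control_points(control_pts, vertices)
frame_disp = 0.01 * rng.random((8, 3))   # measured sensor motion, one frame
deformed = vertices + rbf_displacement_field(
    vertices, vertices[anchors], frame_disp)
```

Because the RBF weights are solved from the control-point constraints, the deformation passes through the measured sensor positions exactly while falling off smoothly elsewhere, which is why a handful of well-placed control points can plausibly drive the lip region.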
