Abstract

Existing 3D face alignment and face reconstruction methods focus mainly on per-frame accuracy; when applied to dynamic videos, both their stability and accuracy degrade significantly. To overcome this problem, we propose a novel regression framework that strikes a balance between accuracy and stability. First, on top of a lightweight backbone, an encoder-decoder structure jointly learns expression details and a detailed 3D face from video frames, recovering shape details and their relationship to facial expression while dynamically regressing a small number of 3D face parameters, which effectively improves both speed and accuracy. Second, to further improve the stability of face landmarks in video, we propose a jitter loss based on joint learning over multiple frames, which strengthens the correlation between adjacent frames and their face landmarks and reduces the magnitude of landmark differences between adjacent frames, thereby suppressing landmark jitter. Experiments on several challenging datasets verify the effectiveness of our method.
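The abstract does not give the exact formulation of the jitter loss, but its stated goal is to reduce the magnitude of landmark differences between adjacent frames. The sketch below is a minimal, illustrative interpretation of such a temporal-smoothness penalty, assuming PyTorch and a simple squared frame-to-frame displacement; the paper's actual loss may differ (for example, by compensating for genuine head motion).

```python
import torch


def jitter_loss(landmarks_seq: torch.Tensor) -> torch.Tensor:
    """Illustrative temporal-jitter penalty (not the paper's exact loss).

    landmarks_seq: tensor of shape (T, N, 2) holding predicted 2D landmark
    positions for T consecutive video frames and N landmarks per frame.
    Penalizes the displacement of each landmark between adjacent frames,
    encouraging temporally stable predictions.
    """
    # Frame-to-frame landmark displacement: shape (T - 1, N, 2)
    diff = landmarks_seq[1:] - landmarks_seq[:-1]
    # Mean squared displacement over all landmarks and frame pairs
    return (diff ** 2).sum(dim=-1).mean()
```

In training, a term like this would typically be weighted and added to the usual alignment and reconstruction losses, so that it damps inter-frame jitter without overriding the per-frame accuracy objective.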
