Abstract

Optical flow estimation in human facial video, which provides 2D correspondences between adjacent frames, is a fundamental pre-processing step for many applications, like facial expression capture and recognition. However, it is quite challenging as human facial images contain large areas of similar textures, rich expressions, and large rotations. These characteristics also result in the scarcity of large, annotated real-world datasets. We propose a robust and accurate method to learn facial optical flow in a self-supervised manner. Specifically, we utilize various shape priors, including face depth, landmarks, and parsing, to guide the self-supervised learning task via a differentiable nonrigid registration framework. Extensive experiments demonstrate that our method achieves remarkable improvements for facial optical flow estimation in the presence of significant expressions and large rotations.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call