3D face reconstruction is an important research direction in computer vision; its goal is to recover a 3D face model from a single face image. In the absence of real 3D face data, reconstructing a highly realistic 3D face has become a hot research topic in recent years. Existing reconstruction algorithms usually rely on 3D labels generated from large numbers of 2D face images as training data, but inaccurate labels seriously degrade reconstruction quality. To address this, this paper proposes a weakly supervised 3D face reconstruction method based on joint spatial-frequency domain decoupling. The main idea is to construct a multi-level loss function from weakly supervised information extracted in the spatial domain, separate the frequency-domain information of the input image and the rendered image, and minimize the difference between the two. The method combines deep learning with 3D morphable models to reconstruct 3D models with high-quality texture and shape from only a single face image. Quantitative experiments on the AFLW2000-3D and MICC Florence datasets show that the normalized mean error in the small-pose interval is as low as 2.42%, and the face reconstruction accuracy in the outdoor scene is 0.98 ± 0.22 mm. Qualitative experiments on the MoFa-test and MICA datasets show that our method outperforms other state-of-the-art reconstruction methods under varying poses, lighting, and expressions.
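The abstract does not give the exact form of the frequency-domain term, but the idea of comparing the input and rendered images after a frequency transform can be illustrated with a minimal sketch. The following PyTorch snippet is an assumption-laden illustration, not the paper's implementation: it compares 2-D FFT magnitude spectra of the two images and combines that term with spatial-domain losses; the function names and the weight `lambda_freq` are hypothetical.

```python
import torch
import torch.nn.functional as F

def frequency_domain_loss(input_img, rendered_img):
    """Hypothetical frequency-domain difference term.

    input_img, rendered_img: float tensors of shape (B, C, H, W).
    Here we simply compare 2-D FFT magnitude spectra; the paper's
    actual decoupling scheme may differ.
    """
    # Move both images into the frequency domain with a 2-D FFT.
    input_spec = torch.fft.fft2(input_img, norm="ortho")
    rendered_spec = torch.fft.fft2(rendered_img, norm="ortho")
    # Penalize the difference of the magnitude spectra.
    return F.l1_loss(input_spec.abs(), rendered_spec.abs())

def total_loss(input_img, rendered_img, spatial_terms, lambda_freq=0.1):
    """Combine spatial-domain (weakly supervised) terms with the
    frequency-domain difference; the weighting is a placeholder."""
    return sum(spatial_terms) + lambda_freq * frequency_domain_loss(input_img, rendered_img)
```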