To address the high production cost, scarcity, and limited diversity of 3D face datasets, this paper designs an end-to-end self-supervised 3D face reconstruction algorithm that takes a single 2D face image as input and is trained entirely on 2D face datasets. First, an improved ResNet module is introduced to preprocess the input face image; the deep residual network's strong feature extraction and representation ability provides rich high-level semantic feature maps for the subsequent subnetworks. Then, a transformer module based entirely on the self-attention mechanism is added to the parameter prediction subnetwork, allowing each parameter branch to attend to the feature-map information relevant to it while suppressing interference from irrelevant features, thereby further improving the subnetwork's parameter prediction accuracy. Next, training, ablation, and comparison experiments were conducted on the CelebA, BFM, and Photoface datasets, with a combination of a pixel loss and a perceptual loss selected as the training objective. Compared with the best previously reported results for the same network structure, the scale-invariant depth error (SIDE) and mean angle deviation (MAD) improve by 5.9% and 10.8%, respectively, demonstrating the effectiveness of the algorithm. Finally, to verify the algorithm's practical effect, example images are selected for reconstruction; the generated 3D faces are all realistic, which intuitively and effectively confirms the algorithm's advancement.
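The abstract mentions training with a combined pixel and perceptual loss. A minimal numerical sketch of such a combination is shown below; the L1 pixel term, the cosine-distance perceptual term, and the weighting factor `lam` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def pixel_loss(rendered, target):
    # L1 pixel term: mean absolute per-pixel difference
    # between the rendered face image and the input photo.
    return float(np.mean(np.abs(rendered - target)))

def perceptual_loss(feat_rendered, feat_target):
    # Perceptual term: cosine distance between deep feature
    # vectors of the two images (feature extractor not shown;
    # in practice a pretrained face recognition network is typical).
    fr = feat_rendered / np.linalg.norm(feat_rendered)
    ft = feat_target / np.linalg.norm(feat_target)
    return 1.0 - float(np.dot(fr, ft))

def combined_loss(rendered, target, feat_rendered, feat_target, lam=0.2):
    # Weighted sum of the two terms; lam is an assumed
    # balancing weight, not a value from the paper.
    return pixel_loss(rendered, target) + lam * perceptual_loss(
        feat_rendered, feat_target
    )
```

For identical images and identical feature vectors, both terms vanish and the combined loss is zero; the perceptual term penalizes identity-level differences that per-pixel errors miss, which is why the two are commonly combined in self-supervised face reconstruction.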