Existing face swapping methods use only the posture and expression of the target image to guide the swapping process, ignoring other attributes such as background and lighting; moreover, the swapped faces are often poorly fused with the target images. To address this, a face swapping method combining multi-level attributes with an attention mechanism is proposed. In the target-attribute extraction stage, a multi-level attribute encoder based on the U-Net structure is designed: multi-level cascaded convolutional and deconvolutional blocks with inter-layer connections extract the expression and background attributes of the target image accurately and comprehensively, preserving more detailed information. In the swapped-face generation stage, a generator incorporating an attention mechanism is designed to adaptively adjust the effective regions in which source face features are integrated with target attributes, so that the swapped faces are more consistent with human visual perception. Experimental results on FaceForensics++ show that, compared with the DeepFaceLab method, the structural similarity between the swapped faces and the target images is improved by 6.73%, and the differences in head posture and facial expression are reduced by 1.026 and 0.491, respectively. The proposed method better retains the source face features, more faithfully preserves the target image attributes, and achieves a good swapping effect.
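To make the described architecture concrete, the following is a minimal PyTorch sketch of the two components the abstract names: a U-Net-style multi-level attribute encoder with inter-layer (skip) connections, and an attention block that predicts a spatial mask to adaptively blend source-identity features with target-attribute features. All layer sizes, class names (AttributeEncoder, AttentionFusion), and the exact fusion rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; layer widths, depths, and the blending rule
# are assumptions, not the paper's released code.
import torch
import torch.nn as nn

class AttributeEncoder(nn.Module):
    """U-Net-style encoder/decoder returning target-attribute features
    at several resolutions via inter-layer (skip) connections."""
    def __init__(self, ch=64):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.1))
        self.down2 = nn.Sequential(nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.1))
        self.down3 = nn.Sequential(nn.Conv2d(ch * 2, ch * 4, 4, 2, 1), nn.LeakyReLU(0.1))
        self.up2 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch * 2, 4, 2, 1), nn.LeakyReLU(0.1))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(ch * 4, ch, 4, 2, 1), nn.LeakyReLU(0.1))

    def forward(self, target):
        d1 = self.down1(target)                     # ch   x H/2
        d2 = self.down2(d1)                         # 2ch  x H/4
        d3 = self.down3(d2)                         # 4ch  x H/8
        u2 = self.up2(d3)                           # 2ch  x H/4
        u1 = self.up1(torch.cat([u2, d2], dim=1))   # skip connection
        # Multi-level attribute maps, coarse to fine
        return [d3, torch.cat([u2, d2], dim=1), torch.cat([u1, d1], dim=1)]

class AttentionFusion(nn.Module):
    """Predicts a per-pixel attention mask m in [0, 1] and blends
    source-identity features with target-attribute features:
    out = m * identity + (1 - m) * attribute."""
    def __init__(self, attr_ch, id_dim, out_ch):
        super().__init__()
        self.to_id = nn.Conv2d(id_dim, out_ch, 1)     # project identity code
        self.to_attr = nn.Conv2d(attr_ch, out_ch, 1)  # project attribute map
        self.mask = nn.Sequential(nn.Conv2d(out_ch, 1, 3, 1, 1), nn.Sigmoid())

    def forward(self, attr_feat, id_code):
        b, _, h, w = attr_feat.shape
        # Broadcast the global identity vector over the spatial grid
        id_feat = self.to_id(id_code.view(b, -1, 1, 1)).expand(-1, -1, h, w)
        a = self.to_attr(attr_feat)
        m = self.mask(a)                        # spatial attention mask
        return m * id_feat + (1.0 - m) * a      # adaptive blending

# Usage sketch: fuse the finest attribute level with a source identity code
enc = AttributeEncoder()
fuse = AttentionFusion(attr_ch=128, id_dim=512, out_ch=128)
target = torch.randn(1, 3, 256, 256)
id_code = torch.randn(1, 512)           # e.g. from a face-recognition net
feats = enc(target)
fused = fuse(feats[2], id_code)         # feats[2]: 128 channels at H/2
print(fused.shape)                      # torch.Size([1, 128, 128, 128])
```

In a full generator, one such fusion block would be applied per attribute level, so the mask can route identity information to face regions while keeping background and lighting from the target.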