Abstract

Video inpainting aims to fill missing regions of a video with plausible content. State-of-the-art video inpainting methods typically synthesize the missing content of the target frame (the current frame) by aggregating temporal information from reference frames (neighboring frames) that are aligned with deformable convolution. However, these deformable-convolution alignment networks often suffer from offset overflow during training, which leads to inaccurate alignment and, in turn, degraded inpainting quality. In this paper, we propose a self-supervised Flow-Guided Deformable Alignment (FGDA) network that aligns reference frames at the feature level. FGDA predicts only the residual of the optical flow as the deformable-convolution offsets. This design effectively reduces the burden of offset learning and thereby avoids offset overflow. Furthermore, we design a gradient-weighted reconstruction loss to supervise the reconstruction of completed frames: it uses the gradients of the video frame in all directions to emphasize texture regions that are difficult to reconstruct, so that fine textures receive more attention during training. Experiments show that an FGDA-based video inpainting model trained with the gradient-weighted reconstruction loss outperforms the state of the art by a significant margin, with relative improvements of 6.2% in PSNR and 2.1% in SSIM.
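To make the two ingredients of the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: flow-guided deformable alignment that predicts only a residual on top of a precomputed optical flow, and an L1 reconstruction loss re-weighted by the ground-truth gradient magnitude. It assumes torchvision's deform_conv2d; the names FlowGuidedAlign and gradient_weighted_l1, the channel layout of the flow, and all hyperparameters are illustrative assumptions.

```python
# Hedged sketch of flow-guided deformable alignment and a gradient-weighted
# reconstruction loss, assuming PyTorch + torchvision. Names and shapes are
# illustrative; this is not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d


class FlowGuidedAlign(nn.Module):
    """Align reference-frame features to the target frame.

    Instead of regressing full deformable-conv offsets, the network predicts
    only a residual on top of a precomputed optical flow, so the offsets stay
    close to the flow and are less prone to overflow during training.
    """

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        self.padding = kernel_size // 2
        # Predict a per-location residual offset (2 values per kernel sample)
        # from the concatenated target features, reference features, and flow.
        self.residual_head = nn.Conv2d(
            2 * channels + 2, 2 * kernel_size * kernel_size, 3, padding=1
        )
        self.weight = nn.Parameter(
            torch.empty(channels, channels, kernel_size, kernel_size)
        )
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, feat_tgt, feat_ref, flow):
        # flow: (N, 2, H, W), assumed ordered to match torchvision's
        # (dy, dx) offset layout.
        residual = self.residual_head(
            torch.cat([feat_tgt, feat_ref, flow], dim=1)
        )
        # Offset = flow tiled to every kernel location + learned residual.
        k2 = self.kernel_size * self.kernel_size
        offset = flow.repeat(1, k2, 1, 1) + residual
        return deform_conv2d(feat_ref, offset, self.weight, padding=self.padding)


def gradient_weighted_l1(pred, target, alpha: float = 1.0):
    """L1 reconstruction loss re-weighted by the local gradient magnitude of
    the ground-truth frame, so high-texture regions contribute more."""
    dx = target[..., :, 1:] - target[..., :, :-1]
    dy = target[..., 1:, :] - target[..., :-1, :]
    grad_mag = F.pad(dx.abs(), (0, 1)) + F.pad(dy.abs(), (0, 0, 0, 1))
    weight = 1.0 + alpha * grad_mag.mean(dim=1, keepdim=True)
    return (weight * (pred - target).abs()).mean()
```

In this sketch the alignment module never has to discover large displacements on its own: the flow supplies the coarse motion and the convolutional head only refines it, which is the mechanism the abstract credits with avoiding offset overflow.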
