Abstract

Video inpainting aims to fill missing regions of a video with plausible content. State-of-the-art video inpainting methods typically synthesize the missing content of the target frame (the current frame) by aggregating temporal information from reference frames (neighboring frames) that are aligned with deformable convolution. However, these deformable-convolution alignment networks often suffer from offset overflow during training, which leads to inaccurate alignment and, in turn, degraded inpainting quality. In this paper, we propose a self-supervised Flow-Guided Deformable Alignment (FGDA) network that aligns reference frames at the feature level. FGDA predicts only the residual of the optical flow as the deformable-convolution offsets. This design effectively reduces the burden of offset learning and thereby avoids offset overflow. Furthermore, we design a gradient-weighted reconstruction loss to supervise the reconstruction of completed frames: it uses the gradients of the video frame in all directions to emphasize texture regions that are difficult to reconstruct, so that fine textures receive more attention during training. Experiments show that an FGDA-based video inpainting model trained with the gradient-weighted reconstruction loss outperforms the state of the art by a significant margin, with relative improvements of 6.2% in PSNR and 2.1% in SSIM.
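To make the two ingredients of the abstract concrete, the following is a minimal PyTorch sketch, not the authors' implementation: flow-guided deformable alignment that predicts only a residual on top of a precomputed optical flow, and an L1 reconstruction loss re-weighted by the ground-truth gradient magnitude. It assumes torchvision's deform_conv2d; the names FlowGuidedAlign and gradient_weighted_l1, the channel layout of the flow, and all hyperparameters are illustrative assumptions.

```python
# Hedged sketch of flow-guided deformable alignment and a gradient-weighted
# reconstruction loss, assuming PyTorch + torchvision. Names and shapes are
# illustrative; this is not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import deform_conv2d


class FlowGuidedAlign(nn.Module):
    """Align reference-frame features to the target frame.

    Instead of regressing full deformable-conv offsets, the network predicts
    only a residual on top of a precomputed optical flow, so the offsets stay
    close to the flow and are less prone to overflow during training.
    """

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        self.padding = kernel_size // 2
        # Predict a per-location residual offset (2 values per kernel sample)
        # from the concatenated target features, reference features, and flow.
        self.residual_head = nn.Conv2d(
            2 * channels + 2, 2 * kernel_size * kernel_size, 3, padding=1
        )
        self.weight = nn.Parameter(
            torch.empty(channels, channels, kernel_size, kernel_size)
        )
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, feat_tgt, feat_ref, flow):
        # flow: (N, 2, H, W), assumed ordered to match torchvision's
        # (dy, dx) offset layout.
        residual = self.residual_head(
            torch.cat([feat_tgt, feat_ref, flow], dim=1)
        )
        # Offset = flow tiled to every kernel location + learned residual.
        k2 = self.kernel_size * self.kernel_size
        offset = flow.repeat(1, k2, 1, 1) + residual
        return deform_conv2d(feat_ref, offset, self.weight, padding=self.padding)


def gradient_weighted_l1(pred, target, alpha: float = 1.0):
    """L1 reconstruction loss re-weighted by the local gradient magnitude of
    the ground-truth frame, so high-texture regions contribute more."""
    dx = target[..., :, 1:] - target[..., :, :-1]
    dy = target[..., 1:, :] - target[..., :-1, :]
    grad_mag = F.pad(dx.abs(), (0, 1)) + F.pad(dy.abs(), (0, 0, 0, 1))
    weight = 1.0 + alpha * grad_mag.mean(dim=1, keepdim=True)
    return (weight * (pred - target).abs()).mean()
```

In this sketch the alignment module never has to discover large displacements on its own: the flow supplies the coarse motion and the convolutional head only refines it, which is the mechanism the abstract credits with avoiding offset overflow.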
