Video frame interpolation is a classic computer vision task that aims to generate in-between frames from two consecutive frames. In this paper, a flow-based interpolation method (FI-Net) is proposed. FI-Net is a lightweight end-to-end neural network that takes two frames of arbitrary size as input and outputs the estimated intermediate frame. A novel aspect is that it computes optical flow at the feature level rather than the image level, which improves the accuracy of the estimated flow. A multi-scale technique is used to handle large motions. For training, a comprehensive loss function is introduced that combines a novel content loss (a Sobolev loss) with a semantic loss, forcing the generated frame to be close to the ground-truth frame at both the pixel level and the semantic level. Compared with previous methods, FI-Net achieves higher performance with lower time consumption and a much smaller model size.
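The abstract does not define the Sobolev loss, but a Sobolev norm penalizes differences in both function values and their derivatives. The sketch below is a minimal, hypothetical illustration of such a content loss using NumPy finite differences; the function name, the weighting parameter `lam`, and the use of simple forward differences are assumptions, not the paper's actual formulation.

```python
import numpy as np

def sobolev_loss(pred, target, lam=1.0):
    """Hypothetical Sobolev-style content loss: an L2 penalty on pixel
    differences plus an L2 penalty on their first spatial derivatives,
    approximated here by forward finite differences along each axis."""
    diff = pred - target
    loss = np.mean(diff ** 2)          # pixel-level term
    dx = np.diff(diff, axis=1)         # horizontal derivative of the error
    dy = np.diff(diff, axis=0)         # vertical derivative of the error
    loss += lam * (np.mean(dx ** 2) + np.mean(dy ** 2))
    return loss
```

Relative to a plain L2 loss, the derivative term additionally penalizes blur and edge misalignment, since sharp structures in the error map produce large finite differences.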