Interpolating video frames that involve large motions remains a challenging problem. When frames contain small, fast-moving objects, conventional feed-forward approaches that estimate optical flow and then synthesize in-between frames sequentially often lose motion features and thus produce blurred object boundaries. To address this problem, we propose a novel Recurrent Motion-Enhanced Interpolation Network (ReMEI-Net) that attends to the motion features of small objects from both intra-scale and inter-scale perspectives. First, we add recurrent feedback blocks to the existing multi-scale autoencoder pipeline, iteratively enhancing the motion information of small objects across scales; in this way, the coarse-scale features correct and enhance the fine-scale features through the feedback mechanism. Second, to further refine the motion features of fast-moving objects, we propose a Multi-Directional ConvLSTM (MD-ConvLSTM) block that captures global spatial contextual information of motion from multiple directions. Extensive experiments on various datasets demonstrate the superiority of our method over state-of-the-art approaches, yielding sharper object locations and more complete shapes.
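To make the multi-directional scanning idea concrete, below is a minimal PyTorch-style sketch of what an MD-ConvLSTM block could look like. The cell design, the choice of four scan directions (rows and columns in both orders), and the 1x1 fusion convolution are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """A standard ConvLSTM cell: all four gates computed by one convolution."""

    def __init__(self, in_ch: int, hid_ch: int, kernel: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c


class MDConvLSTM(nn.Module):
    """Hypothetical MD-ConvLSTM: scans the feature map as a sequence of rows or
    columns in four directions, then fuses the directional hidden states."""

    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.cells = nn.ModuleList(ConvLSTMCell(in_ch, hid_ch) for _ in range(4))
        self.fuse = nn.Conv2d(4 * hid_ch, in_ch, kernel_size=1)

    def _scan(self, cell, x, dim: int, reverse: bool):
        # Treat slices along `dim` (2 = height, 3 = width) as the recurrent axis.
        steps = range(x.size(dim) - 1, -1, -1) if reverse else range(x.size(dim))
        spatial = [x.size(2), x.size(3)]
        spatial[dim - 2] = 1  # hidden state covers a single row/column slice
        h = x.new_zeros(x.size(0), cell.hid_ch, *spatial)
        c = torch.zeros_like(h)
        outs = [None] * x.size(dim)
        for t in steps:
            h, c = cell(x.narrow(dim, t, 1), (h, c))
            outs[t] = h
        return torch.cat(outs, dim=dim)

    def forward(self, x):
        scans = [
            self._scan(self.cells[0], x, dim=2, reverse=False),  # top -> bottom
            self._scan(self.cells[1], x, dim=2, reverse=True),   # bottom -> top
            self._scan(self.cells[2], x, dim=3, reverse=False),  # left -> right
            self._scan(self.cells[3], x, dim=3, reverse=True),   # right -> left
        ]
        return x + self.fuse(torch.cat(scans, dim=1))  # residual refinement


# Usage: refine a motion feature map of shape (batch, channels, H, W).
feat = torch.randn(2, 32, 64, 64)
refined = MDConvLSTM(in_ch=32, hid_ch=16)(feat)
print(refined.shape)  # torch.Size([2, 32, 64, 64])
```

The residual connection reflects the abstract's framing of the block as refining, rather than replacing, the motion features; the four directional states give every spatial position context from the whole feature map.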