Abstract
Video frame interpolation can flexibly increase the temporal resolution of low-frame-rate videos by generating the missing intermediate frames at arbitrary timestamps. Existing methods generally train a renderer to predict RGB frames from estimated clues. This approach tends to produce blurry outputs with unpleasant artifacts, caused by inaccurate clues and by the neural network's bias toward low-frequency information. To address this problem, we propose a novel two-stage supervised framework. In stage one, the inaccuracy of the clues is modeled as uncertainty, which is estimated implicitly by training with a parameterized loss function. In stage two, the bias is alleviated by regressing a lossless decomposition of the frames instead of the raw RGB values. The decomposition is obtained with several invertible cross-coupling layers and motivates the network to synthesize high-frequency details. Moreover, the proposed framework is equipped with a time-varying neural network that adapts to the timestamp of any intermediate frame, which benefits multiple-frame interpolation. Both qualitative and quantitative experiments demonstrate the superiority of the proposed approach.
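To illustrate why a decomposition built from invertible coupling layers can be lossless, the following is a minimal sketch, not the layers used in this work. It assumes a generic additive cross-coupling form on two feature halves; the helper names (`make_subnet`, `couple_forward`, `couple_inverse`) and the random stand-in sub-networks are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_subnet(rng, dim):
    """Hypothetical stand-in for a learned sub-network (assumption):
    a fixed random affine map followed by tanh."""
    w = rng.standard_normal((dim, dim)) * 0.1
    b = rng.standard_normal(dim) * 0.1
    return lambda x: np.tanh(x @ w + b)


def couple_forward(x1, x2, f, g):
    """Additive cross-coupling: each half is shifted by a function of the other."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2


def couple_inverse(y1, y2, f, g):
    """Undo the two shifts in reverse order; f and g need not be invertible."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2


rng = np.random.default_rng(0)
dim = 8
f, g = make_subnet(rng, dim), make_subnet(rng, dim)

# Two halves of a toy feature tensor (batch of 4 vectors each).
x1 = rng.standard_normal((4, dim))
x2 = rng.standard_normal((4, dim))

y1, y2 = couple_forward(x1, x2, f, g)
r1, r2 = couple_inverse(y1, y2, f, g)

# Reconstruction error is at floating-point round-off level.
max_err = max(np.abs(r1 - x1).max(), np.abs(r2 - x2).max())
print("max reconstruction error:", max_err)
```

Because the inverse only subtracts the same shifts that the forward pass added, no information is discarded regardless of what the sub-networks compute, which is the property that makes regressing the decomposed representation equivalent to regressing the original frame.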