Abstract
We propose a joint super resolution (SR) and frame interpolation framework that can perform both spatial and temporal super resolution. We identify performance variation according to permutation of inputs in video super-resolution and video frame interpolation. We postulate that favorable features extracted from multiple frames should be consistent regardless of input order if the features are optimally complementary for respective frames. With this motivation, we propose a permutation invariant deep architecture that makes use of the multi-frame SR principles by virtue of our order (permutation) invariant network. Specifically, given two adjacent frames, our model employs a permutation invariant convolutional neural network module to extract "complementary" feature representations facilitating both the SR and temporal interpolation tasks. We demonstrate the effectiveness of our end-to-end joint method against various combinations of the competing SR and frame interpolation methods on challenging video datasets, and thereby we verify our hypothesis.
Published Version (Free)
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have