This paper presents a novel approach to video super-resolution (VSR) that focuses on the selection of input frames, a step critical to VSR. VSR methods typically rely on deep learning techniques that learn features from a large dataset of low-resolution (LR) and corresponding high-resolution (HR) videos, and then use those learned features to generate high-quality HR frames from new LR inputs. However, these methods usually take the frames immediately neighbouring a given target frame as input, without considering the importance and dynamics of frames across the temporal dimension of a video. This work addresses the limitations of the conventional sliding-window mechanism by developing input frame selection algorithms. By dynamically selecting the most representative neighbouring frames according to content-aware selection measures, our proposed algorithms enable VSR models to extract more informative and accurate features that are better aligned with the target frame, leading to improved performance and higher-quality HR frames. Through an empirical study, we demonstrate that the proposed dynamic content-aware selection mechanism improves super-resolution results without any additional architectural overhead, offering a counter-intuitive yet effective alternative to the long-established trend of increasing architectural complexity to improve VSR results.
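The core idea — replacing a fixed sliding window with content-aware ranking of candidate frames — can be sketched as follows. This is an illustrative sketch only: the abstract does not specify the paper's actual selection measure, so a simple normalized-correlation score over pixel intensities stands in for it, and the function name and parameters (`select_frames`, `k`, `window`) are assumptions, not the authors' API.

```python
import numpy as np

def select_frames(frames, target_idx, k=4, window=8):
    """Pick the k candidate frames most similar in content to the target.

    Hypothetical sketch: ranks frames within a search window by a
    stand-in content measure (zero-mean normalized correlation of pixel
    intensities) instead of taking the k temporally nearest frames.
    """
    target = frames[target_idx].astype(np.float64).ravel()
    target = (target - target.mean()) / (target.std() + 1e-8)

    lo = max(0, target_idx - window)
    hi = min(len(frames), target_idx + window + 1)
    candidates = [i for i in range(lo, hi) if i != target_idx]

    def score(i):
        f = frames[i].astype(np.float64).ravel()
        f = (f - f.mean()) / (f.std() + 1e-8)
        return float(np.dot(f, target)) / f.size  # correlation in [-1, 1]

    # Rank by content similarity rather than temporal distance.
    return sorted(candidates, key=score, reverse=True)[:k]
```

A sliding-window baseline would simply return the `k` frames nearest in time; here, a frame far from the target that closely matches its content (e.g. after a brief occlusion) can still be selected, which is the behaviour the proposed algorithms aim for.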