Abstract
Video super-resolution aims to generate high-resolution frames from their low-resolution counterparts. It can be regarded as a specialized application of image super-resolution and serves purposes such as video display and surveillance. This paper proposes a novel method for real-time video super-resolution that exploits spatial information through the capabilities of an image super-resolution model and leverages the temporal information inherent in videos. Specifically, the method uses a pre-trained image super-resolution network as its backbone, allowing it to reuse knowledge already learned for super-resolution. A fast temporal information aggregation module is presented to further aggregate temporal cues across frames. By using deformable convolution to align the features of neighboring frames, this module takes advantage of inter-frame dependency. In addition, it employs hierarchical fast spatial offset feature extraction and channel attention-based temporal fusion. A redundancy-aware inference algorithm is developed to reduce computational redundancy by reusing intermediate features, achieving real-time inference speed. Extensive experiments on several benchmarks demonstrate that the proposed method reconstructs satisfactory results with strong quantitative performance and visual quality. Its real-time inference ability makes it suitable for real-world deployment.
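The abstract does not spell out the redundancy-aware inference algorithm, so the following is only a minimal sketch of the general idea of reusing intermediate features across overlapping temporal windows. The class name `CachedWindowSR`, the `extract` and `fuse` callables, and the `radius` parameter are all hypothetical stand-ins (for a per-frame feature extractor and a temporal fusion step), not the paper's actual components.

```python
from collections import OrderedDict
from typing import Any, Callable, List, Sequence


class CachedWindowSR:
    """Sliding-window inference that extracts each frame's features once.

    Illustrative assumption only: `extract` stands in for a per-frame
    feature extractor and `fuse` for a temporal fusion step.
    """

    def __init__(self, extract: Callable[[Any], Any],
                 fuse: Callable[[List[Any]], Any], radius: int = 1) -> None:
        self.extract = extract
        self.fuse = fuse
        self.radius = radius
        self.cache: "OrderedDict[int, Any]" = OrderedDict()
        self.extract_calls = 0  # counts how often features are actually computed

    def _features(self, frames: Sequence[Any], idx: int) -> Any:
        if idx not in self.cache:
            self.cache[idx] = self.extract(frames[idx])
            self.extract_calls += 1
            # Frames arrive in increasing order, so the oldest cached entry
            # is the one no upcoming window can still need.
            while len(self.cache) > 2 * self.radius + 1:
                self.cache.popitem(last=False)
        return self.cache[idx]

    def run(self, frames: Sequence[Any]) -> List[Any]:
        outputs = []
        last = len(frames) - 1
        for t in range(len(frames)):
            # Clamp neighbor indices at the sequence boundaries.
            window = [min(max(t + d, 0), last)
                      for d in range(-self.radius, self.radius + 1)]
            feats = [self._features(frames, i) for i in window]
            outputs.append(self.fuse(feats))
        return outputs


if __name__ == "__main__":
    sr = CachedWindowSR(extract=lambda x: x * 2.0,
                        fuse=lambda fs: sum(fs) / len(fs), radius=1)
    sr.run([1.0, 2.0, 3.0, 4.0, 5.0])
    print(sr.extract_calls)  # one extraction per frame instead of one per window slot
```

In this toy setting a naive sliding window would call `extract` once per window slot (three times per output frame for `radius=1`), whereas the cached version calls it once per input frame and keeps at most `2 * radius + 1` feature entries in memory.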