Abstract

High-resolution, high-frame-rate videos record motion scenes in fine detail and with smooth motion, but usually only professional cameras have the transmission bandwidth required to capture them. Conventional solutions rely on video processing methods such as video super-resolution (VSR) and video frame interpolation (VFI), but their results suffer from unrealistic spatio-temporal details in complex dynamic scenes. To address this problem, we reconstruct a more realistic high-resolution, high-frame-rate video from a hybrid video input consisting of a low-resolution high-frame-rate video (the main video) and a high-resolution low-frame-rate video (the auxiliary video). We propose a deep learning model named HIS-VSR, which consists of three parts: super-resolution of the main video, detail feature extraction from the auxiliary video, and hybrid video information aggregation. The first part processes the main video to generate preliminary high-resolution frames; the second part warps the auxiliary frames for alignment and extracts their high-resolution detail features; the last part uses a weighted aggregation method to fuse the outputs of the first two parts. We train our model on synthetic datasets and demonstrate its strong performance in reconstructing dynamic scenes by comparing it with Deep-SloMo on both synthetic and real videos.
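The abstract does not specify the network architecture, so the following is only a minimal PyTorch sketch of the weighted-aggregation idea it describes: a hypothetical fusion module (the class name `WeightedAggregation`, the channel count, and the sigmoid weight predictor are all assumptions, not the authors' design) that blends super-resolved main-video features with warped auxiliary-frame detail features via a learned per-pixel weight map.

```python
import torch
import torch.nn as nn

class WeightedAggregation(nn.Module):
    """Hypothetical fusion head (illustrative only, not the HIS-VSR design):
    predicts a per-pixel weight in [0, 1] and blends the super-resolved
    main-frame features with the warped auxiliary-frame detail features."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # Weight predictor sees both feature maps and outputs a 1-channel mask.
        self.weight_net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, sr_feat: torch.Tensor, aux_feat: torch.Tensor) -> torch.Tensor:
        # sr_feat:  features of the preliminary super-resolved main frame
        # aux_feat: high-resolution detail features of the aligned auxiliary frame
        w = self.weight_net(torch.cat([sr_feat, aux_feat], dim=1))  # [B, 1, H, W]
        # Per-pixel convex combination: trust the auxiliary details where w is high.
        return w * aux_feat + (1.0 - w) * sr_feat
```

A learned mask of this kind would let the network fall back to the super-resolved main frame in regions where alignment of the auxiliary frame fails (e.g., occlusions or fast motion), which is one plausible reading of "weighted aggregation" here.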
