In the current landscape, high-resolution (HR) videos have gained immense popularity, promising an elevated viewing experience. Recent research has demonstrated that video super-resolution (SR) algorithms, empowered by deep neural networks (DNNs), can substantially enhance video quality by upscaling low-resolution (LR) frames. However, existing DNN models demand significant computational resources, which makes it challenging to deploy SR algorithms on client devices. While numerous accelerators have been proposed, they focus primarily on client-side optimization. In contrast, our research recognizes that the HR video is originally stored on the cloud server, which presents an untapped opportunity for improving both accuracy and performance. Building on this insight, this article introduces an end-to-end video CODEC-assisted super-resolution (E2SR+) algorithm, which tightly integrates the cloud server with the client device to deliver a seamless, real-time video viewing experience. We propose a motion vector search algorithm, executed on the cloud server, that searches the motion vectors and residuals for a portion of the HR video frames and packs them as add-ons. We also design an auto-encoder algorithm that down-samples the residuals to reduce the bitstream cost while preserving residual quality. Finally, we propose a reconstruction algorithm, performed on the client, that quickly reconstructs the corresponding HR frames from the add-ons, skipping part of the DNN computation. To implement the E2SR+ algorithm, we design a corresponding E2SR+ architecture for the client, which achieves significant speedup with minimal hardware overhead. Because environmental conditions vary across the server–client hierarchy, simply applying E2SR+ to all frames is suboptimal. Accordingly, we propose an environmental-condition-aware system that pursues the best performance while adapting to diverse environments. In this system, we design a linear programming (LP) model that simulates the environment and allocates frames to three existing mechanisms. Our experimental results demonstrate that the E2SR+ algorithm improves the peak signal-to-noise ratio (PSNR) by 1.2, 2.5, and 2.3 dB compared with the state-of-the-art (SOTA) methods EDVR, BasicVSR, and BasicVSR++, respectively. In terms of performance, the E2SR+ architecture offers significant improvements over existing SOTA methods: while BasicVSR++ requires 98 ms on an NVIDIA V100 graphics processing unit (GPU) to generate a 1,280 × 720 HR frame, the E2SR+ architecture reduces the execution time to just 39 ms, highlighting the efficiency and effectiveness of our proposed method. Overall, the E2SR+ architecture achieves 1.4×, 2.2×, 4.6×, and 442.0× performance improvements over ADAS, ISRAcc, the NVIDIA V100 GPU, and a central processing unit (CPU), respectively. Finally, the proposed system surpasses all existing mechanisms in execution time under varying environmental conditions.
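To make the client-side reconstruction step concrete, the following is a minimal NumPy sketch (not the paper's actual implementation) of how a client might rebuild an HR frame from the previous HR frame, per-block motion vectors, and residuals shipped as add-ons; the function name, block size, and array layouts are illustrative assumptions, and the real E2SR+ pipeline additionally decodes residuals produced by the auto-encoder.

```python
import numpy as np

def reconstruct_hr_frame(prev_hr, motion_vectors, residuals, block=16):
    """Hypothetical sketch of E2SR+-style client reconstruction.

    prev_hr:        previous HR frame, float array of shape (H, W, C)
    motion_vectors: integer offsets (dy, dx) per block, shape (H//block, W//block, 2)
    residuals:      decoded residual frame, shape (H, W, C)
    Each output block is copied from the previous HR frame at the offset
    given by its motion vector, then the residual is added back, so no
    DNN inference is needed for this frame.
    """
    h, w, _ = prev_hr.shape
    out = np.zeros_like(prev_hr)
    for by in range(0, h, block):
        for bx in range(0, w, block):
            dy, dx = motion_vectors[by // block, bx // block]
            # Clamp the source block so it stays inside the reference frame.
            sy = int(np.clip(by + dy, 0, h - block))
            sx = int(np.clip(bx + dx, 0, w - block))
            patch = prev_hr[sy:sy + block, sx:sx + block]
            out[by:by + block, bx:bx + block] = (
                patch + residuals[by:by + block, bx:bx + block]
            )
    return np.clip(out, 0.0, 255.0)
```

Because this path replaces DNN inference with block copies and additions, its cost is dominated by memory movement, which is consistent with the large speedups the abstract reports for frames handled by the add-on mechanism.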