Abstract

Deformable 3D convolutional networks (D3D-ConvNets) are widely used in video super-resolution (VSR) owing to their strong capabilities in multi-frame alignment and spatiotemporal feature extraction. However, the high computational complexity and irregular memory accesses of deformable 3D convolution (D3D-Conv) hinder the deployment of D3D-ConvNets on edge devices. To tackle these issues, this paper proposes an algorithm and hardware co-optimization framework to accelerate D3D-ConvNets for VSR on a field-programmable gate array (FPGA). First, at the algorithm level, a tile decoupling computing strategy is introduced to execute computations at tile-level granularity, which significantly reduces the memory requirements caused by the deformed receptive field and large input resolutions while preserving image restoration quality. Second, building on this strategy, dedicated computing modules together with a ping-pong transposition storage scheme are designed to accelerate D3D-Conv operations while avoiding irregular memory access patterns. Third, an overall hardware architecture and a memory-efficient dataflow are developed to accelerate the deformable convolutional layers. Experimental results demonstrate that the proposed design delivers better visual quality than prior FPGA-based VSR methods, and that it significantly surpasses existing hardware implementations of deformable convolution in throughput and processing speed.
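To make the source of the irregular memory accesses concrete, below is a minimal single-channel sketch of a deformable 3D convolution in NumPy. This is an illustration under stated assumptions, not the paper's implementation: the function names, the offset-tensor layout, and the zero-padding behavior at volume borders are all hypothetical.

```python
import numpy as np

def trilinear_sample(vol, t, y, x):
    """Sample vol[T, H, W] at a fractional (t, y, x) via trilinear interpolation.
    Out-of-range corners are treated as zero (assumed padding behavior)."""
    T, H, W = vol.shape
    t0, y0, x0 = int(np.floor(t)), int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dt in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                tt, yy, xx = t0 + dt, y0 + dy, x0 + dx
                if 0 <= tt < T and 0 <= yy < H and 0 <= xx < W:
                    # Trilinear weight of this corner for the fractional point.
                    w = (1 - abs(t - tt)) * (1 - abs(y - yy)) * (1 - abs(x - xx))
                    val += w * vol[tt, yy, xx]
    return val

def d3d_conv_single_channel(vol, weight, offsets):
    """Naive single-channel deformable 3D convolution.
    vol:     input volume [T, H, W]
    weight:  kernel [kT, kH, kW]
    offsets: learned displacements [T, H, W, kT*kH*kW, 3],
             one (dt, dy, dx) triple per output position and kernel tap
             (layout assumed for illustration).
    """
    T, H, W = vol.shape
    kT, kH, kW = weight.shape
    out = np.zeros_like(vol, dtype=np.float64)
    for t in range(T):
        for y in range(H):
            for x in range(W):
                k = 0
                for i in range(kT):
                    for j in range(kH):
                        for l in range(kW):
                            # Each kernel tap is displaced by a learned,
                            # data-dependent offset before sampling.
                            dt, dy, dx = offsets[t, y, x, k]
                            out[t, y, x] += weight[i, j, l] * trilinear_sample(
                                vol,
                                t + i - kT // 2 + dt,
                                y + j - kH // 2 + dy,
                                x + l - kW // 2 + dx)
                            k += 1
    return out
```

The sketch shows why D3D-Conv is hard to accelerate: every kernel tap reads eight neighbors at data-dependent, fractional addresses, so memory traffic cannot be planned statically as in regular convolution. A tile-level strategy such as the one described in the abstract would presumably bound these sampled coordinates to an on-chip tile, keeping the working set small; the exact tiling rules and the ping-pong transposition storage scheme are detailed in the full text.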
