Abstract

Deformable convolutional networks (DCNs) have shown outstanding potential in video super-resolution with their powerful inter-frame feature alignment. However, deploying DCNs on resource-limited devices is challenging due to their high computational complexity and irregular memory accesses. In this work, an algorithm-hardware co-optimization framework is proposed to accelerate DCNs on a field-programmable gate array (FPGA). Firstly, at the algorithm level, an anchor-based lightweight deformable network (ALDNet) is proposed to extract spatio-temporal information from the aligned features, boosting the visual effects with low model complexity. Secondly, to reduce intensive multiplications, an innovative shift-based deformable 3D convolution is developed using low-cost bit shifts and additions, maintaining comparable reconstruction quality. Thirdly, at the hardware level, a dedicated critical processing core, together with a block-level interleaving storage scheme, is presented to avoid the dynamic and irregular memory accesses caused by the deformable convolutions. Finally, an overall architecture is designed to accelerate the ALDNet and implemented on an Intel Stratix 10 GX platform. Experimental results demonstrate that the proposed design provides significantly better visual perception than other FPGA-based super-resolution implementations. Meanwhile, compared with prior hardware accelerators, our design achieves 2.75$\times$ and 1.63$\times$ improvements in throughput and energy efficiency, respectively.
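The core arithmetic idea behind the shift-based convolution mentioned above is to constrain weights to signed powers of two, so each multiply-accumulate reduces to a bit shift plus an addition. The sketch below is a minimal, generic illustration of that principle, not the paper's exact quantization scheme; the function names and the simple round-to-nearest-power-of-two rule are assumptions for illustration.

```python
import numpy as np

def quantize_to_power_of_two(w):
    """Round each weight to the nearest signed power of two.

    Returns (sign, exponent) with w ~= sign * 2**exponent, so that a
    hardware multiplier can be replaced by a single barrel shift.
    (Illustrative rule; the paper's actual quantizer may differ.)
    """
    sign = np.sign(w).astype(np.int8)
    mag = np.abs(w)
    # Guard log2 against zeros; zero weights contribute nothing.
    exponent = np.where(
        mag > 0,
        np.round(np.log2(np.maximum(mag, 1e-12))),
        0,
    ).astype(np.int32)
    return sign, exponent

def shift_conv_dot(x_int, sign, exponent):
    """Dot product of integer activations with power-of-two weights,
    computed with shifts and additions only (no multiplications)."""
    acc = 0
    for xi, s, e in zip(x_int, sign, exponent):
        if s == 0:
            continue  # zero weight: skip entirely
        # Left shift for positive exponents, arithmetic right shift
        # (an approximation) for negative ones.
        term = (xi << e) if e >= 0 else (xi >> -e)
        acc += term if s > 0 else -term
    return acc
```

For example, activations `[3, 5, 2]` against power-of-two weights `[2, -4, 1]` (signs `[1, -1, 1]`, exponents `[1, 2, 0]`) accumulate as `(3<<1) - (5<<2) + (2<<0) = -12`, matching the exact dot product because every weight is already a power of two.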
