Abstract

Convolutional neural network (CNN)-based super-resolution (SR) has shown outstanding performance in computer vision. However, inference hardware for CNN-based SR suffers from intensive computation and a severely unbalanced computation load among layers. Various lightweight SR networks have been studied that incur little performance degradation, but a hardware-efficient dataflow is also required to accelerate inference within limited resources. In this article, we propose a hardware-efficient dataflow for CNN-based SR that reduces the computation load by increasing data reuse and raises processing element (PE) utilization by balancing the computation load among layers for high throughput. In the proposed dataflow, row-wise pixels in the receptive field are computed by circularly shifting memory addresses to maximize data reuse. Partial convolution is exploited in a layer-based pipeline architecture to relieve the intensive computation in a single pipeline stage, and delay balancing with adjustable parallelism precisely balances computation across all layers. Furthermore, the inference hardware for CNN-based SR is implemented on a field-programmable gate array (FPGA) for 4K ultrahigh-definition output at 60 fps. For hardware-friendly computation, activations and weights are quantized. The proposed hardware achieves an average peak signal-to-noise ratio of 36.42 dB on the Set-5 dataset with a memory usage of 53 KB and an average PE utilization of 76.7% across all layers, the lowest memory usage and highest PE utilization among inference hardware for CNN-based SR.
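The row-wise reuse idea can be sketched as a circular line buffer: the newest input row overwrites the oldest, and the receptive field is read out by circularly shifting row addresses rather than moving pixel data. This is a minimal illustrative sketch of that general technique, not the paper's actual design; the buffer depth `K`, row width `WIDTH`, and function names are assumptions.

```python
K = 3          # receptive-field height/width (assumed)
WIDTH = 8      # row width in pixels (assumed)

line_buffer = [[0] * WIDTH for _ in range(K)]  # K rows of on-chip storage
write_row = 0                                  # circular write pointer

def push_row(row):
    """Overwrite the oldest stored row; no stored pixel data is moved."""
    global write_row
    line_buffer[write_row] = row
    write_row = (write_row + 1) % K

def window(col):
    """Return the K x K patch ending at column `col`, reading rows in
    raster order via circularly shifted addresses."""
    rows = [(write_row + r) % K for r in range(K)]  # oldest -> newest
    return [line_buffer[r][col - K + 1 : col + 1] for r in rows]

# Stream three rows of a synthetic raster image, then read one patch.
for y in range(K):
    push_row([y * WIDTH + x for x in range(WIDTH)])
patch = window(K - 1)  # 3 x 3 patch at the top-left corner
```

When a fourth row arrives, `push_row` reuses the oldest buffer slot, so only the address mapping in `window` changes; the other `K - 1` rows are reused in place, which is the data-reuse effect the abstract describes.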
