A Self-adaptation Method of Fitting Convolutional Neural Network into FPGA

Ning Mao, Xing Wei, Xinkai Di, Le Yu, Zhihong Huang, He Zhao, Haigang Yang

doi:10.1145/3174243.3175003

Abstract

In recent years, Convolutional Neural Networks (CNNs) have been widely used in many artificial intelligence (AI) related fields. Among the many implementation platforms for CNNs, the FPGA is regarded as an optimal platform because of its high power efficiency and flexibility. Although various FPGA accelerators have been proposed to realize CNNs, some of them are implemented with High-Level Synthesis (HLS), for example in OpenCL, which may lead to inefficient operation performance and resource utilization. Therefore, we propose to parameterize the RTL design at both the algorithm level and the hardware implementation level. Four types of parallelism are considered to model the parameterized design, in terms of the input feature map, the output feature map, the layer, and the convolution kernel. Meanwhile, a library covering the convolution layer, fully-connected layer, pooling layer, and control module is established to cater for various CNN models. Further, an algorithm is proposed to find the optimal level of parallelism under limited resources. As a case study, four typical CNNs are implemented on a Stratix III EP3SL110 using on-chip memory. Compared with existing works that use automated design flows, the implementations obtained by the proposed approach achieve up to 17.13× higher GOPS. By our best estimate, our design also achieves 1.33× higher resource efficiency and 3.61× higher power efficiency.
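The abstract does not spell out the parallelism-search algorithm. A minimal sketch of the general idea, under assumptions of our own, is a resource-constrained search: each layer picks hypothetical factors p_if, p_of and p_k (input-feature-map, output-feature-map and kernel parallelism), the layers run concurrently as pipeline stages (one possible reading of layer-level parallelism), and the goal is to minimize the slowest stage's cycle count under a multiplier budget. The cost model (one multiplier per parallel MAC, factors restricted to divisors of each dimension, no memory or bandwidth limits) and every name (ConvLayer, layer_options, allocate, etc.) are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical convolution-layer shape used only for this sketch."""
    c_in: int   # number of input feature maps
    c_out: int  # number of output feature maps
    k: int      # kernel is k x k
    h_out: int  # output feature-map height
    w_out: int  # output feature-map width


def divisors(n):
    """All positive divisors of n (assumption: each factor must divide its dimension)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def layer_options(layer):
    """List (multipliers, cycles, (p_if, p_of, p_k)) for every factor choice.

    Assumed cost model: one multiplier per MAC issued in parallel, and one
    batch of p_if * p_of * p_k MACs completes per clock cycle.
    """
    options = []
    for p_if, p_of, p_k in product(divisors(layer.c_in),
                                   divisors(layer.c_out),
                                   divisors(layer.k * layer.k)):
        cycles = ((layer.c_in // p_if) * (layer.c_out // p_of)
                  * (layer.k * layer.k // p_k) * layer.h_out * layer.w_out)
        options.append((p_if * p_of * p_k, cycles, (p_if, p_of, p_k)))
    return options


def cheapest_within(layer, target):
    """Fewest-multiplier option of `layer` finishing within `target` cycles, or None."""
    feasible = [opt for opt in layer_options(layer) if opt[1] <= target]
    return min(feasible, default=None)


def allocate(layers, mult_budget):
    """Minimize the slowest pipeline stage subject to a total multiplier budget.

    Returns (bottleneck_cycles, [(p_if, p_of, p_k) per layer]), or None if even
    fully sequential layers (all factors 1) exceed the budget.
    """
    # The optimal bottleneck is always one of the per-layer cycle counts.
    candidates = sorted({opt[1] for layer in layers for opt in layer_options(layer)})
    for target in candidates:  # ascending, so the first feasible target is optimal
        picks = [cheapest_within(layer, target) for layer in layers]
        if None in picks:
            continue  # some layer cannot finish this fast with any factors
        if sum(p[0] for p in picks) <= mult_budget:
            return target, [p[2] for p in picks]
    return None
```

For example, `allocate([ConvLayer(4, 8, 3, 8, 8), ConvLayer(8, 8, 3, 4, 4)], 64)` returns the bottleneck cycle count together with one (p_if, p_of, p_k) triple per layer. Scanning candidate targets in ascending order is sound because feasibility is monotone: a looser cycle target never requires more multipliers in any layer, so the first feasible target is the optimum under this simplified model.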
