A High Performance Multi-Bit-Width Booth Vector Systolic Accelerator for NAS Optimized Deep Learning Neural Networks

Mingqiang Huang,Quan Cheng,Wei Mao,Changhai Man,Yucen Liu,Kai Li,Hao Yu

doi:10.1109/tcsi.2022.3178474

Abstract

Multi-bit-width convolutional neural network (CNN) maintains the balance between network accuracy and hardware efficiency, thus enlightening a promising method for accurate yet energy-efficient edge computing. In this work, we develop state-of-the-art multi-bit-width accelerator for NAS Optimized deep learning neural networks. To efficiently process the multi-bit-width network inferencing, multi-level optimizations have been proposed. Firstly, differential Neural Architecture Search (NAS) method is adopted for the high accuracy multi-bit-width network generation. Secondly, hybrid Booth based multi-bit-width multiply-add-accumulation (MAC) unit is developed for data processing. Thirdly, vector systolic array is proposed for effectively accelerating the matrix multiplications. With vector-style systolic dataflow, both the processing time and logic resources consumption can be reduced when compared with the classical systolic array. Finally, The proposed multi-bit-width CNN acceleration scheme has been practically deployed on FPGA platform of Xilinx ZCU102. Average performance on accelerating the full NAS optimized VGG16 network is 784.2 GOPS, and peek performance of the convolutional layer can reach as high as 871.26 GOPS for INT8, 1676.96 GOPS for INT4, and 2863.29 GOPS for INT2 respectively, which is among the best results in previous CNN accelerator benchmarks.

Full Text