Abstract

In this article, a quantized network acceleration processor (QNAP) is proposed to efficiently accelerate CNN processing by eliminating most unessential operations through algorithm-hardware co-optimization. First, effective-weight-based convolution (EWC) is proposed to distinguish a group of effective weights (EWs) that replace the other unique weights. The input activations corresponding to the same EW can therefore be accumulated first and then multiplied by that EW, reducing the number of multiplication operations; this scheme is efficiently supported by dedicated processing elements in QNAP. Experimental results show that energy efficiency is improved by 1.59x-3.20x compared with different UCNN implementations. Second, an error-compensation-based prediction (ECP) method uses trained compensation values to replace some unimportant partial sums, further reducing the redundant addition operations caused by the ReLU function. Compared with SnaPEA and Pred on AlexNet, ECP achieves 1.23x and 1.75x higher energy efficiency (TOPS/W), respectively, with marginal accuracy loss. Third, a residual pipeline mode is proposed to implement residual blocks efficiently, with a 1.5x lower memory footprint, 1.18x lower power consumption, and 13.15% higher hardware utilization on average than existing works. Finally, the QNAP processor is fabricated in a TSMC 28-nm CMOS process with a core area of 1.9 mm². Benchmarked with AlexNet, VGGNet, GoogLeNet, and ResNet on ImageNet at 470 MHz and 0.9 V, the processor achieves 117.4 frames per second with 131.6-mW power consumption on average, outperforming state-of-the-art processors by 1.77x-24.20x in energy efficiency.
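The two operation-reduction ideas summarized above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the early-termination split point `k`, and the compensation value `comp` are assumptions introduced only for illustration. The first function shows the EWC idea of accumulating activations that share the same quantized weight before multiplying; the second shows an ECP-style prediction in which a negative early partial sum is taken to predict a ReLU-zeroed output, with a trained compensation value standing in for the skipped additions.

```python
def conv_dot_ewc(activations, weights):
    """EWC-style dot product sketch: group activations by their shared
    (quantized) weight value, accumulate each group first, then perform
    one multiplication per effective weight instead of one per element."""
    sums = {}
    for a, w in zip(activations, weights):
        sums[w] = sums.get(w, 0.0) + a  # accumulate activations sharing weight w
    return sum(w * s for w, s in sums.items())


def ecp_relu(activations, weights, k, comp):
    """ECP-style sketch (illustrative; k and comp are hypothetical
    parameters): compute only the first k partial products; if the early
    partial sum is negative, predict the ReLU output and substitute the
    trained compensation value comp for the skipped additions."""
    partial = sum(a * w for a, w in zip(activations[:k], weights[:k]))
    if partial < 0:
        return max(0.0, partial + comp)  # predicted-negative path, tail skipped
    partial += sum(a * w for a, w in zip(activations[k:], weights[k:]))
    return max(0.0, partial)  # exact path


# With quantized weights, many elements share the same value, so EWC
# collapses their multiplications into one:
assert conv_dot_ewc([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, -1.0, 0.5]) == 0.5
```

The trade-off in both cases is the same as in the abstract: multiplications (EWC) or additions (ECP) are removed at the cost of a small, compensated approximation error.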
