Exploiting Retraining-Based Mixed-Precision Quantization for Low-Cost DNN Accelerator Design.

Nahsung Kim, Geonho Kim, Jongsun Park, Dongyeob Shin, Wonseok Choi

doi:10.1109/tnnls.2020.3008996

Abstract

For successful deployment of deep neural networks (DNNs) on resource-constrained devices, retraining-based quantization has been widely adopted to reduce the number of DRAM accesses. By properly setting training parameters, such as batch size and learning rate, the bit widths of both weights and activations can be uniformly quantized down to 4 bits while maintaining full-precision accuracy. In this article, we present a retraining-based mixed-precision quantization approach and its customized DNN accelerator to achieve high energy efficiency. In the proposed quantization, an additional bit (extra quantization level) is assigned in the middle of retraining to the weights that have shown frequent switching between two contiguous quantization levels, since such switching indicates that neither of the two levels helps to reduce the quantization loss. We also mitigate the gradient noise that occurs in the retraining process by using a lower learning rate near the quantization threshold. For the proposed mixed-precision quantized network (MPQ-network), we have implemented a customized accelerator using a 65-nm CMOS process. In the accelerator, the proposed processing elements (PEs) can be dynamically reconfigured to process variable bit widths from 2 to 4 bits for both weights and activations. The numerical results show that the proposed quantization achieves a 1.37× better compression ratio for VGG-9 on the CIFAR-10 data set compared with a uniform 4-bit (both weights and activations) model, without loss of classification accuracy. The proposed accelerator also shows 1.29× energy savings for VGG-9 on the CIFAR-10 data set over the state-of-the-art accelerator.
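The two retraining ideas in the abstract (flagging weights that keep switching between two contiguous quantization levels, and lowering the learning rate near a quantization threshold) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the symmetric uniform quantizer, the function names, and the values of `freq_threshold`, `margin`, and `low_scale` are all assumptions chosen only for illustration.

```python
import numpy as np

def quantize_levels(w, n_bits, w_max=1.0):
    """Map weights to integer level indices of a uniform quantizer on [-w_max, w_max].
    The quantizer itself is an assumption; the abstract does not specify one."""
    n_levels = 2 ** n_bits
    step = 2.0 * w_max / (n_levels - 1)
    idx = np.rint((np.clip(w, -w_max, w_max) + w_max) / step).astype(np.int64)
    return idx, step

def update_switch_counts(prev_idx, new_idx, counts):
    """Add one to the count of every weight that moved to an adjacent level."""
    switched = np.abs(new_idx - prev_idx) == 1
    return counts + switched

def select_extra_bit(counts, n_steps, freq_threshold=0.5):
    """Flag weights whose switching frequency reaches a (hypothetical) threshold."""
    return counts / n_steps >= freq_threshold

def threshold_lr_scale(w, n_bits, w_max=1.0, margin=0.1, low_scale=0.1):
    """Per-weight learning-rate multiplier: smaller near a decision threshold.
    `margin` (fraction of a step) and `low_scale` are illustrative values."""
    idx, step = quantize_levels(w, n_bits, w_max)
    center = idx * step - w_max                    # value of the assigned level
    dist = step / 2 - np.abs(np.clip(w, -w_max, w_max) - center)
    return np.where(dist < margin * step, low_scale, 1.0)

# Toy run: weight 0 oscillates around the threshold at 0.0, weight 1 is stable.
w_a = np.array([-0.05, 0.9])
w_b = np.array([0.05, 0.9])
n_steps = 8
counts = np.zeros(2, dtype=np.int64)
prev, _ = quantize_levels(w_a, n_bits=2)
for t in range(1, n_steps + 1):
    w = w_b if t % 2 else w_a
    new, _ = quantize_levels(w, n_bits=2)
    counts = update_switch_counts(prev, new, counts)
    prev = new

extra_bit = select_extra_bit(counts, n_steps)      # candidates for one more bit
lr_scale = threshold_lr_scale(w_b, n_bits=2)       # damped updates near thresholds
```

In this toy run only the oscillating weight is flagged for an extra bit, and only that weight gets the reduced learning-rate multiplier. How the paper actually measures switching frequency and picks its thresholds is not stated in the abstract.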

Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: Aug 3, 2020
Citations: 17

Similar Papers

An Ordered Aggregation-Based Ensemble Selection Method of Lightweight Deep Neural Networks With Random Initialization
Lin He ... Lijun Peng
IEEE Access | VOL. 10
01 Jan 2021

A Novel Low-Power Compression Scheme for Systolic Array-Based Deep Learning Accelerators
Ayush Arunachalam ... Arnab Raha
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | VOL. 42
01 Apr 2023

An Error Compensation Technique for Low-Voltage DNN Accelerators
Daehan Ji ... Jongsun Park
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | VOL. 29
15 Dec 2020

Control Variate Approximation for DNN Accelerators
Georgios Zervakis ... Ourania Spantidi
05 Dec 2021
