Abstract

Deep neural networks are machine learning models that are increasingly used in a wide variety of applications. However, their significantly high memory and computation demands often limit their deployment on embedded systems. Many recent works have addressed this problem by proposing different types of data quantization schemes. However, most of these techniques either require post-quantization retraining of the network or incur a significant loss in output accuracy. In this paper, we propose a novel and scalable technique with two different modes for quantizing the parameters of pre-trained neural networks. In the first mode, referred to as log_2_lead, we use a single template for the quantization of all parameters. In the second mode, denoted as ALigN, we analyze the trained parameters of each layer and adaptively adjust the quantization template to achieve even higher accuracy. Our technique largely preserves the accuracy of the parameters and does not require retraining of the networks. Moreover, it supports quantization to an arbitrary bit-size. For example, compared to a single-precision floating-point implementation, our proposed 8-bit quantization incurs only $\sim 0.2\%$ and $\sim 0.1\%$ loss in the Top-1 and Top-5 accuracies, respectively, for the VGG-16 network on the ImageNet dataset. We observe similarly minimal losses in the Top-1 and Top-5 accuracies for AlexNet and ResNet-18 with the proposed 8-bit quantization scheme. The proposed technique also provides a higher mean intersection over union for semantic segmentation when compared with state-of-the-art quantization techniques. Because the technique represents parameters as powers of 2, it eliminates the need for resource- and computation-intensive multiplier units in hardware accelerators for neural networks. We also present a design that implements the multiplication operation using bit-shifts and addition under the proposed quantization.
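To make the shift-and-add multiplication idea concrete, the sketch below shows how a product with a power-of-two-coded weight could be computed using only shifts and additions. This is a minimal illustration, not the paper's hardware design: the two-term encoding (a sign plus a short list of exponents) and the function name `shift_add_multiply` are assumptions introduced here.

```python
# Minimal sketch (assumption, not the paper's exact design): if a quantized
# weight is stored as a short list of signed power-of-two terms, e.g.
#   w ~= sign * (2**e0 + 2**e1)  with e0 > e1,
# then multiplying an activation by w reduces to bit-shifts and additions.

def shift_add_multiply(activation: int, sign: int, exponents: list[int]) -> int:
    """Multiply an integer activation by a power-of-two-coded weight
    using only shifts and additions (no multiplier unit)."""
    acc = 0
    for e in exponents:
        if e >= 0:
            acc += activation << e      # positive exponent: left shift
        else:
            acc += activation >> (-e)   # negative exponent: right shift
    return sign * acc

# Example: weight ~= +(2**-1 + 2**-3) = 0.625, activation = 32
# shift_add_multiply(32, +1, [-1, -3]) -> 16 + 4 = 20 == round(32 * 0.625)
```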

Highlights

  • Deep neural networks (DNNs) are machine learning models that have achieved promising classification accuracies on recognition problems such as image, speech, and natural language processing [1]–[3]

  • Log_2_lead Quantization Scheme: Based on our analysis, we present a novel and highly accurate quantization technique, log_2_lead (L2L), to quantize the parameters of pre-trained DNNs

  • Quantized DNNs: We present a brief overview of DNNs, followed by a description of commonly employed techniques for the quantization of pre-trained DNNs


Summary

INTRODUCTION

Deep neural networks (DNNs) are machine learning models that have achieved promising classification accuracies on recognition problems such as image, speech, and natural language processing [1]–[3]. Reduced-precision formats such as BFloat, a subset of the single-precision Float, utilize only 7 bits for storing the fraction (significand) [31]. Most existing quantization techniques represent the parameters of a trained network in low-precision fixed-point number systems by utilizing different types of quantization schemes. Log_2_lead Quantization Scheme: Based on our analysis, we present a novel and highly accurate quantization technique, log_2_lead (L2L), to quantize the parameters of pre-trained DNNs. Our technique uses a unique template to store the most significant fractional bits. ALigN Quantization Scheme: We propose an adaptive layer-wise variation of our L2L quantization scheme, referred to as ALigN, for pre-trained DNNs. In this technique, we align the available quantization bit-width according to the occurrences of the leading 1's in the trained parameters of each layer.
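As a rough illustration of the leading-one idea, the sketch below quantizes a value by keeping the position of its leading 1 exactly and retaining only a few bits immediately after it. The function name, the 4-bit fraction budget, and the layer-adaptive comment are assumptions for illustration and may differ from the paper's exact L2L and ALigN templates.

```python
import math

# Minimal sketch of leading-one based quantization (assumption, not the exact
# L2L template): the leading-1 position of |x| is kept exactly (the log2 part)
# and only `frac_bits` bits immediately after the leading 1 are retained.

def leading_one_quantize(x: float, frac_bits: int = 4) -> float:
    """Keep the leading-1 position of |x| plus `frac_bits` following bits."""
    if x == 0.0:
        return 0.0
    sign = 1.0 if x > 0 else -1.0
    mag = abs(x)
    lead = math.floor(math.log2(mag))        # position of the leading 1
    step = 2.0 ** (lead - frac_bits)         # resolution after the leading 1
    return sign * math.floor(mag / step) * step

# A layer-adaptive variant in the spirit of ALigN could first histogram the
# leading-1 positions of a layer's weights and centre the retained bit range
# on the most frequent positions (assumption; the paper's rule may differ).

# Example: leading_one_quantize(0.3, frac_bits=4) -> 0.296875
# (0.3 has its leading 1 at 2**-2; 4 further bits give a step of 2**-6)
```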

RELATED WORK
OVERVIEW OF DNNs
COMMONLY USED QUANTIZATION TECHNIQUES
PROPOSED QUANTIZATION TECHNIQUE-BASED MULTIPLIER
EXPERIMENTAL SETUP AND RESULTS
Findings
CONCLUSION
