Attention Round for post-training quantization

Huabin Diao, Gongyan Li, Shaoyun Xu, Chao Kong, Wei Wang

doi:10.1016/j.neucom.2023.127012

Abstract

Quantization methods for convolutional neural network models can be broadly categorized into post-training quantization (PTQ) and quantization-aware training (QAT). While PTQ has the advantage of requiring only a small portion of the data for quantization, the resulting quantized model may not be as effective as one obtained with QAT. To address this limitation, this paper proposes a novel quantization function named Attention Round. Unlike traditional quantization functions, which map a 32-bit floating-point value w only to nearby quantization levels, Attention Round allows w to be mapped to any quantization level in the entire quantization space, which expands the quantization optimization space. The probability of mapping w to a given quantization level decreases with the distance between w and that level, and is regulated by a Gaussian decay function. Furthermore, to tackle the challenge of mixed-precision quantization, this paper introduces a lossy coding length measure to assign quantization precision to different layers of the model, eliminating the need to solve a combinatorial optimization problem. Experimental evaluations on various models demonstrate the effectiveness of the proposed method. Notably, for ResNet18 and MobileNetV2, the proposed PTQ approach achieves quantization performance comparable to QAT while using only 1,024 training samples and 10 minutes for the quantization process.
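The mapping described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the signed uniform grid, the width parameter `sigma`, the normalization of the Gaussian weights into probabilities, and the helper names `uniform_levels`, `gaussian_level_probs`, and `sample_quantized` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def uniform_levels(n_bits, scale):
    """Signed uniform grid, e.g. 2 bits -> [-2, -1, 0, 1] * scale (assumed layout)."""
    q = np.arange(-(2 ** (n_bits - 1)), 2 ** (n_bits - 1))
    return q * float(scale)


def gaussian_level_probs(w, levels, sigma):
    """Probability of mapping each weight to every level in the grid.

    The weight given to a level decays with a Gaussian of the distance
    between the weight and that level; weights are then normalized so
    that each row sums to 1. `sigma` is an assumed decay width.
    """
    w = np.asarray(w, dtype=np.float64)
    levels = np.asarray(levels, dtype=np.float64)
    dist = w[..., None] - levels                    # shape (..., n_levels)
    logits = -(dist ** 2) / (2.0 * sigma ** 2)
    logits -= logits.max(axis=-1, keepdims=True)    # avoid underflow in exp
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def sample_quantized(w, levels, sigma, rng):
    """Draw one quantization level per weight from the Gaussian-decay distribution."""
    levels = np.asarray(levels, dtype=np.float64)
    probs = gaussian_level_probs(w, levels, sigma)
    cdf = probs.cumsum(axis=-1)
    u = rng.random(size=cdf.shape[:-1] + (1,))
    idx = (u > cdf).sum(axis=-1)                    # inverse-CDF sampling
    idx = np.minimum(idx, len(levels) - 1)          # guard against rounding in the CDF
    return levels[idx]


levels = uniform_levels(n_bits=2, scale=0.5)
probs = gaussian_level_probs(np.array([0.1]), levels, sigma=0.25)
sampled = sample_quantized(np.array([0.1, -0.7]), levels, 0.25, np.random.default_rng(0))
```

In this sketch every level receives a nonzero probability, with the nearest level the most likely, which is the contrast with nearest-level rounding that the abstract draws. A small `sigma` concentrates the distribution on nearby levels, while a larger `sigma` spreads it across the grid.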

Journal: Neurocomputing
Publication Date: Nov 10, 2023
Citations: 3

Similar Papers

Optimizing convolutional neural networks for IoT devices: performance and energy efficiency of quantization techniques
Nicolás Hernández ... Vicente Blanco
The Journal of Supercomputing | VOL. 80
20 Feb 2024

Efficient Quantization Techniques for Deep Neural Networks
Chutian Jiang
01 Nov 2021

Quantizing Neural Networks for Low-Power Computer Vision
Marios Fournarakis ... Tijmen Blankevoort
12 Jan 2022

Symmetry Regularization and Saturating Nonlinearity for Robust Quantization
Sein Park ... Eunhyeok Park
01 Jan 2021
