Neuron-by-Neuron Quantization for Efficient Low-Bit QNN Training

Artem Sher, Dmitry Nikolaev, Elena Limonova, Vladimir V Arlazarov, Anton Trusov

doi:10.3390/math11092112

Abstract

Quantized neural networks (QNNs) are widely used to achieve computationally efficient solutions to recognition problems. Eight-bit QNNs generally have almost the same accuracy as full-precision networks while working several times faster. However, networks with lower bit widths demonstrate inferior accuracy compared with their full-precision counterparts. To solve this issue, a number of quantization-aware training (QAT) approaches have been proposed. In this paper, we study QAT approaches for two- to eight-bit linear quantization schemes and propose a new combined QAT approach: neuron-by-neuron quantization with straight-through estimator (STE) gradient forwarding. It is suitable for quantization with bit widths from two to eight and eliminates significant accuracy drops during training, which results in better accuracy of the final QNN. We experimentally evaluate our approach on CIFAR-10 and ImageNet classification and show that it is comparable to other approaches for four to eight bits and outperforms some of them for two to three bits while being easier to implement. For example, with three-bit quantization on the CIFAR-10 dataset, the proposed approach reaches 73.2% accuracy, while the direct and layer-by-layer baselines reach 71.4% and 67.2% accuracy, respectively. For two-bit quantization of ResNet18 on the ImageNet dataset, the results are 63.69% for our approach and 61.55% for the direct baseline.
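
The two building blocks named in the abstract, linear quantization and the straight-through estimator, can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the neuron-by-neuron schedule, which the abstract does not specify. The function names, the fixed range [lo, hi], and the choice to zero the gradient outside that range (a common clipped variant of STE) are all assumptions made for illustration.

```python
def linear_quantize(values, bits, lo, hi):
    """Snap each float to one of 2**bits evenly spaced levels in [lo, hi].

    Hypothetical helper: the range [lo, hi] is assumed to be given.
    """
    steps = 2 ** bits - 1            # number of intervals between lo and hi
    scale = (hi - lo) / steps        # width of one quantization step
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)         # saturate values outside the range
        code = round((clipped - lo) / scale)  # integer code in [0, steps]
        out.append(lo + code * scale)         # map the code back to a float level
    return out


def ste_backward(values, upstream_grad, lo, hi):
    """Straight-through estimator for the backward pass.

    Rounding has zero gradient almost everywhere, so STE treats it as the
    identity: the upstream gradient is forwarded unchanged. Assumption: the
    gradient is set to zero for inputs that were clipped.
    """
    return [g if lo <= v <= hi else 0.0 for v, g in zip(values, upstream_grad)]


weights = [-2.0, -0.2, 0.1, 0.9]
quantized = linear_quantize(weights, bits=2, lo=-1.0, hi=1.0)
grads = ste_backward(weights, [1.0, 1.0, 1.0, 1.0], lo=-1.0, hi=1.0)
```

With two bits, every weight lands on one of four levels, which is why low bit widths lose so much precision. The forward pass uses the quantized values, while the backward pass updates the underlying full-precision weights as if no rounding had happened.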

Journal: Mathematics
Publication Date: Apr 29, 2023
Citations: 5
License type: CC BY 4.0

Similar Papers

Propagating Asymptotic-Estimated Gradients for Low Bitwidth Quantized Neural Networks
Jun Chen ... Jian Yang
IEEE Journal of Selected Topics in Signal Processing | VOL. 14
04 Mar 2020

IQNN: Training Quantized Neural Networks with Iterative Optimizations
Shuchang Zhou ... Xinyu Zhou
01 Jan 2017

A Flash-based Current-mode IC to Realize Quantized Neural Networks
Kyler R Scott ... Cheng-Yen Lee
14 Mar 2022

DiffQuant: Reducing Compression Difference for Neural Network Quantization
Ming Zhang ... Weijun Li
Electronics | VOL. 12
12 Dec 2023
