4.6-Bit Quantization for Fast and Accurate Neural Network Inference on CPUs

Anton Trusov, Elena Limonova, Vladimir V. Arlazarov, Dmitry Nikolaev

doi:10.3390/math12050651

Abstract

Quantization is a widespread method for reducing the inference time of neural networks on mobile Central Processing Units (CPUs). Eight-bit quantized networks achieve quality comparable to that of full-precision models and fit the hardware architecture well, with one-byte coefficients and thirty-two-bit dot product accumulators. Lower-precision quantizations usually suffer from noticeable quality loss and require specific computational algorithms to outperform eight-bit quantization. In this paper, we propose a novel 4.6-bit quantization scheme that allows for more efficient use of CPU resources. This scheme has more quantization bins than four-bit quantization and is more accurate while preserving the computational efficiency of the latter (it runs only 4% slower). Our multiplication uses a combination of 16- and 32-bit accumulators and avoids the multiplication depth limitation of the previous 4-bit multiplication algorithm. Experiments with different convolutional neural networks on the CIFAR-10 and ImageNet datasets show that 4.6-bit quantized networks are 1.5–1.6 times faster than eight-bit networks on the ARMv8 CPU. In terms of quality, the results of a 4.6-bit quantized network are close to the mean of those of four-bit and eight-bit networks of the same architecture. Therefore, 4.6-bit quantization may serve as an intermediate solution between fast but inaccurate low-bit network quantizations and accurate but relatively slow eight-bit ones.
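The abstract does not spell out the multiplication algorithm. As a rough illustration of the general idea of pairing 16-bit and 32-bit accumulators, the sketch below sums products of small integers in an int16 partial sum for as many terms as are guaranteed not to overflow, then flushes that partial sum into an int32 total, so the overall dot-product length is not capped by the 16-bit range. The function names (`max_safe_depth`, `dot_two_level`) and the value range of ±11 are hypothetical choices made for this example. This is a NumPy sketch under those assumptions, not the authors' ARMv8 implementation.

```python
import numpy as np


def max_safe_depth(max_abs_product: int, acc_bits: int = 16) -> int:
    """Number of products that can be summed in a signed accumulator
    of `acc_bits` bits without any possibility of overflow."""
    return (2 ** (acc_bits - 1) - 1) // max_abs_product


def dot_two_level(x: np.ndarray, w: np.ndarray,
                  max_abs_x: int, max_abs_w: int) -> int:
    """Dot product of small-integer vectors using int16 partial sums
    that are periodically flushed into an int32 total.

    Hypothetical helper for illustration only: max_abs_x and max_abs_w
    are assumed upper bounds on |x[i]| and |w[i]|.
    """
    block = max_safe_depth(max_abs_x * max_abs_w)
    total = np.int32(0)
    for start in range(0, len(x), block):
        xs = x[start:start + block].astype(np.int16)
        ws = w[start:start + block].astype(np.int16)
        # At most `block` products, each bounded by max_abs_x * max_abs_w,
        # so this int16 sum cannot overflow by construction.
        partial = np.sum(xs * ws, dtype=np.int16)
        # Flush the 16-bit partial sum into the 32-bit accumulator.
        total = total + np.int32(partial)
    return int(total)


# Assumed example range: quantized values in [-11, 11].
x = np.full(1000, 11)
w = np.full(1000, 11)
result = dot_two_level(x, w, max_abs_x=11, max_abs_w=11)
```

With these assumed bounds each product is at most 121 in magnitude, so 270 products fit safely in one int16 partial sum. The example vectors have 1000 elements, which would overflow a single int16 accumulator but are handled correctly by the two-level scheme.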

Journal: Mathematics
Publication Date: Feb 23, 2024
License type: CC BY 4.0

