Abstract

Deploying deep convolutional neural networks on mobile devices is challenging because of the conflict between their heavy computational overhead and the hardware’s restricted computing capacity. Network quantization is typically used to alleviate this problem. However, we find that a “datatype mismatch” issue in existing low-bitwidth quantization approaches can generate severe instruction redundancy, dramatically reducing their running efficiency on mobile devices. We therefore propose a novel quantization approach that ensures only integer-based arithmetic is needed during the inference stage of the quantized model. To this end, we improve the quantization function to compel the quantized values to follow a standard integer format, and simultaneously quantize the batch normalization parameters with a logarithm-like method. In this way, the quantized model retains the advantage of low-bitwidth representation while avoiding the “datatype mismatch” issue and the corresponding instruction redundancy. Comprehensive experiments show that our method achieves prediction accuracy comparable to other state-of-the-art methods while reducing run-time latency by a large margin. Our fully integer-based quantized ResNet-18 has 4-bit weights and 4-bit activations, with only a 0.7% top-1 and 0.4% top-5 accuracy drop on the ImageNet dataset. An assembly-language implementation of a series of building blocks reaches a maximum speedup of 4.33× over the original full-precision version on an ARMv8 CPU.
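
The abstract names two ingredients: a quantization function that constrains values to a standard integer grid, and a “logarithm-like” quantization of the batch normalization parameters. The sketch below illustrates one plausible reading of these ideas, assuming symmetric uniform low-bit quantization and rounding of the folded batch-norm scale to the nearest power of two so that it can be applied as a bit-shift. The function names, the per-tensor max-abs scaling, and the toy `bn_scale` value are all illustrative assumptions, not the paper’s actual implementation.

```python
import numpy as np

def quantize_uniform(x, num_bits=4):
    # Symmetric uniform quantizer: outputs land on the standard signed
    # integer grid [-2**(b-1), 2**(b-1) - 1], i.e. a normal intN format.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax          # hypothetical per-tensor scale
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def pow2_shift(s):
    # "Logarithm-like" quantization of a folded batch-norm scale
    # (assumed positive here): round log2(s) to the nearest integer so
    # the floating-point multiply becomes an integer bit-shift.
    return int(np.round(np.log2(abs(s))))

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)).astype(np.float32)   # toy weights
a = rng.random((8, 8)).astype(np.float32)            # toy activations

qw, _ = quantize_uniform(w, num_bits=4)
qa, _ = quantize_uniform(a, num_bits=4)

acc = qw @ qa                 # int32 accumulation: integer-only arithmetic
bn_scale = 0.03               # hypothetical folded gamma / sigma value
shift = pow2_shift(bn_scale)  # here round(log2(0.03)) = -5
out = acc << shift if shift >= 0 else acc >> -shift  # scaling via shift
```

In the paper’s setting, such a shift would presumably be folded into the assembly-level kernels; the sketch is only meant to show why a power-of-two batch-norm scale keeps the whole inference pipeline in integer instructions and avoids the float/integer datatype mismatch.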
