Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization

Weihan Chen, Jian Cheng, Peisong Wang

doi:10.1109/iccv48922.2021.00530

Abstract

Quantization is a widely used technique to compress and accelerate deep neural networks. However, conventional quantization methods use the same bit-width for all (or most) layers; they often suffer significant accuracy degradation in the ultra-low-precision regime and ignore the fact that emerging hardware accelerators are beginning to support mixed-precision computation. In this paper, we therefore present a novel and principled framework to solve the mixed-precision quantization problem. First, we formulate mixed-precision quantization as a discrete constrained optimization problem. Then, to make the optimization tractable, we approximate the objective function with a second-order Taylor expansion and propose an efficient approach to compute its Hessian matrix. Finally, based on this simplification, we show that the original problem can be reformulated as a Multiple-Choice Knapsack Problem (MCKP) and propose a greedy search algorithm to solve it efficiently. Compared with existing mixed-precision quantization works, our method is derived in a principled way and is much more computationally efficient. Moreover, extensive experiments on the ImageNet dataset with various network architectures demonstrate its superiority over existing uniform and mixed-precision quantization approaches.
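To make the MCKP view concrete, the following is a minimal sketch of a greedy bit-width allocation under a model-size budget. It is a generic greedy heuristic for a multiple-choice knapsack, not necessarily the search procedure proposed in the paper. The function name `greedy_mckp`, the layer sizes, and the cost table are all hypothetical; in the paper's setting the per-layer costs would come from the second-order (Hessian-based) approximation of the loss, which is not reproduced here.

```python
from typing import List, Sequence, Tuple


def greedy_mckp(
    costs: Sequence[Sequence[float]],
    sizes: Sequence[Sequence[int]],
    budget: int,
) -> Tuple[List[int], int]:
    """Pick one option per layer, greedily, without exceeding `budget`.

    costs[i][j]: assumed loss increase when layer i uses option j.
    sizes[i][j]: storage (in bits) of layer i under option j,
                 with options ordered from smallest to largest.
    """
    # Start from the cheapest (lowest bit-width) option for every layer.
    choice = [0] * len(costs)
    total = sum(layer_sizes[0] for layer_sizes in sizes)
    if total > budget:
        raise ValueError("budget is too small even for the lowest bit-widths")

    while True:
        best = None
        best_ratio = 0.0
        for i, j in enumerate(choice):
            for k in range(j + 1, len(sizes[i])):
                extra = sizes[i][k] - sizes[i][j]
                gain = costs[i][j] - costs[i][k]
                if extra <= 0 or gain <= 0 or total + extra > budget:
                    continue
                # Loss reduction per additional bit of storage.
                ratio = gain / extra
                if ratio > best_ratio:
                    best, best_ratio = (i, k, extra), ratio
        if best is None:
            return choice, total
        i, k, extra = best
        choice[i] = k
        total += extra


# Hypothetical example: three layers, candidate bit-widths 2, 4 and 8.
bit_widths = [2, 4, 8]
num_params = [1000, 4000, 500]
sizes = [[n * b for b in bit_widths] for n in num_params]
# Made-up sensitivities; a lower value means less accuracy loss.
costs = [
    [9.0, 2.0, 0.1],
    [40.0, 1.0, 0.1],
    [6.0, 0.5, 0.05],
]

choice, total = greedy_mckp(costs, sizes, budget=24000)
print([bit_widths[j] for j in choice], total)  # [4, 4, 8] 24000
```

Because the search is greedy, the result is a feasible but not necessarily optimal assignment; an exact MCKP solver could be substituted when the number of layers and options is small.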

