Abstract

Quantization is the main method for compressing convolutional neural networks and accelerating their inference. Most existing quantization methods quantize all layers to the same bit width. Mixed-precision quantization can achieve higher accuracy at the same compression ratio, but finding a good mixed-precision quantization strategy is difficult. To solve this problem, a mixed-clipping quantization method based on reinforcement learning is proposed. It uses reinforcement learning to search for a mixed-precision quantization strategy, then clips the weight data with a mixed-clipping method according to the searched strategy before quantization, further improving the accuracy of the quantized network. We evaluate the method on a diverse set of models: ResNet18/50 and MobileNet-V2 on ImageNet, as well as YOLOv3 on the Microsoft COCO dataset. Experimental results show that our method achieves 2.7% and 0.3% higher Top-1 accuracy on MobileNet-V2 (4 bit) than the HAQ and ZeroQ methods, respectively, and 2.6% higher mAP on YOLOv3 (6 bit) than the per-layer quantization method.
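The abstract does not specify the exact clipping rule or quantizer, so the following is only a rough sketch of the clip-then-quantize step it describes: a hypothetical percentile-based symmetric clip (the `clip_ratio` parameter is an assumption, not from the paper) followed by uniform quantization at the bit width that, in the actual method, would be chosen per layer by the reinforcement-learning search.

```python
import numpy as np

def clip_and_quantize(weights, bits, clip_ratio=0.99):
    """Clip weights to a symmetric threshold, then uniformly quantize.

    The paper's mixed-clipping rule is not given in the abstract; this
    sketch uses a percentile of |w| as a stand-in clipping threshold.
    """
    # Hypothetical clipping threshold: a high percentile of |w|.
    t = np.percentile(np.abs(weights), clip_ratio * 100)
    clipped = np.clip(weights, -t, t)

    # Uniform symmetric quantization to `bits` bits.
    levels = 2 ** (bits - 1) - 1   # e.g. 7 levels per sign for 4 bits
    scale = t / levels
    q = np.round(clipped / scale)
    return q * scale               # dequantized weights for simulation

# Example: simulate quantizing one layer's weights to 4 bits.
w = np.random.randn(256, 128).astype(np.float32)
w_q = clip_and_quantize(w, bits=4, clip_ratio=0.99)
```

Clipping before quantization shrinks the quantization range, so the limited number of levels is spent on the bulk of the weight distribution rather than on rare outliers.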
