Abstract

This paper describes a quantization method for pre-trained deep CNN models that achieves strong results on YOLOv3. Weights are quantized to int8 and biases to int16; compared with float inference, the mAP loss is less than 0.5%. When running neural networks on hardware, 8-bit fixed-point quantization is key to efficient inference. However, running a very deep network on 8-bit hardware is difficult, often causing a significant drop in accuracy or requiring substantial time to retrain the network. Our method uses dynamic fixed-point quantization and adds a small number of bit-shifts to balance the accuracy of each layer in a YOLO network. We further show that the method extends to other computer vision architectures and tasks, such as semantic segmentation and image classification.
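As a rough illustration of the dynamic fixed-point idea the abstract mentions, the sketch below quantizes a float weight tensor to int8 with a per-tensor power-of-two scale, so that dequantization reduces to a bit-shift. This is a hypothetical minimal example, not the paper's exact procedure; the function name and the way the fractional bit count is chosen are assumptions.

```python
import numpy as np

def quantize_dynamic_fixed_point(w, n_bits=8):
    """Quantize a float tensor to n_bits dynamic fixed point.

    The scale is a power of two (2**frac_bits), so multiplying or
    dividing by it on hardware is a bit-shift. Hypothetical sketch,
    not the paper's exact method.
    """
    max_abs = np.max(np.abs(w))
    # Bits left of the binary point cover the integer range of w;
    # the remaining bits (minus one sign bit) are fractional.
    frac_bits = n_bits - 1 - int(np.ceil(np.log2(max_abs + 1e-12)))
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w * scale), lo, hi).astype(np.int32)
    return q, frac_bits

w = np.array([0.7, -0.31, 0.05], dtype=np.float32)
q, fb = quantize_dynamic_fixed_point(w)
w_hat = q / (2.0 ** fb)  # dequantize; on hardware this is a right shift
```

Because each layer picks its own `frac_bits`, layers with small weight ranges keep more fractional precision, which is the per-layer balancing the abstract alludes to.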
