Abstract

This paper describes a quantization method for pre-trained deep CNN models that achieves strong results on YOLOv3. Weights are quantized to int8 and biases to int16; compared with float inference, the mAP loss is less than 0.5%. When running neural networks on hardware, 8-bit fixed-point quantization is key to efficient inference. However, running a very deep network on 8-bit hardware is difficult, often causing a significant drop in accuracy or requiring substantial time to retrain the network. Our method uses dynamic fixed-point quantization and adds a small number of bit-shifts to balance the accuracy of each layer in a YOLO network. We further show that the method extends to other computer vision architectures and tasks, such as semantic segmentation and image classification.
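As a rough illustration of the dynamic fixed-point idea the abstract mentions, the sketch below quantizes a float weight tensor to int8 with a per-tensor power-of-two scale, so that dequantization reduces to a bit-shift. This is a hypothetical minimal example, not the paper's exact procedure; the function name and the way the fractional bit count is chosen are assumptions.

```python
import numpy as np

def quantize_dynamic_fixed_point(w, n_bits=8):
    """Quantize a float tensor to n_bits dynamic fixed point.

    The scale is a power of two (2**frac_bits), so multiplying or
    dividing by it on hardware is a bit-shift. Hypothetical sketch,
    not the paper's exact method.
    """
    max_abs = np.max(np.abs(w))
    # Bits left of the binary point cover the integer range of w;
    # the remaining bits (minus one sign bit) are fractional.
    frac_bits = n_bits - 1 - int(np.ceil(np.log2(max_abs + 1e-12)))
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(w * scale), lo, hi).astype(np.int32)
    return q, frac_bits

w = np.array([0.7, -0.31, 0.05], dtype=np.float32)
q, fb = quantize_dynamic_fixed_point(w)
w_hat = q / (2.0 ** fb)  # dequantize; on hardware this is a right shift
```

Because each layer picks its own `frac_bits`, layers with small weight ranges keep more fractional precision, which is the per-layer balancing the abstract alludes to.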
