Abstract

The large number of parameters and the intensive computation of neural networks limit their deployment on embedded devices with scarce storage and computing power. To address this problem, a novel quantization algorithm combined with Knowledge Distillation (KD) is proposed to reduce model size and speed up the inference of deep models. The proposed method consists of two phases: KD-Training and Quantization-Retraining. KD-Training trains a compact student model with pre-quantized weights using the proposed pre-quantized constraint loss. In Quantization-Retraining, the pre-quantized weights are quantized to powers of two (±2^n), and the first and last layers of the network are retrained to make up for the accuracy loss caused by quantization. Experiments on the CIFAR-10 dataset show that the proposed method can obtain a low-precision (2-5 bit) quantized student model with a compact structure whose test accuracy even exceeds that of its full-precision (32-bit) reference, owing to improved generalization. It also achieves higher performance than other quantization methods. Moreover, since the quantized weights are constrained to {±2^n}, the method is well suited to accelerating network computation in hardware, where multiplications by powers of two reduce to bit shifts.
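The abstract does not include code, but the power-of-two constraint can be illustrated with a minimal sketch. The snippet below is our own assumption of how weights might be snapped to the nearest value in {±2^n}, not the authors' implementation: the function name quantize_pow2, the exponent range derived from the bit width, and the handling of near-zero weights are all illustrative choices.

import numpy as np

def quantize_pow2(weights, bits=4):
    # Snap each weight to the nearest signed power of two.
    # Illustrative sketch only: the exponent window implied by `bits`
    # and the treatment of tiny weights are assumptions, not details
    # taken from the paper.
    w = np.asarray(weights, dtype=np.float32)
    sign = np.sign(w)
    mag = np.abs(w)

    # Avoid log2(0) by flooring magnitudes at the smallest positive float.
    eps = np.finfo(np.float32).tiny
    exp = np.round(np.log2(np.maximum(mag, eps)))

    # Clip exponents to a small window so the codebook stays compact
    # (an assumed layout, not necessarily the paper's exact scheme).
    max_exp = 0                                  # largest allowed level: 2^0 = 1
    min_exp = max_exp - (2 ** (bits - 1) - 1)
    exp = np.clip(exp, min_exp, max_exp)

    return sign * (2.0 ** exp)

# Example: a handful of float weights snapped to {±2^n}
w = np.array([0.37, -0.06, 0.9, -0.51])
print(quantize_pow2(w, bits=4))   # -> [ 0.5    -0.0625  1.     -0.5  ]

Because every quantized weight is a signed power of two, multiplying an activation by it can be realized as a bit shift, which is the hardware-acceleration property noted at the end of the abstract.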
