Abstract

Deep Convolutional Neural Networks (DCNNs) have achieved state-of-the-art results on a wide range of tasks, especially image recognition and object detection. However, their millions of parameters make them difficult to deploy on embedded devices with limited storage and computational capabilities. In this paper, we propose a new method called Three-Means Ternary Quantization (TMTQ), which quantizes the weights to ternary values \( \{ -\alpha_{1}, 0, +\alpha_{2} \} \) during the forward and backward propagations. The scaling factors \( \{ \alpha_{1}, \alpha_{2} \} \) are used to reduce the quantization loss. We evaluate this method on the MNIST, CIFAR-10, and ImageNet datasets with different network architectures. The results show that ternary models obtained with TMTQ perform only slightly worse than full-precision models and better than recently proposed binary and ternary models. Meanwhile, TMTQ achieves up to about a 16\( \times \) model compression rate compared with the 32-bit full-precision counterparts, since only ternary weights (2 bits) and fixed scaling factors are used during inference.
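
As a rough illustration of the idea, the sketch below quantizes a single weight tensor with a one-dimensional three-means clustering and maps the three clusters to \( \{ -\alpha_{1}, 0, +\alpha_{2} \} \). The clustering details (centroid initialization, pinning the middle centroid at zero, the iteration count) are our assumptions based on the method's name, not specifics taken from the abstract.

```python
import numpy as np

def tmtq_quantize(w, iters=20):
    """Hypothetical sketch of Three-Means Ternary Quantization (TMTQ).

    Clusters the weights of one layer into three groups via a 1-D
    three-means, then maps each group to {-alpha1, 0, +alpha2}.
    """
    flat = w.ravel()
    # Assumed initialization: negative mean, zero, positive mean.
    neg, pos = flat[flat < 0], flat[flat > 0]
    centers = np.array([neg.mean() if neg.size else -1e-3,
                        0.0,
                        pos.mean() if pos.size else 1e-3])
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        labels = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
        # Update the outer centroids; the middle one stays pinned at 0.
        for k in (0, 2):
            members = flat[labels == k]
            if members.size:
                centers[k] = members.mean()
    alpha1, alpha2 = -centers[0], centers[2]
    # Map cluster labels to the ternary values {-alpha1, 0, +alpha2}.
    ternary = np.where(labels == 0, -alpha1,
                       np.where(labels == 2, alpha2, 0.0)).reshape(w.shape)
    return ternary, alpha1, alpha2
```

In training schemes of this kind, the quantized weights would typically be used in the forward and backward passes while full-precision weights are retained for the gradient updates; at inference time only the 2-bit ternary codes and the fixed scaling factors need to be stored, which is where the roughly 16\( \times \) compression over 32-bit weights comes from.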
