Abstract

We consider compression of deep neural networks (DNNs) by weight quantization and lossless source coding for memory-efficient deployment. Whereas previous work addressed non-universal scalar quantization and entropy coding, we introduce, for the first time, universal DNN compression via universal vector quantization and universal source coding. In particular, the proposed scheme uses universal lattice quantization, which randomizes the source by uniform random dithering before lattice quantization and can perform near-optimally on any source without knowledge of the source distribution. Moreover, we present a method for fine-tuning vector-quantized DNNs to recover any accuracy loss due to quantization. Our experiments show that the proposed scheme compresses the MobileNet and ShuffleNet models trained on ImageNet with state-of-the-art compression ratios of 10.7 and 8.8, respectively.
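
To make the dithering step concrete, below is a minimal sketch of subtractive-dither uniform quantization, the one-dimensional special case of the universal lattice quantization described above. The function names, the step size `delta`, and the shared pseudo-random seed are illustrative assumptions, not the authors' implementation; the resulting integer indices would then be compressed with a universal lossless coder.

```python
# Minimal sketch: subtractive-dither (universal) uniform quantization in 1-D.
# All names and parameters here are illustrative assumptions.
import numpy as np

def universal_quantize(weights, delta, seed=0):
    """Add a uniform dither u ~ Uniform(-delta/2, delta/2), then round to the
    lattice (multiples of delta). The seed is shared with the decoder so the
    same dither can be regenerated at reconstruction time."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-delta / 2, delta / 2, size=weights.shape)
    indices = np.round((weights + u) / delta).astype(np.int64)
    return indices  # integer indices, to be compressed by a universal source coder

def universal_dequantize(indices, delta, seed=0):
    """Map indices back to lattice points and subtract the same dither."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-delta / 2, delta / 2, size=indices.shape)
    return indices * delta - u

# Example: the end-to-end error never exceeds half a quantization step.
w = np.random.randn(1000)
idx = universal_quantize(w, delta=0.05)
w_hat = universal_dequantize(idx, delta=0.05)
assert np.max(np.abs(w_hat - w)) <= 0.05 / 2 + 1e-12
```

With subtractive dithering, the reconstruction error (w_hat - w) is uniformly distributed and independent of the weights, so the quantizer's rate-distortion behavior does not depend on the weight distribution; this source-independence is what "universal" refers to, and in practice the encoder and decoder only need to share the dither seed.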
