Abstract

We consider compression of deep neural networks (DNNs) by weight quantization and lossless source coding for memory-efficient deployment. Whereas previous work addressed non-universal scalar quantization and entropy coding, we introduce, for the first time, universal DNN compression via universal vector quantization and universal source coding. In particular, the proposed scheme uses universal lattice quantization, which randomizes the source by uniform random dithering before lattice quantization and can perform near-optimally on any source without knowledge of the source distribution. Moreover, we present a method for fine-tuning vector-quantized DNNs to recover any accuracy loss due to quantization. Our experiments show that the proposed scheme compresses the MobileNet and ShuffleNet models trained on ImageNet with state-of-the-art compression ratios of 10.7 and 8.8, respectively.
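
To make the dithering step concrete, below is a minimal sketch of subtractive-dither uniform quantization, the one-dimensional special case of the universal lattice quantization described above. The function names, the step size `delta`, and the shared pseudo-random seed are illustrative assumptions, not the authors' implementation; the resulting integer indices would then be compressed with a universal lossless coder.

```python
# Minimal sketch: subtractive-dither (universal) uniform quantization in 1-D.
# All names and parameters here are illustrative assumptions.
import numpy as np

def universal_quantize(weights, delta, seed=0):
    """Add a uniform dither u ~ Uniform(-delta/2, delta/2), then round to the
    lattice (multiples of delta). The seed is shared with the decoder so the
    same dither can be regenerated at reconstruction time."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-delta / 2, delta / 2, size=weights.shape)
    indices = np.round((weights + u) / delta).astype(np.int64)
    return indices  # integer indices, to be compressed by a universal source coder

def universal_dequantize(indices, delta, seed=0):
    """Map indices back to lattice points and subtract the same dither."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-delta / 2, delta / 2, size=indices.shape)
    return indices * delta - u

# Example: the end-to-end error never exceeds half a quantization step.
w = np.random.randn(1000)
idx = universal_quantize(w, delta=0.05)
w_hat = universal_dequantize(idx, delta=0.05)
assert np.max(np.abs(w_hat - w)) <= 0.05 / 2 + 1e-12
```

With subtractive dithering, the reconstruction error (w_hat - w) is uniformly distributed and independent of the weights, so the quantizer's rate-distortion behavior does not depend on the weight distribution; this source-independence is what "universal" refers to, and in practice the encoder and decoder only need to share the dither seed.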
