Abstract
Deep neural networks have achieved great success in many fields, but they are both computationally and memory-intensive, which makes them difficult to deploy on mobile devices with limited memory or in applications with strict latency requirements. To address this problem, a new compression method for deep neural networks is proposed in this paper. First, the standard convolutions are replaced with depthwise separable convolutions to simplify the computation and reduce the number of parameters in the convolutional layers. Then, the network is pruned by removing unimportant connections. Finally, weight quantization with weight sharing is applied, using the shared weights obtained from kernel k-means clustering, to compress the network further. Several experiments are conducted to verify the viability and effectiveness of the proposed method. The results show that deep neural networks can be compressed effectively while accuracy improves. On the MNIST and ImageNet datasets, our method reduces the storage required by the LeNet, AlexNet and VGG-16 networks by 33× to 45×. This reduction makes it possible to deploy deep neural networks on mobile systems where application size and download bandwidth are constrained.
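To make the three compression steps concrete, the sketch below illustrates them in PyTorch and scikit-learn. This is not the authors' implementation: the layer sizes, the 50% pruning sparsity, and the 16 shared-weight clusters are illustrative assumptions, magnitude-based pruning stands in for the paper's connection-importance criterion, and standard k-means from scikit-learn stands in for kernel k-means.

```python
# Minimal sketch (assumptions noted above, not the paper's implementation) of:
#   1. depthwise separable convolution replacing a standard convolution,
#   2. magnitude-based pruning of unimportant connections,
#   3. weight sharing via k-means clustering of the surviving weights.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class DepthwiseSeparableConv(nn.Module):
    """Replace a standard KxK convolution with a depthwise + pointwise pair."""

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # Depthwise: one KxK filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   stride=stride, padding=padding, groups=in_ch)
        # Pointwise: 1x1 convolution mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


def prune_by_magnitude(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude connections (unstructured pruning)."""
    k = max(1, int(sparsity * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = (weight.abs() > threshold).float()
    return weight * mask


def kmeans_weight_sharing(weight: torch.Tensor, n_clusters: int = 16) -> torch.Tensor:
    """Quantize surviving weights by clustering them and sharing the centroids."""
    w = weight.detach().cpu().numpy().reshape(-1, 1)
    nonzero = w.nonzero()[0]                      # keep pruned weights at zero
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(w[nonzero])
    w[nonzero] = km.cluster_centers_[km.labels_]  # each weight -> its centroid
    return torch.from_numpy(w.reshape(weight.shape)).to(weight.dtype)


if __name__ == "__main__":
    conv = DepthwiseSeparableConv(32, 64)
    x = torch.randn(1, 32, 28, 28)
    y = conv(x)                                   # shape (1, 64, 28, 28)

    with torch.no_grad():
        w = conv.pointwise.weight
        w.copy_(prune_by_magnitude(w, sparsity=0.5))
        w.copy_(kmeans_weight_sharing(w, n_clusters=16))

    print(y.shape, "unique shared weights:",
          torch.unique(conv.pointwise.weight).numel())
```

After these steps, each pruned layer stores only a small codebook of shared values plus per-weight cluster indices, which is where the storage reduction reported in the abstract comes from.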