Abstract
Deep neural networks have achieved great success in many fields, but they are both computationally and memory intensive, which makes them difficult to deploy on mobile devices with limited memory resources or in applications with strict latency requirements. To address this problem, this paper proposes a new compression method for deep neural networks. First, standard convolutions are replaced with depthwise separable convolutions to simplify computation and reduce the number of parameters in the convolutional layers. Then, the network is pruned by removing unimportant connections. Finally, weight quantization is performed to compress the network further, with weight sharing applied using the shared weights obtained from kernel k-means clustering. Several experiments are conducted to verify the viability and effectiveness of the proposed method. The results show that deep neural networks can be compressed effectively while accuracy improves. On the MNIST and ImageNet datasets, our method reduces the storage required by the LeNet, AlexNet and VGG-16 networks by 33× to 45×. This reduction in storage makes it possible to deploy deep neural networks on mobile systems where application size and download bandwidth are constrained.
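The following is a minimal sketch, not the authors' implementation, of two of the three steps summarized above: replacing a standard convolution with a depthwise separable convolution, and quantizing a layer's weights by clustering so that all weights in a cluster share one stored value. It assumes PyTorch and scikit-learn, uses ordinary k-means as a stand-in for the kernel k-means clustering mentioned in the abstract, and the layer sizes and number of clusters are illustrative choices only.

import torch
import torch.nn as nn
from sklearn.cluster import KMeans

def depthwise_separable(in_ch, out_ch, kernel_size=3, padding=1):
    """Depthwise conv (one filter per input channel) followed by a 1x1 pointwise conv."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=padding, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
    )

def kmeans_quantize(weight, n_clusters=16):
    """Replace each weight with its cluster centroid; return the quantized tensor,
    the codebook of shared weights, and the per-weight cluster indices."""
    flat = weight.detach().cpu().numpy().reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(flat)
    codebook = torch.tensor(km.cluster_centers_.flatten(), dtype=weight.dtype)
    indices = torch.tensor(km.labels_).reshape(weight.shape)
    return codebook[indices], codebook, indices

# Usage: a standard 3x3 convolution with 64 input and 128 output channels has
# 64*128*3*3 = 73,728 weights; the separable version has 64*3*3 + 64*128 = 8,768.
conv = depthwise_separable(64, 128)
quantized, codebook, idx = kmeans_quantize(conv[1].weight, n_clusters=16)

After quantization, only the codebook (16 shared values here) and the small per-weight cluster indices need to be stored, which is the source of the additional compression on top of the separable convolutions and pruning.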