Abstract

Deep neural network (DNN) compression can effectively reduce the memory footprint of deep networks so that deep models can be deployed on portable devices. However, most existing model compression methods, e.g., vector quantization or pruning, are time-consuming, which makes them unsuitable for real-world applications that require fast online computation. In this paper, we therefore explore how to accelerate the model compression process by reducing its computation cost. We propose a new deep model compression method, termed Dictionary Pair-based Data-Free Fast DNN Compression, which reduces the memory consumption of DNNs without extra training and greatly improves compression efficiency. Specifically, our method performs tensor decomposition on the DNN model with a fast dictionary pair learning-based reconstruction approach, which can be applied to different layers (e.g., convolutional and fully-connected layers). Given a pre-trained DNN model, we first divide the parameters (i.e., weights) of each layer into a series of partitions for dictionary pair-based fast reconstruction, which can potentially capture more fine-grained information and opens the possibility of parallel model compression. Then, dictionaries with a smaller memory footprint are learned to reconstruct the weights. Extensive experiments on popular DNNs (i.e., VGG-16, ResNet-18, and ResNet-50) show that our weight compression method significantly reduces the memory footprint and speeds up the compression process with little performance loss.
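
The following is a minimal, hypothetical sketch of the general idea of partitioning a layer's weight matrix and replacing each partition with a learned dictionary pair, not the authors' exact algorithm. The partition size, dictionary size `k`, and the simple alternating least-squares updates are illustrative assumptions; the paper's actual optimization and data-free reconstruction objective are not reproduced here.

```python
# Hypothetical sketch: dictionary pair-based weight compression on one layer.
# Each weight partition W_b (m x n) is approximated as D @ A, where
# D (m x k) is a synthesis dictionary and A = P @ W_b (k x n) are codes
# produced by an analysis dictionary P (k x m). Storing (D, A) instead of
# W_b costs m*k + k*n values instead of m*n, so k << min(m, n) compresses.
import numpy as np

def learn_dictionary_pair(W, k, n_iters=10, reg=1e-6):
    """Alternating least-squares fit of W ~= D @ (P @ W) (illustrative only)."""
    m, _ = W.shape
    rng = np.random.default_rng(0)
    P = rng.standard_normal((k, m)) * 0.01
    D = np.zeros((m, k))
    for _ in range(n_iters):
        A = P @ W                                                # codes, (k, n)
        # Update D: argmin_D ||W - D A||_F^2 (ridge-regularized closed form)
        D = W @ A.T @ np.linalg.inv(A @ A.T + reg * np.eye(k))
        # Update P: argmin_P ||W - D P W||_F^2 reduces to P = pinv(D) here
        P = np.linalg.inv(D.T @ D + reg * np.eye(k)) @ D.T
    return D, P @ W                                              # keep D and codes A

def compress_layer(weight, part_cols=128, k=32):
    """Split a weight matrix into column partitions; learn one (D, A) pair per partition."""
    return [learn_dictionary_pair(weight[:, s:s + part_cols], k)
            for s in range(0, weight.shape[1], part_cols)]

# Toy usage on a 256 x 512 fully-connected weight matrix
W = np.random.randn(256, 512)
pairs = compress_layer(W)
W_rec = np.hstack([D @ A for D, A in pairs])                     # reconstructed weights
print("relative reconstruction error:",
      np.linalg.norm(W - W_rec) / np.linalg.norm(W))
```

Because each partition is handled independently, the per-partition fits could in principle run in parallel, which is one way the partitioning can speed up compression.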
