Abstract

Deep neural networks have become ubiquitous across many domains, yet their massive storage and computation costs hinder deployment in real-world applications. This paper proposes a novel, unified two-stage framework for automatic model compression. To determine the compression ratio of each layer, we improve the optimization in two respects. First, to predict the performance of each candidate compression policy, we propose Dynamic BN, which significantly improves the correlation between predicted and actual performance at little computational overhead. Second, to search for the allocation of compression ratios across layers, we propose an efficient, hyperparameter-free solving algorithm based on the proposed Hessian matrix approximation and a reformulation as a Knapsack problem. Comprehensive experiments and analyses on the CIFAR-100 and ImageNet datasets, across various network architectures, demonstrate performance advantages over existing model compression methods under quantization-only, pruning-only, and pruning-quantization settings.
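
As a rough illustration of the Dynamic BN idea, the sketch below re-estimates BatchNorm statistics under a candidate compression policy before the proxy evaluation, a common way to make proxy accuracy track final accuracy more closely. The paper's exact procedure is not reproduced here; the names `recalibrate_bn` and `calib_loader` are illustrative assumptions.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def recalibrate_bn(model: nn.Module, calib_loader, num_batches: int = 10):
    """Re-estimate BatchNorm running statistics for a compressed candidate.

    Stale BN statistics from the uncompressed model distort the proxy
    evaluation; a few calibration forward passes fix this cheaply.
    """
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # None -> cumulative moving average in PyTorch
    model.train()  # BN updates running stats only in train mode
    for i, (images, _) in enumerate(calib_loader):
        if i >= num_batches:
            break
        model(images)  # forward pass only; no gradients, no weight updates
    model.eval()
```

After recalibration, the candidate would be scored on a held-out set in `eval()` mode; since only a handful of batches are typically needed, the added cost stays small, consistent with the "little computational overhead" claim.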
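
The allocation step can be read as a multiple-choice Knapsack problem: each layer picks exactly one compression ratio, each ratio carries a resource cost and an estimated loss increase (in the paper's setting, from a Hessian-based approximation; a common surrogate is the diagonal second-order term 0.5 * sum_i H_ii * dw_i^2), and the total cost must fit a budget. The sketch below is a minimal dynamic-programming solver under that reading, not the paper's exact algorithm; the function name, the integer-cost assumption, and the toy inputs are illustrative.

```python
from typing import List, Tuple


def allocate_ratios(
    options: List[List[Tuple[int, float]]],  # options[l] = [(cost, loss_increase), ...]
    budget: int,                             # total cost budget in integer units
) -> List[int]:
    """Pick one option per layer minimizing total estimated loss increase
    subject to a total cost budget (multiple-choice Knapsack via DP)."""
    INF = float("inf")
    # dp[c]: minimum total loss increase over layers processed so far,
    # using total cost at most c.
    dp = [0.0] * (budget + 1)
    choices: List[List[int]] = []
    for layer_opts in options:
        new_dp = [INF] * (budget + 1)
        layer_choice = [-1] * (budget + 1)
        for c in range(budget + 1):
            for idx, (cost, loss) in enumerate(layer_opts):
                if cost <= c and dp[c - cost] + loss < new_dp[c]:
                    new_dp[c] = dp[c - cost] + loss
                    layer_choice[c] = idx
        dp = new_dp
        choices.append(layer_choice)
    # Pick the cheapest budget achieving the minimum total loss increase.
    c = min(range(budget + 1), key=lambda b: dp[b])
    if dp[c] == INF:
        raise ValueError("budget too small for any per-layer option")
    picks: List[int] = []
    for layer_choice, layer_opts in zip(reversed(choices), reversed(options)):
        idx = layer_choice[c]
        picks.append(idx)
        c -= layer_opts[idx][0]
    return picks[::-1]


# Toy usage: two layers, three candidate ratios each (cost, estimated loss).
opts = [[(4, 0.00), (2, 0.05), (1, 0.20)],
        [(4, 0.00), (2, 0.10), (1, 0.40)]]
print(allocate_ratios(opts, budget=5))  # -> [1, 1]: middle ratio for both (cost 4, loss 0.15)
```

Because the DP is exact for a fixed cost discretization and has no tunable knobs, a formulation of this kind is one way a solver can be hyperparameter-free, as the abstract claims.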
