Abstract

Deep neural networks have become ubiquitous across many domains, yet their massive storage and computation costs hinder deployment in real-world applications. This paper proposes a novel, unified two-stage framework for automatic model compression. To determine the compression ratio of each layer, we improve the optimization in two respects. First, to predict the performance of each compression policy, we propose Dynamic BN, which significantly improves the correlation between predicted and actual performance with little computational overhead. Second, to search for the compression-ratio allocation, we propose an efficient, hyperparameter-free solving algorithm based on the proposed Hessian-matrix approximation and a reformulation as a Knapsack problem. Moreover, comprehensive experiments and analyses on the CIFAR-100 and ImageNet datasets across various network architectures demonstrate its performance advantages over existing model compression methods in the quantization-only, pruning-only, and joint pruning-quantization settings.
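To make the Knapsack reformulation concrete, below is a minimal sketch of how per-layer ratio allocation can be cast as a multiple-choice knapsack and solved by dynamic programming. All names, the integer cost model, and the toy scores are hypothetical illustrations, not the paper's implementation; in the paper's setting the per-candidate score would come from the Dynamic BN-based performance prediction.

```python
# Hypothetical sketch: choose exactly one compression ratio per layer so that
# the total resource cost stays within a budget while the predicted
# performance score is maximized (a multiple-choice knapsack).

def allocate_ratios(candidates, budget):
    """candidates: one list per layer of (ratio, cost, score) tuples,
    with integer costs (e.g., quantized storage or BOPs units).
    Returns the chosen ratio per layer maximizing the total score."""
    NEG = float("-inf")
    # dp[b] = (best total score achievable at exact cost b, chosen ratios)
    dp = [(NEG, [])] * (budget + 1)
    dp[0] = (0.0, [])
    for layer in candidates:
        new_dp = [(NEG, [])] * (budget + 1)
        for b, (score, picks) in enumerate(dp):
            if score == NEG:
                continue
            # Each layer must pick exactly one candidate ratio.
            for ratio, cost, gain in layer:
                nb = b + cost
                if nb <= budget and score + gain > new_dp[nb][0]:
                    new_dp[nb] = (score + gain, picks + [ratio])
        dp = new_dp
    best = max(dp, key=lambda t: t[0])
    return best[1] if best[0] > NEG else None  # None if budget infeasible

# Toy usage: two layers, made-up (ratio, cost, score) candidates.
layers = [
    [(0.25, 1, 0.90), (0.50, 2, 0.95), (1.00, 4, 1.00)],
    [(0.25, 2, 0.80), (0.50, 4, 0.97), (1.00, 8, 1.00)],
]
print(allocate_ratios(layers, budget=6))  # -> [0.5, 0.5]
```

The exact-cost DP table keeps the solver hyperparameter-free in the same spirit as the abstract's claim: given the candidate scores and a budget, the allocation follows deterministically with no tunable search knobs.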
