Abstract

Deep neural networks (DNNs) have achieved strong cognitive performance at the expense of a considerable computational workload. To relieve this burden, many optimization methods reduce model redundancy by identifying and removing insignificant model components, as in weight sparsification and filter pruning. However, these methods evaluate only the static significance of model components from parameter information and ignore their dynamic interaction with external inputs. Because features differ from one input to the next, the significance of model components can change dynamically, so static methods achieve only suboptimal performance. To address this, we propose a comprehensive dynamic DNN optimization framework based on the neural network attention mechanism. It includes 1) testing-phase dynamic feature map pruning; 2) training-phase optimization through training with targeted dropout; and 3) deployment-phase enhancement of one-for-all (OFA) model adaptability. By co-optimizing testing, training, and deployment, the framework offers the following benefits. First, it can accurately identify and aggressively remove per-input feature redundancy by accounting for the model-input interaction and by supporting flexible channel-wise and column-wise pruning. Second, the training-testing co-optimization favors dynamic pruning and helps maintain model accuracy even at very high feature pruning ratios. Finally, the deployment enhancement provides a single unified OFA model that supports the full spectrum of feature sparsity ratios; this model can be dynamically reconfigured to meet different resource budgets without any retraining cost, which provides significant deployment flexibility. Extensive experiments show that our method reduces floating-point operations (FLOPs) by 37.4%–54.5% with a negligible accuracy drop on various test benchmarks. Meanwhile, the OFA deployment optimization allows one model to support up to ten different resource constraints without any retraining cost.
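As a rough illustration of testing-phase dynamic feature map pruning, the Python sketch below scores each channel of a feature map separately for every input and zeroes the lowest-scoring channels. It is not the authors' method. The scoring rule (mean absolute activation standing in for a learned attention score) and the names `channel_scores` and `dynamic_channel_prune` are hypothetical. The sketch covers only channel-wise pruning, not column-wise pruning or targeted-dropout training. Because the pruning ratio is a runtime argument, the same weights can be run at several sparsity levels, which loosely mirrors the reconfigurability the OFA model is described as providing.

```python
import numpy as np


def channel_scores(features: np.ndarray) -> np.ndarray:
    """Per-input, per-channel significance scores, shape (N, C, H, W) -> (N, C).

    Assumption: the mean absolute activation of each channel stands in for
    an attention-derived significance score.
    """
    return np.abs(features).mean(axis=(2, 3))


def dynamic_channel_prune(features: np.ndarray, prune_ratio: float):
    """Zero out the least significant channels, chosen separately per input.

    prune_ratio is a runtime argument, so one set of weights can be run at
    different sparsity levels without retraining.
    Returns (pruned_features, keep_mask) where keep_mask has shape (N, C).
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    n, c = features.shape[:2]
    # Always keep at least one channel per input.
    keep = max(1, int(round(c * (1.0 - prune_ratio))))
    scores = channel_scores(features)
    # Indices of the `keep` highest-scoring channels for each input.
    top = np.argsort(-scores, axis=1)[:, :keep]
    mask = np.zeros((n, c), dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    # Broadcast the (N, C) mask over the spatial dimensions.
    pruned = features * mask[:, :, None, None]
    return pruned, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((2, 8, 4, 4))  # batch of 2 inputs, 8 channels
    for ratio in (0.25, 0.5, 0.75):
        _, mask = dynamic_channel_prune(x, ratio)
        # Number of channels kept for each input at this ratio.
        print(ratio, mask.sum(axis=1))
```

Zeroing channels saves FLOPs only if the next layer actually skips the masked channels. The sketch shows which channels each input would skip, not the kernel-level savings.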
