AntiDoteX: Attention-Based Dynamic Optimization for Neural Network Runtime Efficiency

Fuxun Yu, Yanzhi Wang, Dimitrios Stamoulis, Xiang Chen, Chenchen Liu, Zirui Xu, Di Wang

doi:10.1109/tcad.2022.3144616

Abstract

Deep neural networks (DNNs) have achieved great cognitive performance at the expense of a considerable computation workload. To relieve the computational burden, many optimization works have been developed to reduce model redundancy by identifying and removing insignificant model components, for example through weight sparsification and filter pruning. However, these works evaluate only the static significance of model components using parameter information, ignoring their dynamic interaction with external inputs. Specifically, because features differ from input to input, the significance of model components can change dynamically, so static methods achieve only suboptimal performance. Focusing on this aspect, we propose a comprehensive dynamic DNN optimization framework based on the neural network attention mechanism, including 1) testing-phase dynamic feature map pruning; 2) training-phase optimization by training with targeted dropout; and 3) deployment-phase one-for-all (OFA) model adaptability enhancement. By providing a holistic dynamic testing, training, and deployment co-optimization framework, our work has the following benefits. First, it can accurately identify and aggressively remove per-input feature redundancy by considering the model-input interaction and exploiting channel/column-wise pruning flexibility. Second, the training-testing co-optimization favors dynamic pruning and helps maintain model accuracy even at a very high feature pruning ratio. Finally, the deployment enhancement provides one unified OFA model that supports the full spectrum of feature sparsity ratios. The unified model can be dynamically reconfigured to meet different resource budgets without any retraining cost, and thus provides significant deployment flexibility. Extensive experiments show that our method brings a 37.4%–54.5% reduction in floating-point operations (FLOPs) with a negligible accuracy drop on various test benchmarks. Meanwhile, the OFA deployment optimization enables one model to support up to ten different resource constraints without any retraining cost.
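
The testing-phase idea in the abstract, per-input channel-wise feature map pruning guided by attention, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the attention score, so the sketch assumes mean absolute activation per channel as a stand-in, and the names `channel_attention`, `prune_channels`, and `prune_ratio` are hypothetical.

```python
from typing import List

# A feature map is modeled as channels x height x width nested lists.
FeatureMap = List[List[List[float]]]


def channel_attention(fmap: FeatureMap) -> List[float]:
    """Score each channel by its mean absolute activation.

    Assumption: this is a simple proxy for attention; the paper's
    actual attention definition is not given in the abstract.
    """
    scores = []
    for channel in fmap:
        values = [abs(v) for row in channel for v in row]
        scores.append(sum(values) / len(values))
    return scores


def prune_channels(fmap: FeatureMap, prune_ratio: float) -> List[bool]:
    """Return a keep-mask that drops the lowest-scoring channels for this input."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    scores = channel_attention(fmap)
    num_pruned = int(len(scores) * prune_ratio)
    # Rank channel indices from least to most attended; ties keep index order.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pruned = set(order[:num_pruned])
    return [i not in pruned for i in range(len(scores))]


# Hypothetical 4-channel, 2x2 feature map produced for one input.
example = [
    [[0.1, 0.1], [0.1, 0.1]],
    [[1.0, -2.0], [3.0, -4.0]],
    [[0.0, 0.0], [0.0, 0.2]],
    [[0.5, 0.5], [0.5, 0.5]],
]
keep_mask = prune_channels(example, prune_ratio=0.5)
print(keep_mask)
```

Because the mask is recomputed for every input, different inputs can retain different channels, which is the contrast with static pruning that the abstract draws. In this sketch, changing `prune_ratio` at inference time is also a loose analogue of reconfiguring one model for different resource budgets; how the OFA model is actually trained to tolerate that is not described in the abstract.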

Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Publication Date: Nov 1, 2022
Citations: 1

