Abstract

The Vision Transformer (ViT) has emerged as a pivotal model for a variety of visual tasks, surpassing convolutional neural networks by a substantial margin. However, ViT is seriously hampered by its intensive computational and storage requirements, which pose significant barriers to real-world application and to deployment on resource-constrained edge devices. To address this limitation, compressing ViT to accelerate inference without appreciable degradation of vision performance has attracted widespread attention. Although some studies have addressed ViT acceleration, they seldom consider resource constraints or multi-criteria decision making in the process. This article formulates ViT pruning as a large-scale constrained multi-objective optimization problem and, based on this model, proposes a patch pruning framework for accelerating ViT, called EvolutionViT. EvolutionViT trades off computational cost against performance under resource constraints, automatically searching for solutions while optimizing these two conflicting objectives. In particular, by using the knee solution and the boundary solutions to directly guide the entire evolutionary process, EvolutionViT can efficiently identify a knee solution that satisfies the resource constraints, which avoids a manual search for a good trade-off. To verify and evaluate the proposed method, we compare EvolutionViT with a number of representative ViT models on the ImageNet dataset. The comprehensive simulation results show that EvolutionViT is competitive with its peers, significantly reducing computational expense at the cost of slightly degraded performance.
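
To make the notion of a knee solution under a resource constraint concrete, the following is a minimal sketch, not the EvolutionViT algorithm itself. It assumes each candidate pruning configuration has already been scored as a pair (computational cost, error), both to be minimized, and it uses one common knee definition: the point on the feasible Pareto front farthest from the line joining the two boundary solutions after normalization. The function names, the `budget` parameter, and this knee definition are illustrative assumptions; the abstract does not specify how the authors define or locate the knee.

```python
from typing import List, Tuple

# Hypothetical representation: (computational cost, error), both minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by increasing cost."""
    front: List[Point] = []
    best_error = float("inf")
    # Sorting by (cost, error) lets a single sweep keep only points that
    # strictly improve the error as the cost grows.
    for cost, err in sorted(set(points)):
        if err < best_error:
            front.append((cost, err))
            best_error = err
    return front


def knee_point(points: List[Point], budget: float) -> Point:
    """Pick a knee solution among candidates whose cost is within budget.

    Assumed knee definition: the front point with the largest distance
    from the line through the two boundary (extreme) solutions, measured
    after normalizing both objectives to [0, 1].
    """
    feasible = [p for p in points if p[0] <= budget]
    if not feasible:
        raise ValueError("no candidate satisfies the resource constraint")

    front = pareto_front(feasible)
    if len(front) < 3:
        # No interior point exists, so fall back to the cheapest solution.
        return front[0]

    c_min, e_max = front[0]    # boundary solution: lowest cost
    c_max, e_min = front[-1]   # boundary solution: lowest error

    def gap(p: Point) -> float:
        x = (p[0] - c_min) / (c_max - c_min)
        y = (p[1] - e_min) / (e_max - e_min)
        # The boundary points map to (0, 1) and (1, 0); the distance to the
        # line x + y = 1 is proportional to 1 - x - y.
        return 1.0 - x - y

    return max(front, key=gap)


if __name__ == "__main__":
    # Made-up scores for illustration only, not results from the paper.
    candidates = [(1.0, 0.30), (2.0, 0.15), (2.5, 0.25),
                  (3.0, 0.12), (4.0, 0.10), (5.0, 0.09)]
    print(knee_point(candidates, budget=4.0))
```

In this sketch the constraint is applied by filtering before the front is built, so the knee always satisfies the budget. An evolutionary search such as the one described above would instead use the knee and boundary solutions to steer the population during the search, which this example does not attempt.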
