Abstract

Fine-grained Visual Classification (FGVC) aims to distinguish object classes belonging to the same category, e.g., different bird species or vehicle models. The task is more challenging than ordinary image classification because the inter-class differences are subtle. Recent works have proposed deep learning models based on the vision transformer (ViT) architecture, whose self-attention mechanism locates important regions of the objects and derives global information. However, deploying these models on resource-restricted Internet of Things (IoT) devices is challenging due to their high computational cost and memory footprint, and energy and power budgets vary across IoT devices. To improve inference efficiency, previous approaches require manually designing the model architecture and training a separate model for each computational budget. In this work, we propose the Token Adaptive Vision Transformer (TAVT), which dynamically drops tokens and, after being trained once, can be used for various inference scenarios across many IoT devices. Our adaptive model can switch among different token drop configurations at run time, providing instant accuracy-efficiency trade-offs. We train a vision transformer with a progressive token pruning scheme that eliminates a large number of redundant tokens in the later layers. We then conduct a multi-objective evolutionary search to find the token pruning schemes that maximize accuracy and efficiency under various computational budgets. The search uses the overall number of floating point operations (FLOPs) as its efficiency constraint, which can be translated to energy and power consumption. Empirical results show that the proposed TAVT speeds up GPU inference by up to 10× and reduces memory requirements and FLOPs by up to 5.5× and 13×, respectively, while achieving competitive accuracy compared to prior ViT-based state-of-the-art approaches.
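
To illustrate why pruning tokens in later layers lowers the FLOPs budget used as the search constraint, the following is a minimal sketch, not the authors' cost model. It assumes the common per-block approximation of 12·n·d² + 2·n²·d multiply-accumulates for n tokens and embedding dimension d (MLP ratio 4), and it uses ViT-Base-like sizes (197 tokens, d = 768, 12 layers) and a fixed keep ratio of 0.8 purely as illustrative values. The function names are hypothetical.

```python
def block_flops(n_tokens: int, dim: int, mlp_ratio: int = 4) -> int:
    """Approximate multiply-accumulates for one transformer block (assumed cost model)."""
    proj = 4 * n_tokens * dim * dim              # Q, K, V and output projections
    attn = 2 * n_tokens * n_tokens * dim         # QK^T scores and attention-weighted sum of V
    mlp = 2 * mlp_ratio * n_tokens * dim * dim   # two linear layers of the MLP
    return proj + attn + mlp


def progressive_schedule(n_tokens: int, keep_ratios: list[float]) -> list[int]:
    """Token count entering each layer when layer i keeps keep_ratios[i] of its tokens."""
    counts = []
    n = n_tokens
    for ratio in keep_ratios:
        counts.append(n)
        n = max(1, int(n * ratio))  # tokens surviving into the next layer
    return counts


def schedule_flops(tokens_per_layer: list[int], dim: int) -> int:
    """Total approximate cost of a model given its per-layer token counts."""
    return sum(block_flops(n, dim) for n in tokens_per_layer)


if __name__ == "__main__":
    dim, layers, tokens = 768, 12, 197           # illustrative ViT-Base-like sizes
    full = schedule_flops([tokens] * layers, dim)
    pruned = schedule_flops(progressive_schedule(tokens, [0.8] * layers), dim)
    print(f"cost reduction from pruning: {full / pruned:.2f}x")
```

Under these assumptions, an evolutionary search could score each candidate list of keep ratios by its measured accuracy and by `schedule_flops`, discarding candidates that exceed a device's budget.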
