Abstract
In the AI era, the emergence of the Transformer model has driven a significant shift in the field of natural language processing. Its derivative, the Vision Transformer (ViT), adapts these principles to image recognition and demonstrates performance superior to that of traditional Convolutional Neural Networks (CNNs). Despite this excellent performance, deploying ViT models on edge devices is impeded by their extensive computational demands and large memory requirements, which conflict with the limited resources and real-time processing needs at the edge. It is therefore necessary to develop new hardware accelerators optimized for the ViT architecture. This paper reviews the development of Field-Programmable Gate Array (FPGA)-based ViT inference accelerators, focusing on their architectures and on how they address the challenges of ViT model deployment. It also explores optimization approaches at both the algorithm and hardware levels and traces advances in deploying AI models at the edge using FPGAs. It summarizes current trends in research on FPGA-based ViT accelerators and offers insights into future directions for hardware-accelerated AI. Overall, by surveying related work on the optimization of FPGA-based ViT inference accelerators, this article presents a useful snapshot of current research on ViT hardware accelerators and helps clarify future research directions in this area.