Abstract

Transformer-based models have great potential to deliver higher accuracy than convolutional neural networks (CNNs) for object recognition applications. However, the amount of weight sharing in a transformer-based model is significantly lower than in a CNN, so a different dataflow is required to reduce memory access. This brief proposes a transformer accelerator with an output block stationary (OBS) dataflow that minimizes repeated memory access through block-level and vector-level broadcasting while preserving a high digital signal processor (DSP) utilization rate, leading to higher energy efficiency. It also lowers the memory access bandwidth at the input and output. Verified on an FPGA, the proposed accelerator evaluates a transformer-in-transformer (TNT) model with a throughput of 728.3 GOPs, corresponding to an energy efficiency of 58.31 GOPs/W.
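
As a rough software analogue of the output-block-stationary idea, the following minimal sketch keeps each output block resident in a local accumulator while rows of the input and columns of the weight matrix are streamed in and reused; the block sizes, function name, and broadcast scheme are illustrative assumptions, not the accelerator's actual parameters.

```python
# Illustrative sketch only: an output-block-stationary (OBS) style tiled
# matrix multiply. Block sizes (tm, tn) are assumed values for illustration.
import numpy as np

def obs_matmul(X, W, tm=4, tn=4):
    """Compute X @ W with each (tm x tn) output block kept stationary
    while input rows and weight columns are broadcast across the block."""
    M, K = X.shape
    K2, N = W.shape
    assert K == K2
    Y = np.zeros((M, N), dtype=X.dtype)
    for i in range(0, M, tm):              # iterate over output blocks
        for j in range(0, N, tn):
            # local accumulator: the stationary output block
            acc = np.zeros((min(tm, M - i), min(tn, N - j)), dtype=X.dtype)
            for k in range(K):             # stream the shared dimension once
                # x_vec is broadcast across the block's columns,
                # w_vec across its rows; partial sums stay local.
                x_vec = X[i:i + tm, k:k + 1]
                w_vec = W[k:k + 1, j:j + tn]
                acc += x_vec @ w_vec       # rank-1 update accumulated in place
            Y[i:i + tm, j:j + tn] = acc    # write the finished block once
    return Y

# quick check against the reference result
X = np.random.rand(8, 6).astype(np.float32)
W = np.random.rand(6, 8).astype(np.float32)
assert np.allclose(obs_matmul(X, W), X @ W, atol=1e-5)
```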
