Abstract

Transformer models for computer vision are difficult to deploy on mobile devices because they are memory- and computation-intensive. Accordingly, there is ongoing research on methods for compressing transformer models, such as pruning. However, general computing platforms such as central processing units (CPUs) and graphics processing units (GPUs) are not energy-efficient at accelerating pruned models because of the models' structured sparsity. This paper proposes a low-power accelerator for transformers with various sizes of structured sparsity induced by pruning at different granularities. The accelerator supports transformers that have been pruned in a head-wise, line-wise, or block-wise manner. We developed a head scheduling algorithm that supports head-wise skip operations and resolves the processing engine (PE) load imbalance caused by the differing number of operations in each head. Moreover, we implemented a sparse general matrix-to-matrix multiplication (sparse GEMM) module that supports line-wise and block-wise skipping. As a result, compared with a mobile GPU and a mobile CPU, respectively, the proposed accelerator achieved $6.1\times$ and $13.6\times$ improvements in energy efficiency for the detection transformer (DETR) model, and average improvements of approximately $2.6\times$ and $7.9\times$ in energy efficiency for the vision transformer (ViT) models.
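
The two mechanisms named in the abstract can be illustrated in software, although the abstract does not specify either one in detail. The sketch below is therefore an assumption, not the authors' method. The hypothetical `schedule_heads` uses a generic greedy longest-first heuristic to spread heads with unequal operation counts across PEs and drops fully pruned heads. The hypothetical `block_sparse_matmul` skips pruned weight blocks according to a mask; line-wise skipping is modeled as a block that is one row tall and as wide as the weight matrix. All function names, parameters, and the mask layout are illustrative.

```python
import heapq


def schedule_heads(head_ops, num_pes):
    """Assign attention heads to PEs with a greedy longest-first heuristic.

    Illustrative only; this is not the scheduling algorithm from the paper.
    head_ops[i] is the operation count left in head i after pruning.
    A count of 0 means the head was pruned entirely and is skipped.
    """
    # Min-heap of (accumulated load, PE index): the least-loaded PE pops first.
    heap = [(0, pe) for pe in range(num_pes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_pes)]
    # Visit the remaining heads from most to fewest operations.
    live = [i for i, ops in enumerate(head_ops) if ops > 0]
    for head in sorted(live, key=lambda i: -head_ops[i]):
        load, pe = heapq.heappop(heap)
        assignment[pe].append(head)
        heapq.heappush(heap, (load + head_ops[head], pe))
    loads = [sum(head_ops[h] for h in heads) for heads in assignment]
    return assignment, loads


def block_sparse_matmul(a, w, block_mask, block):
    """Compute a @ w while skipping pruned blocks of w.

    a is M x K and w is K x N, both as lists of lists.
    block is (bk, bn), the block height and width in elements.
    block_mask[r][c] is False when the block at block-row r and
    block-column c of w was pruned. Returns (result, MAC count).
    """
    bk, bn = block
    m, k, n = len(a), len(w), len(w[0])
    out = [[0.0] * n for _ in range(m)]
    macs = 0
    for kb in range(0, k, bk):
        for nb in range(0, n, bn):
            if not block_mask[kb // bk][nb // bn]:
                continue  # pruned block: no multiply-accumulate is issued
            for i in range(m):
                for kk in range(kb, min(kb + bk, k)):
                    a_ik = a[i][kk]
                    for j in range(nb, min(nb + bn, n)):
                        out[i][j] += a_ik * w[kk][j]
                        macs += 1
    return out, macs
```

Calling `schedule_heads` with per-head operation counts and a PE count returns the heads assigned to each PE together with the per-PE load, so the imbalance can be inspected directly. The MAC count returned by `block_sparse_matmul` shows how much work a given mask removes relative to a dense multiplication. Neither function models the energy behavior that the reported results measure.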
