Abstract
With the growing popularity of Transformer neural networks, hardware accelerators must increasingly support nonlinear computation, chief among which is the softmax operation. However, striking a good balance between algorithm performance and hardware overhead remains a persistent challenge. Hence, this brief proposes TEA-S, a tiny and efficient architecture that implements the softmax function using a universal approximation scheme based on Piecewise Linear Approximation Computation (PLAC). Through the first co-optimization of computation and memory, TEA-S achieves the design goals of tiny area and high efficiency. Implementation results show a peak efficiency of 487.51 Gps/(mm²·mW) when processing 8-bit quantized data, with a tiny area of 3052.43 μm² at a frequency of 0.5 GHz under 90-nm CMOS technology. Moreover, TEA-S offers a universal solution for input sequences of any length, incurring negligible accuracy loss in Transformers compared with the quantized baselines.
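The brief itself contains no code; the following minimal Python sketch only illustrates the general idea of a piecewise-linear softmax approximation. It is not the authors' PLAC or TEA-S design, and the segment count, input range, and breakpoints are illustrative assumptions.

```python
import numpy as np

# Sketch of a piecewise-linear softmax approximation (assumptions only;
# not the TEA-S/PLAC architecture described in the brief).

def pwl_exp(x, num_segments=8, lo=-8.0, hi=0.0):
    """Approximate exp(x) on [lo, hi] with uniform linear segments.

    After max-subtraction, softmax inputs satisfy x <= 0, so exp(x) <= 1
    and a small table of slopes/intercepts suffices.
    """
    # Precompute per-segment slope and intercept from exact endpoints.
    xs = np.linspace(lo, hi, num_segments + 1)
    ys = np.exp(xs)
    slopes = (ys[1:] - ys[:-1]) / (xs[1:] - xs[:-1])
    intercepts = ys[:-1] - slopes * xs[:-1]

    x = np.clip(x, lo, hi)
    # Map each input to its segment index.
    idx = np.minimum(((x - lo) / (hi - lo) * num_segments).astype(int),
                     num_segments - 1)
    return slopes[idx] * x + intercepts[idx]

def pwl_softmax(logits):
    """Softmax using the piecewise-linear exp approximation."""
    z = logits - logits.max()   # max-subtraction for numerical range control
    e = pwl_exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1, -1.5])
print(pwl_softmax(logits))
# Exact reference for comparison:
print(np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum())
```

In a hardware setting, the slope/intercept table would typically be precomputed and quantized, which is where a calculation-memory co-optimization such as the one claimed for TEA-S would apply.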