Abstract

The softmax function is widely used in deep neural networks (DNNs), and its hardware performance plays an important role in the training and inference of DNN accelerators. However, due to the complexity of the traditional softmax, existing hardware architectures are either resource-consuming or low in precision. To address these challenges, we study a base-2 softmax function in terms of its suitability for neural network training and efficient hardware implementation. Compared to the classical base-$e$ softmax, the base-2 softmax function uses 2 as the exponential base instead of $e$. Through mathematical derivation and software simulation, we first demonstrate the feasibility and good accuracy of the base-2 softmax function for neural network training. We then use the symmetric-mapping lookup table (SM-LUT) method to design a low-complexity yet high-precision architecture to implement it. Under TSMC 28 nm CMOS technology, an example design of our architecture occupies an area of $5676~\mu m^{2}$ and consumes 13.12 mW when synthesized at a frequency of 3 GHz. Compared with the latest works, our architecture achieves the best performance and efficiency.
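For reference, a minimal statement of the base-2 softmax described above (the notation $\mathrm{softmax}_2$ is ours, not taken from the paper): since $2^{x} = e^{x\ln 2}$, it is equivalent to the classical base-$e$ softmax applied to inputs scaled by $\ln 2$,

$$
\mathrm{softmax}_2(\mathbf{x})_i \;=\; \frac{2^{x_i}}{\sum_{j} 2^{x_j}} \;=\; \frac{e^{x_i \ln 2}}{\sum_{j} e^{x_j \ln 2}} \;=\; \mathrm{softmax}_e(\mathbf{x}\ln 2)_i .
$$

This scaling equivalence is what makes replacing the base plausible for training while allowing power-of-two arithmetic in hardware.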
