Abstract

On-chip memory is one of the core components of deep learning accelerators, and it typically accounts for around 30% of the total chip area. As deep learning algorithms grow more complex, integrating ever larger on-chip memory to meet their needs becomes a challenge for accelerators. In addition, training and inference involve computations at different precisions (such as FP32 and FP16), so the on-chip memory must support multi-precision computation. To address these issues, this paper explores the use of single-port memory (SPM) in systolic-array-based deep learning accelerators. We propose transformation methods for the respective multi-precision computation scenarios that avoid conflicts between simultaneous read and write requests to the SPM. We then prove that both methods are feasible and can be implemented in hardware without affecting the computational efficiency of the accelerator. Experimental results show that the two methods reduce memory area cost by about 30% and 25%, respectively, when the accelerator integrates SPM, without affecting the accelerator's throughput and with almost negligible additional hardware cost.
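The abstract does not detail the two transformation methods, but the core problem it names is that a single-port memory can service only one access (read or write) per cycle. As a rough illustration of the general idea, the sketch below models a single-port SRAM bank and a hypothetical scheduler that time-multiplexes read and write requests onto alternating cycles, so the port never sees two requests at once. All names here (SinglePortMemory, Scheduler, tick) are illustrative assumptions and not the paper's actual design.

```python
# Minimal sketch of avoiding read/write conflicts on a single-port memory
# by time-multiplexing accesses. The even/odd-cycle scheme below is a
# hypothetical illustration, not the transformation methods of the paper.

from collections import deque

class SinglePortMemory:
    """Single-port SRAM model: at most ONE access (read or write) per cycle."""
    def __init__(self, size):
        self.data = [0] * size
        self.busy = False  # tracks whether the port was used this cycle

    def tick(self):
        self.busy = False  # a new cycle begins; the port is free again

    def read(self, addr):
        assert not self.busy, "port conflict: second access in one cycle"
        self.busy = True
        return self.data[addr]

    def write(self, addr, value):
        assert not self.busy, "port conflict: second access in one cycle"
        self.busy = True
        self.data[addr] = value

class Scheduler:
    """Issues queued reads on even cycles and writes on odd cycles, so the
    single port never receives two requests in the same cycle."""
    def __init__(self, mem):
        self.mem = mem
        self.reads = deque()   # pending (addr, callback) read requests
        self.writes = deque()  # pending (addr, value) write requests
        self.cycle = 0

    def tick(self):
        self.mem.tick()
        if self.cycle % 2 == 0 and self.reads:
            addr, callback = self.reads.popleft()
            callback(self.mem.read(addr))
        elif self.cycle % 2 == 1 and self.writes:
            addr, value = self.writes.popleft()
            self.mem.write(addr, value)
        self.cycle += 1

# Usage: drain a write stream, then a read stream, with no port conflicts.
mem = SinglePortMemory(16)
sched = Scheduler(mem)
for i in range(4):
    sched.writes.append((i, i * 10))
for _ in range(8):   # writes retire on odd cycles
    sched.tick()
results = []
for i in range(4):
    sched.reads.append((i, results.append))
for _ in range(8):   # reads retire on even cycles
    sched.tick()
print(results)       # [0, 10, 20, 30]
```

In a real accelerator such arbitration would cost throughput unless, as the paper claims for its methods, the access pattern is transformed so that reads and writes naturally never collide in the same cycle.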
