Carry-Propagation-Adder-Factored Gemmini Systolic Array for Machine Learning Acceleration

Kashif Inayat, Jaeyong Chung

doi:10.3390/electronics10060652

Abstract

Systolic arrays are the primary component of modern deep learning accelerators and are widely used in real-life applications such as self-driving cars. This paper presents a novel factored systolic array, in which the carry propagation adder for accumulation and the rounding logic are extracted from each processing element, substantially reducing the area, power and delay of the processing elements. The factoring is performed in a column-wise manner, and the cost of the factored logic, placed at each column output, is amortized over the processing elements in a column. We demonstrate the proposed factoring in an open-source systolic array, Gemmini. The factoring technique does not change the functionality of the base design and is transparent to applications. We show that the proposed technique reduces area and delay by up to 45.3% and 23.7%, respectively, compared to the Gemmini baseline.
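The idea of moving carry propagation out of each processing element can be illustrated with a small behavioral sketch. It assumes the partial sum travels down a column in carry-save form (a sum word and a carry word) and is resolved by a single carry propagation adder at the column output; this excerpt does not state the exact representation used in the paper, so the carry-save choice, the 32-bit width, and the function names `csa`, `column_baseline`, and `column_factored` are all illustrative assumptions. The multiplier is modeled as a plain product and the rounding logic is omitted.

```python
MASK = (1 << 32) - 1  # hypothetical 32-bit accumulator width (assumption)


def csa(a, b, c):
    """3:2 carry-save adder: compress three words into (sum, carry).

    Each bit position is computed independently, so no carry chain is needed.
    """
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s & MASK, cy & MASK


def column_baseline(weights, inputs):
    """Baseline column: every PE resolves its sum with a carry-propagate add."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc = (acc + w * x) & MASK  # one CPA per PE
    return acc


def column_factored(weights, inputs):
    """Factored column: PEs pass (sum, carry); one CPA at the column output."""
    s, c = 0, 0
    for w, x in zip(weights, inputs):
        s, c = csa(s, c, (w * x) & MASK)  # no carry propagation inside the PE
    return (s + c) & MASK  # the single, shared CPA


if __name__ == "__main__":
    w = [3, -2, 7]
    x = [-5, 4, 1]
    print(column_baseline(w, x) == column_factored(w, x))
```

Both functions return the same two's-complement result, which mirrors the claim that the factoring leaves functionality unchanged; the difference in hardware would be that the per-PE adder with a long carry chain is replaced by bitwise logic, with one adder per column instead of one per PE.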

Highlights

Machine learning (ML) algorithms have attracted considerable attention since deep learning (DL) demonstrated breakthroughs in various complex tasks such as the ImageNet challenge.
We present a novel factored systolic array and demonstrate it using an open-source systolic array, Gemmini [17] (the Gemmini system on chip (SoC) RTL can be generated by following the lab EE-290-2, Hardware for Machine Learning, Lab-2).
We built the test binaries using the bare-metal software tests provided in the Gemmini open-source repository and checked the correctness of both designs in a bare-metal environment.

Summary

Introduction

Machine learning (ML) algorithms have attracted considerable attention since deep learning (DL) demonstrated breakthroughs in various complex tasks such as the ImageNet challenge. The ability of DL to solve complex tasks is not limited to image recognition; it also applies to object detection, speech recognition, natural language processing, etc. Deep learning models require massive amounts of computation and large memory footprints, and recent research has focused on DL accelerators [4]. Matrix multiplication is the key primitive in the computation of ML models, and systolic arrays (SAs) for matrix multiplication have been widely adopted [5,6]. Systolic arrays, proposed in 1979, are two-dimensional meshes of processing elements (PEs) organized in a grid [7,8]. Concurrency and simple architectural characteristics, many industry giants such as Google [9], Nvidia [10], Intel [11]

Results

Discussion

Conclusion

Journal: Electronics
Publication Date: Mar 11, 2021
Citations: 5
License type: CC BY 4.0

Similar Papers

Scalable don't-care-based logic optimization and resynthesis
Alan Mishchenko ... Jie-Hong Roland Jiang
22 Feb 2009

A certain examination on heterogeneous systolic array (HSA) design for deep learning accelerations with low power computations
Dinesh Kumar Jayaraman Rajanediran ... K Priyadharsini
Sustainable Computing: Informatics and Systems | VOL. 44
11 Oct 2024

Hybrid Accumulator Factored Systolic Array for Machine Learning Acceleration
Kashif Inayat ... Jaeyong Chung
IEEE Transactions on Very Large Scale Integration (VLSI) Systems | VOL. 30
01 Jul 2022

ReSA: Reconfigurable Systolic Array for Multiple Tiny DNN Tensors
Ching-Jui Lee ... Tsung Tai Yeh
ACM Transactions on Architecture and Code Optimization | VOL. -
21 Mar 2024
