Enhancing Matrix Multiplication With a Monolithic 3-D-Based Scratchpad Memory

Cong Thuan Do, Young Seo Lee, Sung Woo Chung, Cheol Hong Kim, Jeong Hwan Choi

doi:10.1109/les.2020.3001954

Abstract

Convolutional neural networks (CNNs) are among the most popular machine learning algorithms. The convolutional layers, which account for most of the execution time of CNNs, are implemented with matrix multiplication because the convolution operation performs dot products between filters and local regions of the input. GPUs with thousands of cores have been shown to accelerate matrix multiplication significantly compared to CPUs with a limited number of cores, especially for large matrices. However, the current memory architecture allows only one row access at a time, so multiple accesses are necessary to read the column data of the second matrix, which slows down matrix multiplication. In this study, we adopt monolithic 3-D (M3D) integration for the GPU scratchpad memory (SPM), which we call the M3D SPM, to enhance matrix multiplication. The M3D SPM allows the column data of the second matrix to be read in one access, as is already the case for the row data of the first matrix. The simulation results show that our M3D SPM improves system performance by 46.3% for 32×32 matrix multiplication over the conventional 2-D SPM, where the column data of the second matrix are read sequentially.
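
The access pattern described in the abstract can be illustrated with a small counting sketch. It is a toy model and not the authors' simulator: it assumes that each scratchpad access returns one full row, that a column of k elements therefore costs k row accesses on a 2-D SPM, and that the same column costs a single access when column reads are available. The function name `matmul_with_access_count` and the `column_access` flag are hypothetical names introduced only for this illustration.

```python
def matmul_with_access_count(a, b, column_access=False):
    """Multiply a (n x k) by b (k x m) and count scratchpad accesses.

    Toy model (an assumption, not the paper's simulator):
      * reading one row of `a` costs one access;
      * reading one column of `b` costs k accesses when only row access
        is available (one row access per element), or one access when
        `column_access` is True.
    Every output element fetches its own row and column, mimicking one
    GPU thread per output element.
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")

    accesses = 0
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            row = a[i]                      # one row access for the first matrix
            accesses += 1

            col = [b[p][j] for p in range(k)]
            # Column of the second matrix: k row accesses, or 1 column access.
            accesses += 1 if column_access else k

            c[i][j] = sum(x * y for x, y in zip(row, col))
    return c, accesses
```

Calling the function twice on the same inputs, once with `column_access=False` and once with `column_access=True`, returns identical products with different access counts. The counts only show why column reads dominate under row-only access; they do not model timing and should not be read as reproducing the 46.3% figure reported above.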

Journal: IEEE Embedded Systems Letters
Publication Date: Jun 12, 2020
Citations: 6

Similar Papers

A Power-Efficient Accelerator for Convolutional Neural Networks
Fan Sun ... Xi Li
01 Sep 2017

Managing hybrid on-chip scratchpad and cache memories for multi-tasking embedded systems
Zimeng Zhou ... Lei Ju
01 Jan 2015

UniCNN: A Pipelined Accelerator Towards Uniformed Computing for CNNs
Fan Sun ... Xi Li
International Journal of Parallel Programming | VOL. 46
27 Sep 2017

Convolutional Neural Networks for Biometrics Applications
Jiani Jin
SHS Web of Conferences | VOL. 144
01 Jan 2021
