Abstract

Processing-in-memory (PIM) architectures are well suited to handling applications that generate complicated memory request patterns; such memory streams typically degrade application performance in conventional memory-hierarchy systems. In particular, deep convolutional neural network (DCNN) processing, which consists of several distinct functionalities, can be highly optimized if PIM cores extend both processing capability and data accessibility. In this work, we propose a functionality-based PIM accelerator for DCNNs, designing several modules on top of a conventional PIM system based on the hybrid memory cube (HMC). First, we compose a new buffer module, namely a shared cache, through which PIM cores are provided with DCNN functionalities and pre-trained weights; the PIM cores thereby improve computational utilization and data accessibility. Second, an efficient replacement method complements the shared cache to reduce the data miss rate of DCNN processing. Third, we compose dual prefetchers that handle the DCNN’s memory access patterns, reducing the system’s overall latency. Fourth, we compose a PIM scheduler for PIM core-level autonomous request control; it relieves the host processor of significant computational load, reducing both the system’s overall latency and its energy consumption. In a performance evaluation based on a trace-driven HMC simulator, our proposed model improves average latency and bandwidth by 38.9% and 27.9%, respectively, with only 18.7% more energy consumption compared with conventional HMC-based PIM systems. Our system also achieves scalable processing performance: as the DCNN becomes deeper, it processes faster than conventional PIM systems.
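The abstract does not specify the dual prefetchers’ algorithms. As a minimal illustrative sketch, under the assumption that the two engines pair a next-line prefetcher (for streaming feature-map reads) with a classic stride prefetcher (for fixed-stride accesses common in DCNN layers), the idea can be expressed as follows; all class names and policies here are our assumptions, not the paper’s design:

```python
# Hypothetical sketch of "dual prefetchers" for DCNN-style access
# streams. The engine pairing and confirmation policy are assumptions;
# the paper does not publish this exact algorithm.

class NextLinePrefetcher:
    """Prefetch the cache line that follows each demand access."""
    def __init__(self, line_size=64):
        self.line_size = line_size

    def predict(self, addr):
        line = addr // self.line_size
        return [(line + 1) * self.line_size]

class StridePrefetcher:
    """Prefetch one access ahead once a constant stride is confirmed."""
    def __init__(self):
        self.last_addr = None
        self.stride = None

    def predict(self, addr):
        out = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.stride:
                out.append(addr + stride)  # stride seen twice: prefetch ahead
            self.stride = stride
        self.last_addr = addr
        return out

class DualPrefetcher:
    """Issue deduplicated prefetch hints from both engines."""
    def __init__(self):
        self.engines = [NextLinePrefetcher(), StridePrefetcher()]

    def predict(self, addr):
        hints = []
        for engine in self.engines:
            for a in engine.predict(addr):
                if a not in hints:
                    hints.append(a)
        return hints

pf = DualPrefetcher()
# A stride-256 stream, as produced by row-major feature-map reads.
for addr in (0, 256, 512, 768):
    hints = pf.predict(addr)
print(hints)  # -> [832, 1024]: next line plus the stride prediction
```

On a strided stream, the next-line engine covers spatial locality within a row while the stride engine covers the jump to the next row, which is one plausible reading of why two cooperating prefetchers suit DCNN access patterns.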

Highlights

  • In the era of the fourth industrial revolution, various cutting-edge technologies, such as artificial intelligence, robotics, 5G network, and internet-of-things, have been integrated for intelligent service automation

  • Deep convolutional neural network (DCNN) functionality-based requests can be processed by the PIM cores, each configured as a simple in-order core in our assumed PIM system.

  • We composed simple dual prefetchers in each PIM core to deal with the patterned memory accesses of DCNN workloads.

  • We introduced a PIM scheduler with several functions for PIM core-level autonomous request control.

  • Although the shared cache has high energy requirements because the DCNN’s functional primitives are provisioned to multiple PIM cores, the PIM system’s energy was significantly reduced when the PIM scheduler was added. For LeNet, the values fell below the baseline, reflecting a significant energy reduction in the SerDes links: the PIM scheduler allowed PIM core-level autonomous request control without the aid of the host processor.


Summary

INTRODUCTION

In the era of the fourth industrial revolution, various cutting-edge technologies, such as artificial intelligence, robotics, 5G networks, and the internet-of-things, have been integrated for intelligent service automation. In the recent literature, small-sized convolution filters, such as 1 × 1 or 3 × 3, were used to reduce the dimensions of feature maps [2], and residual blocks were used to make shortcuts in a feed-forward network [3], resulting in low data locality and frequent memory accesses. These computational overheads have increased the demand for effective accelerating architectures. Different from conventional PIM architectures (e.g., HMC (Section II-A)), which used PIM cores only as distributed near-memory calculators to operate atomic instructions offloaded from the host processors, the PIM scheduler comprises several function calls that allow multiple PIM cores to control the DCNN’s requests autonomously.
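The low-data-locality claim can be made concrete with a rough arithmetic-intensity estimate: a 1 × 1 convolution performs far fewer multiply-accumulates per byte moved than a 3 × 3 convolution of the same shape, so it is more memory-bound. The layer shape below is a hypothetical example of ours, not a measurement from the paper:

```python
# Back-of-the-envelope arithmetic intensity (MACs per byte of traffic)
# for a convolution layer, assuming single-use (no reuse) traffic of
# input feature map + weights + output feature map, 4 bytes per value.
# Layer dimensions are illustrative assumptions, not from the paper.

def conv_intensity(h, w, c_in, c_out, k, bytes_per_value=4):
    macs = h * w * c_in * c_out * k * k                 # multiply-accumulates
    reads = (h * w * c_in + k * k * c_in * c_out)       # ifmap + weight values
    writes = h * w * c_out                              # ofmap values
    traffic = (reads + writes) * bytes_per_value        # bytes moved
    return macs / traffic

i1 = conv_intensity(28, 28, 256, 64, k=1)  # 1x1 bottleneck-style conv
i3 = conv_intensity(28, 28, 256, 64, k=3)  # 3x3 conv, same tensor shape
print(round(i1, 1), round(i3, 1))  # the 1x1 layer does far fewer MACs/byte
```

Under this simple model the 1 × 1 layer’s compute-to-traffic ratio is several times lower than the 3 × 3 layer’s, which is one way to see why networks built from small filters stress the memory system and motivate near-memory processing.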

AND RELATED WORK
EVALUATIONS
FUNCTIONALITY-BASED DCNN OPERATION ANALYSIS
OPTIMAL SIZE OF THE PREFETCH BUFFER
ENERGY CONSUMPTION
PREFETCH PERFORMANCE
Findings
CONCLUSION AND FUTURE WORK