Abstract

Processing-in-memory (PIM) integrates computational logic into the memory domain and is one of the most promising solutions to the memory bandwidth problem in deep neural network (DNN) processing. The hybrid memory cube (HMC), a 3D stacked memory structure, can implement a PIM architecture efficiently while largely reusing existing legacy hardware. To accelerate DNN inference, multiple HMCs can be connected, and data-independent tasks can be assigned to the processing elements (PEs) within each HMC. However, owing to the packet-switched network structure, inter-HMC interconnects exhibit variable and unpredictable latencies that depend on the data transmission path and link contention. A well-designed task schedule that uses context switching can effectively hide communication latency and improve PE utilization. Nevertheless, as the number of HMCs increases, the growing variance of inter-HMC communication latencies causes frequent context switching and degrades overall performance. This paper proposes a DNN task-scheduling method that effectively exploits task parallelism by reducing the communication latency variance caused by the HMC interconnect characteristics. Task partitions are generated to exploit parallelism while keeping inter-HMC traffic within the sustainable link bandwidth. Task-to-HMC mapping is performed to hide the average communication latency of intermediate DNN processing results. A task schedule is then generated using retiming to accelerate DNN inference while maximizing resource utilization. The effectiveness of the proposed method was verified through simulations of various realistic DNN applications on the ZSim x86-64 simulator. The simulations revealed that the proposed scheduling reduced DNN processing time by 18.19% compared with conventional methods in which each HMC operates independently.
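To make the three stages above concrete (bandwidth-aware partitioning, latency-aware task-to-HMC mapping, and retiming), a minimal Python sketch follows. It is only an illustration under assumed parameters; every name, data structure, and constant (LINK_BW, HOP_LAT, the hop matrix) is hypothetical and does not reproduce the paper's actual algorithms.

```python
# Minimal sketch of the three stages: partition -> map -> retime.
# All names and parameters here are hypothetical, not the paper's algorithm.
from itertools import permutations

LINK_BW = 16   # assumed sustainable inter-HMC link bandwidth, bytes/cycle
HOP_LAT = 500  # assumed per-hop packet latency, cycles

def partition(tasks, k):
    """Split (compute_cycles, out_bytes) tasks into k groups, preferring the
    least-loaded group whose outbound traffic stays within LINK_BW."""
    groups = [[] for _ in range(k)]
    load, traffic = [0] * k, [0] * k
    for cyc, out in sorted(tasks, reverse=True):   # largest tasks first
        g = min(range(k),
                key=lambda i: load[i]
                if traffic[i] + out <= LINK_BW * (load[i] + cyc)
                else float("inf"))
        groups[g].append((cyc, out))
        load[g] += cyc
        traffic[g] += out
    return groups

def map_groups(num_groups, hops):
    """Exhaustively pick the group-to-HMC placement minimizing the average
    hop latency between successive groups (fine for small HMC counts)."""
    def avg_lat(p):
        return sum(HOP_LAT * hops[p[i]][p[i + 1]]
                   for i in range(len(p) - 1)) / max(len(p) - 1, 1)
    return min(permutations(range(num_groups)), key=avg_lat)

def retime(schedule, shift=1):
    """Issue each 'send' `shift` steps earlier so the transfer overlaps the
    producer's remaining computation instead of stalling the consumer."""
    return [(step - shift, kind, data) if kind == "send" else (step, kind, data)
            for step, kind, data in schedule]

# Example: six (compute_cycles, out_bytes) tasks on a 2x2 mesh of four HMCs.
hops = [[0, 1, 1, 2], [1, 0, 2, 1], [1, 2, 0, 1], [2, 1, 1, 0]]
tasks = [(40_000, 262_144), (35_000, 131_072), (30_000, 65_536),
         (25_000, 65_536), (20_000, 32_768), (15_000, 32_768)]
placement = map_groups(len(partition(tasks, 4)), hops)
print(placement)
```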

Highlights

  • Deep neural networks (DNNs) have achieved breakthroughs in solving a wide range of challenging computation problems, from image recognition to speech translation [1]

  • This paper proposes a DNN task-scheduling method that fully utilizes data-level parallelism by reducing the communication latency variance caused by hybrid memory cube (HMC) interconnect characteristics

  • According to the HMC 2.1 specification, the bandwidth of each memory was set to 10 GB/s


Summary

Introduction

Deep neural networks (DNNs) have achieved breakthroughs in solving a wide range of challenging computation problems, from image recognition to speech translation [1]. This paper proposes a DNN task-scheduling method that fully utilizes data-level parallelism by reducing the communication latency variance caused by hybrid memory cube (HMC) interconnect characteristics. Task-to-HMC mapping is performed to hide the average communication latency of the intermediate DNN processing results.
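The mapping goal above can be made concrete with a simple feasibility check: an inter-HMC transfer is hidden when its transfer time over the average path does not exceed the compute time it overlaps with. The sketch below illustrates this condition; the function name and all numbers (bandwidth in bytes per cycle, per-hop latency, task sizes) are assumptions for illustration, not values from the paper.

```python
# Illustrative check of the latency-hiding condition: an inter-HMC transfer
# is hidden if it finishes within the overlapping task's compute time.
# All numbers here are assumed for illustration, not taken from the paper.

def latency_hidden(compute_cycles, out_bytes, bw_bytes_per_cycle,
                   hop_cycles, hops):
    transfer = out_bytes / bw_bytes_per_cycle + hop_cycles * hops
    return transfer <= compute_cycles

# A 256 KiB intermediate result over a 2-hop path:
# 262144 / 16 + 500 * 2 = 17384 cycles, hidden under 40000 compute cycles.
print(latency_hidden(40_000, 256 * 1024, 16, 500, 2))  # True
```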
