Comprehensive techniques of multi-GPU memory optimization for deep learning acceleration

Youngrang Kim, Jik-Soo Kim, Jaehwan Lee, Hyunseung Jei, Hongchan Roh

doi:10.1007/s10586-019-02974-6

Abstract

This paper presents a comprehensive suite of techniques for optimized memory management in multi-GPU systems to accelerate the execution of deep learning applications. We use GPU and CPU memory together in a multi-GPU environment by effectively addressing contention on the shared interconnect (e.g., PCIe, NVLink). In addition, we designed and implemented an intelligent prefetching algorithm (from CPU memory to GPU memory) that achieves the highest processing throughput while sustaining a large mini-batch size. We implemented our optimization techniques on TensorFlow and performed extensive experiments in various multi-GPU environments, including traditional PCIe and the latest high-bandwidth interconnect, NVLink. Evaluation results show that the proposed scheme improves computing performance by reducing the I/O bottleneck and effectively increases the mini-batch size without sacrificing overall training throughput.

Journal: Cluster Computing
Publication Date: Aug 23, 2019
Citations: 8

Similar Papers

Efficient Multi-GPU Memory Management for Deep Learning Acceleration
Youngrang Kim ... Jik-Soo Kim
01 Sep 2018

ZeRO-infinity
Samyam Rajbhandari ... Shaden Smith
13 Nov 2021

Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses
Jeongmin Brian Park ... Wen-mei Hwu
Proceedings of the VLDB Endowment | VOL. 17
01 Feb 2024

GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management
Youngsok Kim ... Jaewon Lee
01 Feb 2014