Optimizing Single DGX-A100 System: Overcoming GPU Limitations via Efficient Parallelism and Scheduling for Large Language Models

Kyeong-Hwan Kim, Chang-Sung Jeong

doi:10.3390/app13169306

Abstract

In this study, we introduce a novel training algorithm designed to overcome the GPU memory limitations of a single DGX-A100 system. By using the CPU and main memory in the training process and applying a strategy of division and parallelization, the algorithm increases both the size of the trainable language model and the batch size. In addition, we developed a comprehensive management system to manage the execution of the algorithm. This system controls the training process and resource usage and enables the asynchronous deployment of tasks. Finally, we propose a scheduling technique, integrated into the management system, that promotes efficient task scheduling in a complex, heterogeneous training environment. These advancements allow researchers to work with larger models and batch sizes even when GPU memory is limited.
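The abstract does not spell out how "division" lets the batch size grow beyond what fits in GPU memory. As a rough illustration only, the sketch below shows generic gradient accumulation, a standard technique that is not claimed to be the authors' algorithm: a batch is split into chunks, the gradient of each chunk is computed separately, and the accumulated result matches the full-batch gradient. The model (a one-parameter linear fit), the helper names, and the data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair; hypothetical toy data


def grad_sum(w: float, batch: Sequence[Sample]) -> float:
    """Sum over the batch of d/dw (w*x - y)^2."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)


def split(batch: Sequence[Sample], chunk_size: int) -> List[Sequence[Sample]]:
    """Divide a batch into consecutive chunks of at most chunk_size samples."""
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]


def full_grad(w: float, batch: Sequence[Sample]) -> float:
    """Mean gradient computed over the whole batch at once."""
    return grad_sum(w, batch) / len(batch)


def accumulated_grad(w: float, batch: Sequence[Sample], chunk_size: int) -> float:
    """Mean gradient computed chunk by chunk.

    Only one chunk needs to be resident at a time, which is the point of
    dividing the batch when device memory is the limiting factor.
    """
    total = 0.0
    for chunk in split(batch, chunk_size):
        total += grad_sum(w, chunk)  # accumulate un-normalized gradients
    return total / len(batch)        # normalize once by the full batch size


# Toy data following y = 3x + 1 (assumption for illustration).
data = [(float(i), 3.0 * i + 1.0) for i in range(8)]
```

Calling `accumulated_grad(0.5, data, 3)` and `full_grad(0.5, data)` gives the same value, so the chunk size can be chosen to fit memory without changing the update. Normalizing once by the full batch size, rather than averaging per-chunk means, keeps the result correct when the last chunk is smaller than the others.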

Journal: Applied Sciences	Publication Date: Aug 16, 2023
License type: CC BY 4.0

Similar Papers

PommDNN: Performance optimal GPU memory management for deep neural network training
Weiduo Chen ... Qiang Wang
Future Generation Computer Systems | VOL. 152
01 Nov 2023

Large Data Flow Graphs in Limited GPU Memory
Geert Janssen ... Tung D Le
01 Dec 2019

SwapAdvisor
Chien-Chin Huang ... Gu Jin
09 Mar 2020

Optimal Re-Materialization Strategies for Heterogeneous Chains: How to Train Deep Neural Networks with Limited Memory
Olivier Beaumont ... Lionel Eyraud-Dubois
ACM Transactions on Mathematical Software
05 Mar 2024
