Abstract

It is known that deep neural network (DNN) models achieve higher accuracy with deeper structures, but the depth of these models and the limited GPU memory restrict training to very small batch sizes, underutilizing the GPU's computational power. GPU memory management based on swapping tensors between GPU and CPU memory can reduce the memory footprint of DNN models, enabling the GPU to train models with a larger batch size. However, because of inappropriate tensor swapping schemes, existing works slow down model training when the batch size is expanded. In this paper, we propose pommDNN (https://github.com/victorno2/pommDNN), a performance-optimal GPU memory management method that improves training speed through batch size expansion, where the optimal batch size is selected by predicting the throughput of DNN training at the expanded batch size. pommDNN trades off the performance gained by batch size expansion against the communication overhead caused by tensor swapping to select the optimal batch size. Based on the DNN computational graph and the selected batch size, we design a genetic-algorithm-based search for the optimal tensor swapping scheme. Our experiments show that for DNN models of different depths, pommDNN improves training throughput by 1∼57%, outperforming other tensor-swapping-based methods on most models.
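
To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the pommDNN implementation: a genetic search over binary tensor-swapping decisions whose fitness is the predicted training throughput, wrapped in a batch size selection loop that trades the gain from a larger batch against the swapping overhead it induces. The Tensor fields, the cost model, and all constants are hypothetical placeholders.

import random
from dataclasses import dataclass

@dataclass
class Tensor:
    size_mb: float   # per-sample GPU memory footprint (grows with batch size)
    swap_ms: float   # per-sample GPU<->CPU transfer time if this tensor is swapped

def predicted_throughput(scheme, tensors, batch_size, mem_budget_mb,
                         setup_ms=20.0, compute_ms_per_sample=0.8):
    """Samples/s of one training iteration under a toy cost model.
    scheme[i] == 1 means tensor i is swapped to CPU memory."""
    resident = sum(t.size_mb * batch_size for t, s in zip(tensors, scheme) if not s)
    if resident > mem_budget_mb:
        return 0.0                                   # scheme does not fit on the GPU
    swap_ms = sum(t.swap_ms * batch_size for t, s in zip(tensors, scheme) if s)
    iter_ms = setup_ms + compute_ms_per_sample * batch_size + swap_ms
    return batch_size * 1000.0 / iter_ms

def ga_search(tensors, batch_size, mem_budget_mb, pop_size=30, generations=60):
    """Genetic search over binary swap decisions for one batch size."""
    n = len(tensors)
    fit = lambda s: predicted_throughput(s, tensors, batch_size, mem_budget_mb)
    population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fit, reverse=True)
        parents = population[: pop_size // 2]        # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(n)] ^= 1          # single-bit mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fit)
    return best, fit(best)

def select_batch_size(tensors, mem_budget_mb, candidates=(32, 64, 128, 256)):
    """Pick the batch size whose best swapping scheme maximizes predicted throughput."""
    results = [(thr, b, scheme)
               for b in candidates
               for scheme, thr in [ga_search(tensors, b, mem_budget_mb)]]
    thr, b, scheme = max(results)
    return b, scheme, thr

if __name__ == "__main__":
    random.seed(0)
    layers = [Tensor(size_mb=random.uniform(0.5, 4.0),
                     swap_ms=random.uniform(0.02, 0.2)) for _ in range(40)]
    b, scheme, thr = select_batch_size(layers, mem_budget_mb=8000)
    print(f"batch={b}, swapped tensors={sum(scheme)}, predicted {thr:.1f} samples/s")

In this toy model a larger batch amortizes the fixed per-iteration cost but also forces more tensors off the GPU, so the selected batch size is where the predicted throughput gain is no longer outweighed by swap traffic; pommDNN's actual cost model and search operate on the real computational graph rather than these placeholders.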
