Abstract

Deep learning (DL) has been successfully deployed in a wide range of software services. Training a DL model requires a large amount of GPU computing resources, yet it is difficult for developers to estimate, before a run, how much GPU memory the model will consume, which greatly inconveniences DL system development, especially in today's cloud-based model training. It is therefore important to estimate the GPU memory that a DL model may use under a given computing framework. Existing work has focused on static analysis methods to assess GPU memory consumption; these are tightly coupled to a specific framework, and estimation that is loosely coupled to the framework remains under-explored. In this article, we propose TBEM, a test-based method for estimating the GPU memory usage of DL models. TBEM first generates a sufficient number of DL models using an orthogonal array testing strategy together with classical neural network design patterns. It then runs the generated models in a real environment to obtain the real-time GPU memory usage corresponding to each model. Finally, after collecting the different models and their corresponding GPU memory usage values, TBEM analyzes the data by regression.
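As a minimal sketch of the final regression step described above, the snippet below fits a linear model mapping DL-model configuration features to measured GPU memory. All feature names (batch size, layer count, parameter count) and all data values are hypothetical illustrations; in TBEM the targets would be real GPU memory measurements collected by running the generated models.

```python
# Hypothetical sketch: regress GPU memory usage on model-configuration
# features. Data is synthetic and generated from a known linear rule so
# the fitted model can be sanity-checked; real TBEM data would come from
# measuring generated models on an actual GPU.
import numpy as np

# Each row: [batch_size, num_layers, million_params] (assumed features).
X = np.array([
    [16, 10,  5.0],
    [32, 10,  5.0],
    [16, 20, 11.0],
    [64, 30, 25.0],
    [32, 20, 11.0],
], dtype=float)

# Synthetic "measured" GPU memory (MB):
# mem = 50*batch + 20*layers + 30*params + 200 (an invented ground truth).
y = X @ np.array([50.0, 20.0, 30.0]) + 200.0

# Ordinary least squares with an intercept column.
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_memory(batch, layers, params):
    """Estimate GPU memory (MB) for an unseen configuration."""
    return float(np.array([batch, layers, params, 1.0]) @ coef)

print(predict_memory(48, 25, 18.0))
```

Because the synthetic targets are exactly linear in the features, the least-squares fit recovers the generating coefficients, so the prediction for an unseen configuration matches the same rule.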
