Abstract

In recent years, distributed deep learning (DDL) has been widely used to scale out and accelerate deep neural network training. In DDL, each worker trains a copy of the deep learning model on different training inputs and synchronizes the model gradients at the end of each iteration. However, it is well known that the network communication required to synchronize model parameters is the main bottleneck in DDL. In this paper, we propose a new idea to relieve network congestion: by predicting the gradient transmission time, the arrival of burst traffic can be anticipated, which gives the system time to take preventive measures in advance. We propose a memory-improved LSTM prediction algorithm, called MILP, to predict gradient transmission time. MILP adds an improved memory mechanism to the LSTM to overcome the LSTM's tendency to be overly conservative when predicting gradient transmission time. We compare the performance of MILP with other time series forecasting (TSF) models on our data sets. Our experiments show that MILP predicts gradient transmission time more accurately than other classical TSF models, with an average error rate 13.17% lower than that of the LSTM model. Our code is available at https://github.com/surprisejxb789/MILP.
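
To make the forecasting setup concrete, the sketch below shows a vanilla LSTM forecaster over a sliding window of per-iteration gradient transmission times, i.e. the kind of baseline TSF model MILP is compared against. This is not the authors' MILP (whose memory-improvement mechanism is in their repository); PyTorch, the window size, hidden size, and the synthetic trace are illustrative assumptions.

```python
# Illustrative baseline only, not the authors' MILP: a plain LSTM that
# forecasts the next gradient transmission time from a window of past ones.
import torch
import torch.nn as nn


class GradTimeLSTM(nn.Module):
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        # One input feature per step: the measured transmission time of an iteration.
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, 1) -> prediction of the next transmission time.
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])


def make_windows(series: torch.Tensor, window: int):
    """Slice a 1-D series of per-iteration transmission times into
    (window, next-value) training pairs."""
    xs = torch.stack([series[i:i + window] for i in range(len(series) - window)])
    ys = series[window:]
    return xs.unsqueeze(-1), ys.unsqueeze(-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Synthetic transmission-time trace (seconds) standing in for real measurements.
    trace = 0.5 + 0.05 * torch.sin(torch.arange(500) / 10.0) + 0.01 * torch.randn(500)
    x, y = make_windows(trace, window=20)

    model = GradTimeLSTM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for epoch in range(50):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"final training MSE: {loss.item():.6f}")
```

In practice, the predicted transmission time for the next iteration would be fed to whatever congestion-avoidance mechanism the system uses, so that burst traffic can be handled before it arrives.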
