Abstract

In recent years, distributed deep learning (DDL) has been widely used to scale out and accelerate deep neural network training. In DDL, each worker trains a copy of the deep learning model on different training inputs and synchronizes the model gradients at the end of each iteration. However, it is well known that the network communication required to synchronize model parameters is the main bottleneck in DDL. In this paper, we propose a new idea to relieve network congestion: by predicting the gradient transmission time, the arrival of burst traffic can be anticipated, which gives the system time to take preventive measures in advance. We propose a memory-improved LSTM prediction algorithm, called MILP, to predict gradient transmission time. MILP adds an improved memory mechanism to the LSTM to overcome the LSTM's tendency to be overly conservative when predicting gradient transmission time. We compare the performance of MILP with other time series forecasting (TSF) models on our data sets. Our experiments show that MILP predicts gradient transmission time more accurately than other classical TSF models, with an average error rate 13.17% lower than that of the LSTM model. Our code is available at https://github.com/surprisejxb789/MILP.
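
To make the forecasting setup concrete, the sketch below shows a vanilla LSTM forecaster over a sliding window of per-iteration gradient transmission times, i.e. the kind of baseline TSF model MILP is compared against. This is not the authors' MILP (whose memory-improvement mechanism is in their repository); PyTorch, the window size, hidden size, and the synthetic trace are illustrative assumptions.

```python
# Illustrative baseline only, not the authors' MILP: a plain LSTM that
# forecasts the next gradient transmission time from a window of past ones.
import torch
import torch.nn as nn


class GradTimeLSTM(nn.Module):
    def __init__(self, hidden_size: int = 32):
        super().__init__()
        # One input feature per step: the measured transmission time of an iteration.
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window, 1) -> prediction of the next transmission time.
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])


def make_windows(series: torch.Tensor, window: int):
    """Slice a 1-D series of per-iteration transmission times into
    (window, next-value) training pairs."""
    xs = torch.stack([series[i:i + window] for i in range(len(series) - window)])
    ys = series[window:]
    return xs.unsqueeze(-1), ys.unsqueeze(-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Synthetic transmission-time trace (seconds) standing in for real measurements.
    trace = 0.5 + 0.05 * torch.sin(torch.arange(500) / 10.0) + 0.01 * torch.randn(500)
    x, y = make_windows(trace, window=20)

    model = GradTimeLSTM()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for epoch in range(50):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    print(f"final training MSE: {loss.item():.6f}")
```

In practice, the predicted transmission time for the next iteration would be fed to whatever congestion-avoidance mechanism the system uses, so that burst traffic can be handled before it arrives.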
