Abstract

The placement of parameter servers (PSs) is one of the most important factors in training a global model with distributed deep learning. This paper formulates a novel PS placement problem under dynamically changing available storage capacity, with the objective of minimizing the training time of distributed deep learning subject to constraints on storage capacity and the number of local PSs. We then prove that the proposed problem is NP-hard. The training epochs are divided into two parts: the first epoch and the remaining epochs. For the first epoch, an approximation algorithm and a rounding algorithm are proposed to solve the problem. For the remaining epochs, an adjustment algorithm is proposed that continuously adjusts the PS placement decisions to reduce the training time of the global model. Simulation results show that the proposed approximation and rounding algorithms outperform existing works in all cases in terms of global model training time, and that the training time achieved by the approximation algorithm is very close to that of the optimal solution obtained by brute force. Moreover, the integrated algorithm outperforms existing works when the available storage capacity varies during training.
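
The abstract does not give the formal model, but the objective and constraints it describes suggest an integer program of roughly the following shape. This is only an illustrative sketch: the symbols $x_j$, $T(\mathbf{x})$, $s_j$, $C_j$, and $K$ are assumptions introduced here for clarity and are not taken from the paper itself.

\begin{align}
\min_{\mathbf{x}\in\{0,1\}^{n}} \quad & T(\mathbf{x}) \\
\text{s.t.} \quad & s_j\, x_j \le C_j, \qquad j = 1,\dots,n, \\
& \sum_{j=1}^{n} x_j \le K,
\end{align}

where $x_j = 1$ if a local PS is placed on node $j$, $s_j$ is the storage a PS would require on node $j$, $C_j$ is the (possibly time-varying) available storage capacity of node $j$, $K$ bounds the number of local PSs, and $T(\mathbf{x})$ denotes the resulting training time of the global model. Under this reading, the rounding algorithm for the first epoch would relax the binary variables and round a fractional solution, while the adjustment algorithm for later epochs would update $\mathbf{x}$ as the capacities $C_j$ change.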
