Abstract

Deep learning has shown considerable promise in numerous practical machine learning applications. However, training deep learning models is highly time-consuming. To address this problem, many studies design distributed deep learning systems with multiple graphics processing units (GPUs) on a single machine or across machines. Data parallelism is the usual method for exploiting multiple GPUs. However, this method is not suitable for all deep learning models, such as the fully connected deep neural network (DNN), because of the transfer overhead. In this paper we analyze the transfer overhead and find that parameter synchronization is the key factor causing it. To reduce parameter synchronization, we propose a multi-GPU framework based on model averaging, in which each GPU trains a whole model until convergence and the CPU averages the models to obtain the final optimal model. The only parameter synchronization occurs after all GPUs have finished training their models, thus dramatically reducing the transfer overhead. Experimental results show that the model averaging method achieves speedups of 1.6x with two GPUs and 1.8x with four GPUs compared with training on a single GPU. Compared with the data parallelism method, it achieves speedups of 17x and 25x on two and four GPUs, respectively.
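To make the described scheme concrete, the following is a minimal sketch of model averaging across GPU replicas, written in PyTorch purely for illustration. The model, data, and helper names are assumptions rather than the authors' code, and the paper's framework may use a different toolkit: each device trains its own full replica, and the CPU performs a single parameter-averaging step at the end, which is the only synchronization point.

```python
import copy

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_replica(model: nn.Module, loader: DataLoader, device: torch.device,
                  epochs: int = 5) -> nn.Module:
    """Train one full model replica on a single device until (approximate) convergence."""
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model.cpu()  # bring parameters back to the CPU for averaging


def average_models(models):
    """Average the parameters of independently trained replicas on the CPU."""
    averaged = copy.deepcopy(models[0])
    state = averaged.state_dict()
    for key in state:
        state[key] = torch.stack([m.state_dict()[key] for m in models]).mean(dim=0)
    averaged.load_state_dict(state)
    return averaged


if __name__ == "__main__":
    # Toy fully connected DNN and synthetic data, purely for illustration.
    base = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
    data = TensorDataset(torch.randn(512, 784), torch.randint(0, 10, (512,)))
    loader = DataLoader(data, batch_size=64, shuffle=True)

    num_gpus = torch.cuda.device_count()
    devices = ([torch.device(f"cuda:{i}") for i in range(num_gpus)]
               if num_gpus > 0 else [torch.device("cpu")] * 2)

    # For brevity the replicas are trained sequentially here; in the paper's
    # framework each replica runs concurrently on its own GPU.
    replicas = [train_replica(copy.deepcopy(base), loader, dev) for dev in devices]

    # The single parameter synchronization point: average on the CPU.
    final_model = average_models(replicas)
```

Because there is only this one averaging step, no gradients or parameters are exchanged during training, which is what removes the per-iteration transfer overhead that data parallelism incurs.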
