Abstract

Deep neural networks (DNNs) achieve higher accuracy as the amount of training data increases. However, training data such as personal medical data are often privacy sensitive and cannot be gathered at a central site. Methods have therefore been proposed for training with distributed data that remain in a wide area network. Because of the heterogeneity of a wide area network, methods based on synchronous communication, such as all-reduce stochastic gradient descent (SGD), are unsuitable, and gossip SGD is promising because it relies on asynchronous communication. Communication time is nevertheless a problem in a wide area network, and gossip SGD cannot use double buffering, a technique for hiding communication time, because it uses an asynchronous communication method. In this paper, we propose a type of gossip SGD in which computation and communication overlap to accelerate learning. The proposed method shares newer models by scheduling communication: the nodes share estimates of the communication time and information on which nodes are currently able to communicate. The method is effective in both homogeneous and heterogeneous networks. Experimental results on the CIFAR-100 and Fashion-MNIST datasets demonstrate the faster convergence of the proposed method.
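As a rough illustration of the scheduling idea described above, the Python sketch below shows how a node might pick a peer and decide when to start a model exchange, given shared estimates of communication time and a list of nodes that are currently able to communicate. The names (plan_exchange, comm_time_est, available_peers, step_compute_time) are assumptions made for illustration, not the paper's notation or implementation.

```python
# Hypothetical sketch of communication scheduling for gossip SGD.
# Assumption: each node knows (a) its own compute time per training step and
# (b) estimated communication times to peers that are currently free.
# All names here are illustrative, not taken from the paper.

def plan_exchange(step_compute_time, comm_time_est, available_peers):
    """Choose a free peer and a start offset for the model exchange.

    Starting the exchange as late as possible, while still finishing before
    the local computation ends, hides the communication time and means the
    model that is sent (and received) is as new as possible.
    """
    if not available_peers:
        return None, 0.0  # no peer is free; skip the exchange this step
    # Prefer the free peer with the shortest estimated communication time.
    peer = min(available_peers, key=lambda p: comm_time_est[p])
    # Delay the start so the exchange completes roughly when computation does.
    start_offset = max(0.0, step_compute_time - comm_time_est[peer])
    return peer, start_offset


# Example: a step takes 0.8 s of computation; peers 2 and 5 are free.
peer, offset = plan_exchange(
    step_compute_time=0.8,
    comm_time_est={2: 0.3, 5: 0.6},
    available_peers=[2, 5],
)
print(peer, offset)  # -> 2 0.5
```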

Highlights

  • Since AlexNet [1] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) for object detection and image classification in 2012, deep neural networks (DNNs) have had an impact on many fields, including image recognition, speech recognition, and language processing

  • For DNNs that are trained on a single computer cluster rather than a wide area network, methods that share models through a parameter server [6]–[10] and all-reduce stochastic gradient descent (SGD) [4], [11]–[16], which shares models with a synchronous all-reduce communication step, have been studied

  • We propose a type of gossip SGD in which the computation and communication overlap


Summary

INTRODUCTION

Since AlexNet [1] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) for object detection and image classification in 2012, deep neural networks (DNNs) have had an impact on many fields, including image recognition, speech recognition, and language processing. For training across a wide area network, gossip stochastic gradient descent (SGD) [3]–[5] is the most common method. For DNNs that are trained on a single computer cluster rather than a wide area network, methods that share models through a parameter server [6]–[10] and all-reduce SGD [4], [11]–[16], which shares models with a synchronous all-reduce communication step, have been studied. In some distributed DNN training methods based on synchronous communication, the computation on nodes and the communication between nodes overlap to hide the communication time.
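The following Python sketch illustrates, under simplifying assumptions, how a gossip-style model exchange can run in a background thread so that it overlaps with the local gradient computation. The helpers send_fn, recv_fn, and grad_fn are hypothetical placeholders for the actual communication and training code; this is a minimal sketch of the overlap idea, not the paper's implementation.

```python
import threading
import numpy as np

def overlapped_gossip_step(params, grad_fn, lr, peer, send_fn, recv_fn):
    """One training step in which the model exchange with a peer overlaps
    with the local gradient computation (gossip averaging at the end)."""
    received = {}

    def exchange():
        send_fn(peer, params.copy())        # ship the current model to the peer
        received["params"] = recv_fn(peer)  # receive the peer's model

    t = threading.Thread(target=exchange)
    t.start()                               # communication runs in the background

    grad = grad_fn(params)                  # local computation proceeds meanwhile
    local_params = params - lr * grad

    t.join()                                # wait for the exchange to finish
    # Gossip averaging: mix the updated local model with the peer's model.
    return 0.5 * (local_params + received["params"])

# Toy usage with dummy communication: the "peer" just echoes a fixed model.
peer_model = np.ones(3)
new_params = overlapped_gossip_step(
    params=np.zeros(3),
    grad_fn=lambda w: 2 * w,                # gradient of ||w||^2
    lr=0.1,
    peer=0,
    send_fn=lambda p, w: None,              # no-op send
    recv_fn=lambda p: peer_model,           # pretend the peer replied
)
print(new_params)  # -> [0.5 0.5 0.5]
```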

RELATED WORK
GOSSIP SGD
PROPOSED METHOD
EXPERIMENT
CONCLUSION
