Abstract

The bottleneck of Distributed Machine Learning (DML) has shifted from computation to communication. Many studies have focused on speeding up the communication phase within the Parameter Server (PS) architecture, for example through resource scheduling. Nonetheless, the performance improvement of these schemes is limited because the physical topology is agnostic to the communication patterns of the applications. Several articles have pointed out the impact of topology on DML performance, and our own analysis and experimental results indicate that general-purpose topologies do not match the communication characteristics of PS-based DML well. However, to the best of our knowledge, no topology has been tailored specifically for DML. In this paper, we therefore propose PSNet, a reconfigurable modular network topology for DML that takes the communication characteristics of the PS architecture into account. The main idea of PSNet is to first divide servers into two categories: workers, configured with high-performance computing capability, and parameter servers, equipped with multiple Network Interface Cards (NICs). Within each module, an Electrical Circuit Switch (ECS) then connects workers to Top of Rack (ToR) switches, providing flexibility and reconfigurability. Our theoretical analysis shows that PSNet not only provides high performance for DML tasks, but also achieves high fault tolerance and flexibility. To validate the performance of PSNet, we conduct large-scale simulations and small-scale testbed experiments; the results demonstrate that PSNet is 1.89× and 1.92× faster than FatTree for VGG-16 and ResNet50, respectively.

