Abstract

Networks are a well-known bottleneck for distributed deep learning (DDL) jobs. DDL jobs require topologies that match their communication patterns, as well as high and stable transmission bandwidth, so traditional optimization methods designed for electrical packet switching (EPS) perform poorly when adapted to them. In contrast, optical circuit switching (OCS) has the natural advantages of topology reconstruction and high bandwidth, which can fulfill these network requirements and reduce the in-network delay of DDL jobs. In this work, we therefore propose Optical Slicing provisioning in support of DDL jobs (OSDL). Over a hybrid EPS + OCS network, we design algorithms for job placement and for the scheduling of both OCS and EPS, so that the network topology matches the traffic patterns of DDL jobs. We first evaluate OSDL by simulation on large-scale networks with high-density DDL jobs. The simulation results show that OSDL outperforms multiple well-known scheduling methods, with speedups of up to 3.31 times and 10.48 times over a homogeneous network and a heterogeneous network, respectively. We additionally conduct experiments on a relatively small-scale network with relatively low-density DDL jobs and still observe a speedup of up to 1.74 times. Together, the simulations and experiments verify the effectiveness of OSDL in accelerating DDL jobs.
