Fast and scalable all-optical network architecture for distributed deep learning

Wenzhe Li, Peiheng Zhang, Guangming Tan, Guojun Yuan, Zhan Wang, George N Rouskas

doi:10.1364/jocn.511696

Abstract

As training models and datasets keep growing, network communication has become a major bottleneck in distributed deep learning training. To address this challenge, we propose an optical distributed deep learning (ODDL) architecture, which uses a fast yet scalable all-optical network to accelerate distributed training. A key feature of the architecture is flow-based transmit scheduling with fast reconfiguration. This lets ODDL dynamically allocate a dedicated optical path to each traffic stream, yielding low network latency and high network utilization. In addition, ODDL provides physically isolated network resources tailored to each training task by reconfiguring the optical switch with LCoS-WSS (liquid-crystal-on-silicon wavelength-selective switch) technology. The ODDL topology also uses tunable transceivers to adapt to time-varying traffic patterns. To achieve accurate, fine-grained scheduling of optical circuits, we propose an efficient distributed control scheme that incurs minimal delay overhead. Our evaluation on real-world traces shows substantial speedups: with 1024 nodes and 100 Gbps bandwidth, ODDL accelerates VGG19 training by 1.6× and 1.7× compared with conventional fat-tree electrical networks and photonic SiP-Ring architectures, respectively. We further build a four-node testbed, and our experiments show that ODDL achieves training times comparable to those of an ideal electrical switching network.
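The abstract describes flow-based transmit scheduling only at a high level. As a rough illustration of the constraint such a scheduler works under, the toy Python sketch below assumes each node has a single tunable transmitter and a single receiver, so the circuits active in one reconfiguration slot must form a matching (each node sources and sinks at most one circuit). It packs (src, dst) flows into slots with a first-fit greedy rule. The function `schedule_flows` and this slot model are hypothetical; they are not the scheduler or control scheme described in the paper.

```python
from typing import List, Set, Tuple

Flow = Tuple[int, int]  # (source node, destination node)


def schedule_flows(flows: List[Flow]) -> List[List[Flow]]:
    """Greedily pack flows into reconfiguration slots (first-fit).

    Hypothetical toy model, not ODDL's scheduler: each node is assumed to
    have one tunable transmitter and one receiver, so within a slot a node
    may source at most one circuit and sink at most one circuit.
    """
    slots: List[List[Flow]] = []
    busy_tx: List[Set[int]] = []  # transmitters already in use, per slot
    busy_rx: List[Set[int]] = []  # receivers already in use, per slot
    for src, dst in flows:
        for i, circuits in enumerate(slots):
            # Place the flow in the first slot where both endpoints are free.
            if src not in busy_tx[i] and dst not in busy_rx[i]:
                circuits.append((src, dst))
                busy_tx[i].add(src)
                busy_rx[i].add(dst)
                break
        else:
            # No existing slot fits: open a new slot (one more reconfiguration).
            slots.append([(src, dst)])
            busy_tx.append({src})
            busy_rx.append({dst})
    return slots


if __name__ == "__main__":
    # A 4-node ring (as in ring all-reduce) fits in a single slot.
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(len(schedule_flows(ring)))  # prints 1
    # 4-node all-to-all needs at least 3 slots, since each node sends 3 flows.
    all_to_all = [(s, d) for s in range(4) for d in range(4) if s != d]
    print(len(schedule_flows(all_to_all)))  # prints 3
```

First-fit is simple but not optimal in general: for an arbitrary demand it can use more slots than the lower bound set by the busiest node's degree, which an exact bipartite edge-coloring would reach (König's theorem). A real scheduler would also weigh flow sizes and reconfiguration delay, which this sketch ignores.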


Published in: Journal of Optical Communications and Networking

Similar Papers

Optical network architecture for future global telecommunications
Philip Dumortier ... Francesco B Masetti
17 Feb 1995

Accelerate Distributed Deep Learning with a Fast Reconfigurable Optical Network
Wenzhe Li ... Zhan Wang
01 Jan 2024

Dynamic Elastic and Scalable Photonic Infrastructures and Network Architectures
Ioannis Tomkos ... Dimitrios Klonidis
01 Jun 2011

A Scalable, High-Performance, and Fault-Tolerant Network Architecture for Distributed Machine Learning
Songtao Wang ... Jinkun Geng
IEEE/ACM Transactions on Networking, Vol. 28
01 Aug 2020