Abstract

The scaling trends of deep learning models and distributed training workloads are challenging network capacities in today’s datacenters and high-performance computing (HPC) systems. We propose a system architecture that leverages silicon photonic (SiP) switch-enabled server regrouping using bandwidth steering to tackle the challenges and accelerate distributed deep learning training. In addition, our proposed system architecture utilizes a highly integrated operating system-based SiP switch control scheme to reduce implementation complexity. To demonstrate the feasibility of our proposal, we built an experimental testbed with a SiP switch-enabled reconfigurable fat tree topology and evaluated the network performance of distributed ring all-reduce and parameter server workloads. The experimental results show up to 3.6× improvements over the static non-reconfigurable fat tree. Our large-scale simulation results show that server regrouping can deliver up to 2.3× flow throughput improvement for a 2× tapered fat tree and a further 11% improvement when higher-layer bandwidth steering is employed. The collective results show the potential of integrating SiP switches into datacenters and HPC systems to accelerate distributed deep learning training.
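To illustrate why the collective workloads named above stress the interconnect, here is a minimal Python sketch (our own illustrative model, not from the paper) of the per-worker communication volume in a ring all-reduce; the function name and parameters are hypothetical:

```python
def ring_allreduce_traffic(num_workers: int, gradient_bytes: int) -> int:
    """Bytes each worker sends during one ring all-reduce.

    A ring all-reduce runs 2*(N-1) steps (a reduce-scatter followed by
    an all-gather); in each step every worker sends a 1/N-sized chunk
    of the gradient buffer to its ring neighbor.
    """
    chunk = gradient_bytes / num_workers
    return int(2 * (num_workers - 1) * chunk)

# Example: 16 workers synchronizing a 1 GiB gradient every iteration.
# Each worker moves roughly 1.9 GiB per iteration, so iteration time is
# bounded by the slowest link traversed by the ring.
print(ring_allreduce_traffic(16, 1 << 30) / 2**30, "GiB per worker")
```

Because this traffic repeats every training iteration, any ring hop that lands on an oversubscribed link throttles the whole collective, which is the bottleneck the reconfigurable topology targets.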

Highlights

  • Similar performance improvements are observed for server regrouping and for server regrouping with bandwidth steering above the ToR: 67% and 47% reductions in execution time (3.0× and 1.9× improvements), respectively.

  • We have shown a reconfigurable datacenter/high-performance computing (HPC) system architecture using silicon photonic (SiP) switches to accelerate distributed deep learning training workloads.

Summary

INTRODUCTION

Deep learning (DL) is a branch of machine learning that has become a major driving force behind the progress in artificial intelligence applications such as image classification,[1] natural language processing,[2] and recommendation systems.[3] The demand for better DL models has resulted in a rise of more complex models that support larger dataset sizes to improve these deep neural networks.[4,5] The typical approach to speeding up the training of these larger DL models is parallelization across many GPU-equipped nodes,[6,7,8] which requires a high-bandwidth interconnect to support the communication between training devices.[9] DL workloads account for a large proportion of the computation in today’s high-performance computing (HPC) operations, and demand is growing dramatically in datacenters.[10] These trends have shifted the performance bottleneck from compute to the network interconnect due to system fragmentation (applications often receive an allocation on a set of distant and non-contiguous nodes). This places a tremendous challenge on interconnect designs to provide high-bandwidth, low-latency networking that can sustain the continual growth of these hardware-driven deep learning applications. Our simulation results show that server regrouping can deliver up to 2.3× flow throughput improvement for a 2× tapered fat tree and a further 11% improvement when higher-layer bandwidth steering is applied.
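To make the fragmentation problem concrete, the short Python sketch below (our own toy model, not the paper's simulator) counts how many ring all-reduce neighbor pairs cross rack boundaries, and thus traverse the tapered upper links, for a fragmented versus a regrouped server placement; the topology sizes and placements are hypothetical:

```python
# Hypothetical topology: 4 racks of 4 servers under a 2x-tapered core,
# so inter-rack links offer half the bandwidth of intra-rack links.
RACKS, SERVERS_PER_RACK = 4, 4

def rack_of(server: int) -> int:
    return server // SERVERS_PER_RACK

def cross_rack_hops(placement: list[int]) -> int:
    """Count ring all-reduce neighbor pairs that span two racks."""
    n = len(placement)
    return sum(rack_of(placement[i]) != rack_of(placement[(i + 1) % n])
               for i in range(n))

fragmented = [0, 5, 10, 15, 3, 6, 9, 12]   # scattered across all racks
regrouped  = [0, 1, 2, 3, 4, 5, 6, 7]      # packed into two racks

print("fragmented:", cross_rack_hops(fragmented))  # 8 tapered-link hops
print("regrouped: ", cross_rack_hops(regrouped))   # only 2 rack crossings
```

In this toy model, every ring hop of the fragmented allocation crosses the oversubscribed layer, while the regrouped allocation crosses it only twice; SiP switch-enabled regrouping achieves a similar effect by steering optical bandwidth rather than by migrating jobs.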

SILICON PHOTONICS FOR OPTICAL CIRCUIT SWITCHING
System Architecture
SiP Switches and Control
EXPERIMENTS AND RESULTS
SYSTEM-SCALE EVALUATION
Simulation Setup
Server Regrouping and Bandwidth Steering
Results
CONCLUSIONS