Abstract

In distributed deep learning (DL), collective communication algorithms such as Allreduce, which are used to share training results between graphics processing units (GPUs), are an unavoidable bottleneck. We hypothesize that the cache access latency incurred at every Allreduce is a significant bottleneck in current computing systems with high-bandwidth interconnects for distributed DL. To reduce how often this latency is incurred, it is important to aggregate data at the network interfaces. We implement a data aggregation circuit in a field-programmable gate array (FPGA). Using this FPGA, we propose a novel Allreduce architecture and training strategy without accuracy degradation. Measurement results show that the Allreduce latency is reduced to 1/4. Our system also conceals about 90% of the communication overhead and improves scalability by 20%. The end-to-end training time of distributed DL with ResNet-50 and ImageNet is reduced to 87.3% without any degradation in validation accuracy.
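
To make the targeted operation concrete, the following is a minimal NumPy sketch of a ring Allreduce over simulated workers. It only illustrates what Allreduce computes (every worker ends up with the sum of all workers' gradients) and the many per-chunk transfer steps whose per-step latency motivates in-network aggregation; it is not the paper's FPGA-based design, and the worker count and chunking are illustrative assumptions.

```python
# Illustrative sketch only: a plain ring-Allreduce (sum) simulated in NumPy.
# This is NOT the paper's FPGA-based in-network aggregation; worker count and
# chunk layout are arbitrary assumptions chosen for clarity.
import numpy as np

def ring_allreduce(worker_grads):
    """Simulate a ring Allreduce (sum) over a list of equal-length gradients."""
    n = len(worker_grads)
    # Each worker splits its gradient into n chunks.
    chunks = [list(np.array_split(g.astype(float), n)) for g in worker_grads]

    # Reduce-scatter: after n-1 steps, worker r holds the fully summed chunk (r+1) % n.
    for step in range(n - 1):
        snapshot = [[c.copy() for c in w] for w in chunks]  # values at the start of the step
        for r in range(n):
            idx = (r - step) % n
            # Worker r "sends" chunk idx to its neighbour, which accumulates it.
            chunks[(r + 1) % n][idx] += snapshot[r][idx]

    # Allgather: circulate the completed chunks so every worker has the full sum.
    for step in range(n - 1):
        snapshot = [[c.copy() for c in w] for w in chunks]
        for r in range(n):
            idx = (r + 1 - step) % n
            chunks[(r + 1) % n][idx] = snapshot[r][idx]

    return [np.concatenate(w) for w in chunks]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = [rng.standard_normal(10) for _ in range(4)]  # 4 simulated workers
    reduced = ring_allreduce(grads)
    assert all(np.allclose(r, np.sum(grads, axis=0)) for r in reduced)
    print("every worker holds the summed gradient:", reduced[0][:3])
```

Each of the 2(n-1) steps above is a separate small transfer per worker, which is why per-call overhead such as cache access latency accumulates at every Allreduce when gradients are exchanged after each iteration.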
