S-Caffe

Ammar Ahmad Awan, Jahanzeb Maqbool Hashmi, Dhabaleswar K Panda, Khaled Hamidouche

doi:10.1145/3155284.3018769

Abstract

The availability of large data sets like ImageNet and the massively parallel computation support in modern HPC devices like NVIDIA GPUs have fueled a renewed interest in Deep Learning (DL) algorithms. This has triggered the development of DL frameworks like Caffe, Torch, TensorFlow, and CNTK. However, most DL frameworks have been limited to a single node. To scale out DL frameworks and bring HPC capabilities to the DL arena, we propose S-Caffe, a scalable and distributed adaptation of Caffe for modern multi-GPU clusters. Based on an in-depth analysis of the new requirements brought forward by DL frameworks and the limitations of current communication runtimes, we present a co-design of the Caffe framework and the MVAPICH2-GDR MPI runtime. Using the co-design methodology, we modify Caffe's workflow to maximize the overlap of computation and communication with multi-stage data propagation and gradient aggregation schemes. We bring DL-awareness to the MPI runtime by proposing a hierarchical reduction design that benefits from CUDA-Aware features and provides up to a 133x speedup over OpenMPI and a 2.6x speedup over MVAPICH2 for 160 GPUs. S-Caffe successfully scales up to 160 K-80 GPUs for GoogLeNet (ImageNet) with a speedup of 2.5x over 32 GPUs. To the best of our knowledge, this is the first framework that scales up to 160 GPUs. Furthermore, even for single-node training, S-Caffe shows an improvement of 14% and 9% over NVIDIA's optimized Caffe for 8 and 16 GPUs, respectively. In addition, S-Caffe achieves up to 1395 samples per second for the AlexNet model, which is comparable to the performance of Microsoft CNTK.
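
The abstract names a hierarchical reduction design but does not spell out its structure. The following is only a minimal sketch of the general idea of a two-level sum reduction, written in plain Python with simulated workers instead of MPI or CUDA. The function names, the `group_size` parameter, and the example gradients are all hypothetical and are not taken from S-Caffe or MVAPICH2-GDR.

```python
from typing import List

Vector = List[float]


def elementwise_sum(vectors: List[Vector]) -> Vector:
    """Sum equally sized gradient vectors element by element."""
    return [sum(values) for values in zip(*vectors)]


def hierarchical_reduce(gradients: List[Vector], group_size: int) -> Vector:
    """Two-level sum reduction (illustrative assumption, not the paper's design)."""
    if group_size < 1:
        raise ValueError("group_size must be positive")
    # Level 1: each group of `group_size` simulated workers reduces to one leader.
    leader_sums = [
        elementwise_sum(gradients[start:start + group_size])
        for start in range(0, len(gradients), group_size)
    ]
    # Level 2: the group leaders reduce to a single root.
    return elementwise_sum(leader_sums)


# Hypothetical data: 8 simulated workers, each holding a 3-element gradient.
grads = [[float(rank), 1.0, 2.0 * rank] for rank in range(8)]
result = hierarchical_reduce(grads, group_size=4)
flat = elementwise_sum(grads)  # single-level reduction for comparison
```

In this toy setting the two-level result equals the flat sum; the appeal of a hierarchy on a real cluster would come from keeping most traffic inside a group, which this simulation does not model.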

Journal: ACM SIGPLAN Notices
Publication Date: Jan 26, 2017
Citations: 31

