Studying the Effects of Hashing of Sparse Deep Neural Networks on Data and Model Parallelisms

Mohammad Hasanzadeh Mofrad, Rami Melhem, Mohammad Hammoud, Yousuf Ahmad

doi:10.1109/hpec43674.2020.9286195

https://doi.org/10.1109/hpec43674.2020.9286195

Abstract

Deep Neural Network (DNN) training and inference are two resource-intensive tasks that are usually scaled out using data or model parallelism, where data parallelism parallelizes over the input data and model parallelism parallelizes over the network. Dense matrix-matrix multiplication is the key primitive behind training and inference of dense DNNs. In contrast, sparse DNNs are less resource-intensive than their dense counterparts while offering comparable accuracy. They can likewise be parallelized using data or model parallelism, with Sparse Matrix-Matrix Multiplication (SpMM) as the key primitive. To scale out, both data and model parallelism initially use data parallelism to partition the input data among multiple machines. This initial partitioning of the input makes the performance of both approaches prone to load imbalance, as the partitions may be uneven. In this paper, we take a deeper look into data and model parallelism and closely study the mechanics of the SpMM used for each. Moreover, to remedy their load imbalance problem, we incorporate hashing as a simple yet powerful method. Finally, we use the IEEE HPEC sparse DNN challenge dataset to evaluate the performance of data and model parallelism at scale. We scaled up to 32 machines (896 cores) and ran inference on a large sparse DNN with 4B parameters in 51 seconds. Results suggest that with hashing, data and model parallelism achieve super-linear speedup due to better load balance and cache utilization.
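
The load-imbalance argument above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which hash function is used, so a fixed pseudo-random permutation of row indices stands in for it. The synthetic nonzero counts and the names `contiguous_partition`, `hashed_partition`, and `demo` are likewise hypothetical. The sketch compares the nonzeros each partition receives when input rows are split into contiguous ranges versus when row indices are relabeled first.

```python
import random


def contiguous_partition(row, n_rows, n_parts):
    """Assign a row to a partition by contiguous row ranges."""
    rows_per_part = (n_rows + n_parts - 1) // n_parts
    return row // rows_per_part


def hashed_partition(row, n_rows, n_parts, perm):
    """Relabel the row through a fixed permutation, then partition.

    Assumption: `perm` is a stand-in for the paper's hashing step.
    """
    return contiguous_partition(perm[row], n_rows, n_parts)


def nnz_per_partition(row_nnz, n_parts, assign):
    """Sum the nonzeros that land in each partition."""
    loads = [0] * n_parts
    for row, nnz in enumerate(row_nnz):
        loads[assign(row)] += nnz
    return loads


def imbalance(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


def demo(n_rows=4096, n_parts=8, seed=0):
    rng = random.Random(seed)
    # Synthetic skew (assumption): the first eighth of the rows are dense.
    row_nnz = [200 if r < n_rows // 8 else 5 for r in range(n_rows)]
    perm = list(range(n_rows))
    rng.shuffle(perm)

    plain = nnz_per_partition(
        row_nnz, n_parts,
        lambda r: contiguous_partition(r, n_rows, n_parts))
    hashed = nnz_per_partition(
        row_nnz, n_parts,
        lambda r: hashed_partition(r, n_rows, n_parts, perm))
    return imbalance(plain), imbalance(hashed)


if __name__ == "__main__":
    plain, hashed = demo()
    print(f"contiguous imbalance: {plain:.2f}")
    print(f"hashed imbalance:     {hashed:.2f}")
```

With this synthetic skew, the contiguous split puts every dense row in the first partition, giving an imbalance of about 6.8, while the relabeled split stays close to 1. Since SpMM work scales roughly with the number of nonzeros a machine holds, the slowest partition bounds the runtime in either parallelism scheme.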
