Abstract

Deep Neural Network (DNN) training and inference are resource-intensive tasks that are usually scaled out using data or model parallelism: data parallelism parallelizes over the input data, whereas model parallelism parallelizes over the network. Dense matrix-matrix multiplication is the key primitive behind training and inference of dense DNNs. In contrast, sparse DNNs are less resource-intensive than their dense counterparts while offering comparable accuracy. They can likewise be parallelized using data or model parallelism, with Sparse Matrix-Matrix Multiplication (SpMM) as the key primitive. To scale out, both data and model parallelism initially use data parallelism to partition the input data among multiple machines. This initial partitioning makes the performance of both approaches prone to load imbalance, because the partitions themselves may be imbalanced. In this paper, we take a deeper look at data and model parallelism and closely study the mechanics of the SpMM used for each. To remedy their load imbalance problem, we incorporate hashing as a simple yet powerful method. Finally, we use the IEEE HPEC sparse DNN challenge dataset to evaluate the performance of data and model parallelism at scale. We scaled up to 32 machines (896 cores) and ran inference on a large sparse DNN with 4B parameters in 51 seconds. Results suggest that with hashing, data and model parallelism achieve super-linear speedup due to better load balance and cache utilization.
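To make the load imbalance argument concrete, the following minimal Python sketch compares contiguous (block) row partitioning with hash-based row partitioning. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the hash function or the partitioning granularity, so the BLAKE2 hash, the helper names (`block_owner`, `hashed_owner`, `imbalance`, `partition_loads`), and the synthetic nonzero counts are all hypothetical. The sketch also assumes that the SpMM work assigned to a machine is roughly proportional to the number of nonzeros in its rows.

```python
import hashlib

def block_owner(row, num_rows, num_parts):
    """Contiguous partitioning: consecutive rows land in the same partition."""
    rows_per_part = -(-num_rows // num_parts)  # ceiling division
    return row // rows_per_part

def hashed_owner(row, num_parts):
    """Hash partitioning: a deterministic hash scatters neighbouring rows.
    BLAKE2 is an assumption; the abstract does not name a hash function."""
    digest = hashlib.blake2b(row.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_parts

def imbalance(loads):
    """Heaviest partition divided by the mean load; 1.0 is perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))

def partition_loads(nnz_per_row, num_parts, owner):
    """Sum the nonzeros (a proxy for SpMM work) that each partition receives."""
    loads = [0] * num_parts
    for row, nnz in enumerate(nnz_per_row):
        loads[owner(row)] += nnz
    return loads

# Synthetic, made-up input: the first quarter of the rows is dense (100
# nonzeros each) and the remaining rows are sparse (1 nonzero each).
num_rows, num_parts = 1024, 4
nnz_per_row = [100 if r < num_rows // 4 else 1 for r in range(num_rows)]

block_loads = partition_loads(
    nnz_per_row, num_parts, lambda r: block_owner(r, num_rows, num_parts))
hashed_loads = partition_loads(
    nnz_per_row, num_parts, lambda r: hashed_owner(r, num_parts))

print("block  loads:", block_loads, "imbalance:", round(imbalance(block_loads), 2))
print("hashed loads:", hashed_loads, "imbalance:", round(imbalance(hashed_loads), 2))
```

With block partitioning, every dense row in this toy input ends up on the first partition, so one machine does most of the work while the others wait. Hashing spreads the dense rows across all partitions, which brings the heaviest load much closer to the mean.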
