Accelerating Dense Matrix Computations with Effective Workload Partitioning on Heterogeneous Architectures

Mohsin Khan, Waseem Ahmed, Touseef M Golandaz

doi:10.1080/03772063.2018.1436476

Abstract

The emergence of High Performance Computing (HPC) has enabled researchers to perform large scientific computations quickly and efficiently. However, as the processing units in HPC systems have become more heterogeneous, utilizing all available resources has become a challenge. Fully harnessing the power of these systems requires dividing work efficiently across all processing units, which avoids under-utilization of resources and improves application performance. In this work, we present a dynamic approach to workload partitioning that determines the optimal workload partition and schedules the resulting partitions onto the processing units for parallel execution. Our technique responds automatically to performance variation, requires negligible training, and is implemented as a library. Performance results show that our dynamic approach outperforms the static and linear approaches. Running the Dense Matrix-Matrix Multiplication kernel library with our proposed method on a CPU and a Graphics Processing Unit (GPU) in parallel, we obtain a range of average speedups over the CPU alone and over the GPU alone. We also applied our method to multi-GPU systems, obtaining average speedups over the CPU and over a single GPU.
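The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch of one common way to realize dynamic partitioning, not the authors' implementation. It assumes the rows of the output matrix are split among devices in proportion to each device's measured throughput, and that the estimates are refreshed after every run so the split follows performance variation. The function names `partition_rows` and `update_throughputs`, the `smoothing` parameter, and the rows-per-second metric are all hypothetical.

```python
def partition_rows(n_rows, throughputs):
    """Split n_rows among devices in proportion to estimated throughput.

    throughputs: estimated rows/second per device (assumed metric).
    Returns one row count per device; the counts sum to n_rows.
    """
    total = sum(throughputs)
    shares = [int(n_rows * t / total) for t in throughputs]
    # Rows lost to truncation go to the fastest devices first.
    leftover = n_rows - sum(shares)
    order = sorted(range(len(throughputs)), key=lambda i: -throughputs[i])
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def update_throughputs(shares, elapsed_seconds, old, smoothing=0.5):
    """Blend the newly observed rows/second into the previous estimates.

    A device that received no rows, or reported no time, keeps its
    previous estimate.
    """
    new = []
    for rows, secs, prev in zip(shares, elapsed_seconds, old):
        observed = rows / secs if rows > 0 and secs > 0 else prev
        new.append(smoothing * observed + (1 - smoothing) * prev)
    return new


def simulate(n_rows, true_speeds, rounds):
    """Run the partition/measure/update loop against fixed device speeds.

    true_speeds stands in for real timing measurements (assumption).
    """
    estimates = [1.0] * len(true_speeds)  # start with an even split
    shares = partition_rows(n_rows, estimates)
    for _ in range(rounds):
        elapsed = [r / s for r, s in zip(shares, true_speeds)]
        estimates = update_throughputs(shares, elapsed, estimates)
        shares = partition_rows(n_rows, estimates)
    return shares
```

In this sketch, calling `simulate(1000, [100.0, 300.0], rounds=5)` starts from an even split and moves toward a split in which the device that is three times faster receives about three times as many rows. In a real system the elapsed times would come from timing the CPU and GPU kernels rather than from fixed speeds.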

Full Text