Abstract

Multi-GPU nodes have become the platform of choice for scientific applications. In a multi-GPU node, GPUs are interconnected via different communication channels, so intranode communications among GPUs may traverse paths with different latency and bandwidth characteristics. As the number of GPUs within a node increases, the physical topology of the GPU interconnects tends to have more levels of hierarchy, which in turn increases the heterogeneity of the GPU communication channels. In this paper, we show that the performance of different intranode GPU communication channels can differ considerably. Accordingly, we propose a topology-aware GPU selection scheme for the efficient assignment of GPUs to the MPI processes within a node. The resulting assignment improves communication performance by mapping the more intensive inter-process GPU-to-GPU communications onto the stronger communication channels. Our scheme uses three metrics to distinguish among GPU-to-GPU communication channels: latency, bandwidth, and distance. We evaluate the scheme through extensive experiments on a 16-GPU node and show that it provides considerable performance improvements over the default GPU selection scheme: up to 70% at the microbenchmark level and up to 21% at the application level.
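
To make the idea concrete, below is a minimal, hedged sketch (not the paper's implementation) of how a topology-aware GPU assignment could be done with standard CUDA and MPI calls: each process discovers its node-local rank, a pairwise GPU "distance" matrix is built from CUDA's relative P2P performance rank (a lower value indicates a stronger link, e.g., NVLink versus PCIe across a host bridge), and GPUs are ordered greedily so that consecutive local ranks land on closely connected devices. The greedy ordering policy and all variable names are illustrative assumptions; the paper's scheme additionally uses measured latency and bandwidth, which this sketch omits.

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

/* Hedged sketch: topology-aware mapping of node-local MPI ranks to GPUs.
   Not the paper's algorithm; a toy greedy policy over CUDA's P2P
   performance rank, where a lower value indicates a stronger link. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Ranks sharing this node get a node-local communicator. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int local_rank;
    MPI_Comm_rank(node_comm, &local_rank);

    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    if (ngpus == 0) { MPI_Finalize(); return 1; }

    /* Pairwise "distance" matrix from the CUDA runtime's relative
       P2P performance rank (lower = better link). */
    int *dist = (int *)malloc((size_t)ngpus * ngpus * sizeof(int));
    for (int i = 0; i < ngpus; i++)
        for (int j = 0; j < ngpus; j++) {
            int v = 0;
            if (i != j)
                cudaDeviceGetP2PAttribute(&v, cudaDevP2PAttrPerformanceRank,
                                          i, j);
            dist[i * ngpus + j] = v;
        }

    /* Toy greedy ordering: each next GPU is the unused one closest to
       the previously chosen GPU, so neighboring local ranks (which often
       communicate most intensively) share the strongest channels. */
    int *order = (int *)malloc(ngpus * sizeof(int));
    char *used = (char *)calloc((size_t)ngpus, 1);
    order[0] = 0; used[0] = 1;
    for (int k = 1; k < ngpus; k++) {
        int prev = order[k - 1], best = -1, best_d = 1 << 30;
        for (int j = 0; j < ngpus; j++)
            if (!used[j] && dist[prev * ngpus + j] < best_d) {
                best_d = dist[prev * ngpus + j];
                best = j;
            }
        order[k] = best; used[best] = 1;
    }

    int my_gpu = order[local_rank % ngpus];
    cudaSetDevice(my_gpu);
    printf("node-local rank %d -> GPU %d\n", local_rank, my_gpu);

    free(dist); free(order); free(used);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

In a real deployment, the static distance matrix would be refined with measured point-to-point latency and bandwidth, as the paper's three-metric scheme does, and the assignment could be computed with a proper graph-mapping heuristic rather than a greedy chain.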
