Abstract

The recent advent of advanced fabrics like NVIDIA NVLink is enabling the deployment of dense Graphics Processing Unit (GPU) systems such as DGX-2 and Summit. The Message Passing Interface (MPI) has been the dominant programming model for designing distributed applications on such clusters. The MPI Tools Interface (MPI_T) provides an opportunity for performance tools and external software to introspect and understand MPI runtime behavior at a deeper level to detect performance and scalability issues. However, the lack of low-overhead, scalable monitoring tools has thus far prevented a comprehensive study of the efficiency and utilization of high-performance interconnects such as NVLink on GPU-enabled clusters. In this paper, we address this deficiency by proposing and designing an in-depth, real-time analysis, profiling, and visualization tool for high-performance GPU-enabled clusters with NVLink. The proposed tool builds on top of the OSU InfiniBand Network Analysis and Monitoring Tool (INAM) and provides insights into the efficiency of different communication patterns by examining the utilization of the underlying GPU interconnects. The contributions of the proposed tool are two-fold: 1) it allows domain scientists and system administrators to understand how applications and runtime libraries interact with the underlying high-performance interconnects, and 2) it enables designers of high-performance communication libraries to gain low-level knowledge to optimize existing designs and develop new algorithms that optimally utilize cutting-edge interconnects on GPU clusters. To the best of our knowledge, this is the first tool capable of presenting a unified and holistic view of MPI-level and fabric-level information for emerging NVLink-enabled high-performance GPU clusters.
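
As a minimal sketch of the kind of introspection MPI_T enables, the following C program enumerates the performance variables (pvars) an MPI library exposes; the specific counters, their names, and their semantics are implementation-dependent (e.g., MVAPICH2), and this example illustrates the standard MPI_T interface rather than the tool described in the paper.

```c
/* Minimal sketch: list the MPI_T performance variables (pvars) exported by
 * the MPI library. External tools can read such pvars at runtime to observe
 * MPI-level behavior. Compile with: mpicc -o list_pvars list_pvars.c */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided, num_pvars;

    MPI_Init(&argc, &argv);
    /* MPI_T has its own initialization, independent of MPI_Init */
    MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);

    MPI_T_pvar_get_num(&num_pvars);
    printf("MPI library exposes %d performance variables\n", num_pvars);

    for (int i = 0; i < num_pvars; i++) {
        char name[256], desc[1024];
        int name_len = sizeof(name), desc_len = sizeof(desc);
        int verbosity, var_class, binding, readonly, continuous, atomic;
        MPI_Datatype datatype;
        MPI_T_enum enumtype;

        /* Query metadata for pvar i; which counters appear here is
         * entirely up to the MPI implementation. */
        MPI_T_pvar_get_info(i, name, &name_len, &verbosity, &var_class,
                            &datatype, &enumtype, desc, &desc_len,
                            &binding, &readonly, &continuous, &atomic);
        printf("  [%d] %s : %s\n", i, name, desc);
    }

    MPI_T_finalize();
    MPI_Finalize();
    return 0;
}
```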
