Abstract

Heterogeneous computer systems with multiple types of processing elements (PEs) are becoming a popular design for optimizing performance and efficiency across a wide variety of applications, because each part of an application can be executed on the PE for which it is best suited. In such systems, efficient communication, data movement, and memory sharing across PEs are critical to executing an application across the different PEs with minimal communication and synchronization overhead. The IBM POWER9 processor supports the NVIDIA NVLink interface, a high-performance interconnect that provides many of these capabilities. In the IBM Power System AC922, IBM POWER9 processors connect directly to multiple NVIDIA GPUs using NVLink. In this paper, we highlight the important functional and performance capabilities of NVLink with the POWER9 processor. These include high bandwidth, hardware cache coherence, fine-grained data movement, and hardware support for atomic operations across all PEs of a compute node. We also analyze how these capabilities of POWER9 processors and NVLink are expected to significantly affect performance and programmability across a variety of important applications, such as machine learning and domains within high-performance computing.
