Abstract
Heterogeneous computer systems with multiple types of processing elements (PEs) are becoming a popular design to optimize performance and efficiency for a wide variety of applications. Each part of an application can be executed on the PE for which it is best suited. In heterogeneous systems, communication, efficient data movement, and memory sharing across PEs are critical to execute an application across the different PEs while incurring minimal overhead for communication and synchronization. The IBM POWER9 processor supports the NVIDIA NVLink interface, a high-performance interconnect with many such capabilities. In the IBM Power System AC922, IBM POWER9 processors directly connect to multiple NVIDIA GPUs using NVLink. In this paper, we highlight the important functional and performance capabilities of NVLink with the POWER9 processor. These include high bandwidth, hardware cache coherence, fine-grained data movement, and hardware support for atomic operations across all PEs of a compute node. We also present an analysis of how these performance and functional capabilities of POWER9 processors and NVLink are expected to have significant impacts on performance and programmability across a variety of important applications, such as machine learning and domains within high-performance computing.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Similar Papers
More From: IBM Journal of Research and Development
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.