Abstract

A new era of computing has begun with the development of high-performance computing (HPC), artificial intelligence (AI), machine learning (ML), and cognitive systems. Dramatic increases in the power density of electronic components have driven the design of efficient thermal management technologies for these systems. In 2018, IBM designed and delivered Summit and Sierra, the most powerful and fastest supercomputers in the world, with 200 petaflops of peak computing performance on the LINPACK benchmark. These systems, known as IBM POWER AC922, are both air- and liquid-cooled, with water used in the liquid-cooled systems to cool the high-power electronic components, including IBM POWER9 processors and NVIDIA graphics processing units (GPUs). In this paper, we present an overview of the thermal and mechanical design strategies applied to these systems. Testing and experimental analysis are provided and compared with computational modeling. Thermal control strategies are investigated to optimize overall system efficiency. For the air-cooled systems, we discuss the fan and heat sink designs, as well as the preheating effect on the PCIe section. For the liquid-cooled systems, which use a unique cold plate design that cools the processors and GPUs with water, we examine the water flow path design for the central processing units (CPUs) and GPUs, and the thermal performance of the cold plate. Cooling assemblies such as thermal interface materials (TIMs) and air baffles are also discussed. Unit and rack manifolds and the rear door heat exchanger (RDHx) are investigated. Water flow and pressure distributions at the node and rack levels are provided.

