Abstract

With semiconductor technology gradually approaching its physical and thermal limits, recent supercomputers have adopted major architectural changes to keep increasing performance through more power-efficient heterogeneous many-core systems. Examples include Sunway TaihuLight, which has four management processing elements (MPEs) and 256 computing processing elements (CPEs) inside one processor, and Summit, which has two central processing units (CPUs) and six graphics processing units (GPUs) inside one node. Meanwhile, current high-resolution Earth system models, which urgently need more computing power, generally consist of millions of lines of legacy code developed for traditional homogeneous multicore processors and cannot automatically benefit from advances in supercomputer hardware. As a result, refactoring and optimizing the legacy models for new architectures have become key challenges on the road to taking advantage of greener and faster supercomputers, providing better support for the global climate research community, and contributing to the long-lasting societal task of addressing long-term climate change. This article reports the efforts of a large group in the International Laboratory for High-Resolution Earth System Prediction (iHESP), established through the cooperation of Qingdao Pilot National Laboratory for Marine Science and Technology (QNLM), Texas A&M University (TAMU), and the National Center for Atmospheric Research (NCAR), with the goal of enabling highly efficient simulations of the high-resolution (25 km atmosphere and 10 km ocean) Community Earth System Model (CESM-HR) on Sunway TaihuLight. The refactoring and optimizing efforts have improved the simulation speed of CESM-HR from 1 SYPD (simulation years per day) to 3.4 SYPD (with output disabled) and supported several hundred years of pre-industrial control simulations. With further strategies for deeper refactoring and optimizing of the remaining computing hotspots, as well as redesigning architecture-oriented algorithms, we expect efficiency on the new platform to match or even exceed that of traditional homogeneous CPU platforms. The refactoring and optimizing processes detailed in this paper for the Sunway system should have implications for similar efforts on other heterogeneous many-core systems such as GPU-based high-performance computing (HPC) systems.
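To make the SYPD figures concrete, the following minimal sketch converts the two reported speeds into wall-clock time. The helper `wall_clock_days` and the 500-year run length are assumptions introduced only for illustration; the abstract states just "several hundred years" of simulation.

```python
def wall_clock_days(simulated_years: float, sypd: float) -> float:
    """Wall-clock days needed to simulate `simulated_years` at a given SYPD."""
    if sypd <= 0:
        raise ValueError("sypd must be positive")
    return simulated_years / sypd


baseline_sypd = 1.0     # reported speed before refactoring
optimized_sypd = 3.4    # reported speed after refactoring (output disabled)
run_length_years = 500  # hypothetical run length, not taken from the paper

speedup = optimized_sypd / baseline_sypd
days_before = wall_clock_days(run_length_years, baseline_sypd)
days_after = wall_clock_days(run_length_years, optimized_sypd)

print(f"speedup: {speedup:.1f}x")
print(f"{run_length_years} simulated years: "
      f"{days_before:.0f} days before, {days_after:.0f} days after")
```

Under these assumptions, a multi-century control run drops from well over a year of wall-clock time to a few months, which is what makes such runs practical.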

Highlights

  • The development of numerical simulations and the development of modern supercomputers have been like two entangled streams, with numerous interactions at different stages

  • The peak performance of a supercomputer system has evolved from the scale of 1 megaflop (e.g. CDC 3300) to 100 petaflops (e.g. Sunway TaihuLight), which is an increase of 11 orders of magnitude

  • This article reports the efforts of a large group in the International Laboratory for High-Resolution Earth System Prediction that was established by the cooperation of Qingdao Pilot National Laboratory for Marine Science and Technology (QNLM), Texas A&M University (TAMU), and the National Center for Atmospheric Research (NCAR), with the goal of enabling highly efficient simulation of the Community Earth System Model high-resolution version (25 km atmosphere and 10 km ocean) (CESM-HR) on Sunway TaihuLight
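The "11 orders of magnitude" figure in the second highlight can be checked with a short calculation. This is only a sanity check of the stated endpoints (1 megaflop and 100 petaflops); the variable names are illustrative.

```python
import math

start_flops = 1e6    # 1 megaflop, the scale attributed to CDC 3300
end_flops = 100e15   # 100 petaflops, the scale attributed to Sunway TaihuLight

# Orders of magnitude = base-10 logarithm of the ratio of the two peak rates.
orders_of_magnitude = math.log10(end_flops / start_flops)

print(f"increase: {round(orders_of_magnitude)} orders of magnitude")
```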

Summary

Introduction

The development of numerical simulations and the development of modern supercomputers have been like two entangled streams, with numerous interactions at different stages. Compared with existing work (GPU-based ASUCA, POM, WRF, and COSMO), our work optimizes an Earth system model, which involves greater complexity in both components and numerical methods. This requires better accuracy and better conservation of matter and energy so as to perform simulations of hundreds of years instead of just hundreds of days. As a first step, to minimize uncertainties in the code and results, we do not change the original algorithm design.

Hardware
Software
The second-level parallelism between MPE and CPEs
Architectural pros and cons
Power efficiency
Enabling CESM-HR on Sunway TaihuLight
Migrating the code from Intel to Sunway processors
Correctness verification
Transformation of independent loops
Register communication-based parallelization of dependent loops
Athread-based redesign of the code
Refactoring and optimizing of the CAM5 computing hotspots
Refactoring and optimizing of the POP2 computing hotspots
The current CPE-parallelized version of CESM-HR
The feature of radiative balance at the top of atmosphere
Model bias
Summary and discussions