Abstract

We present an architecture-portable and performant implementation of the atmospheric dynamical core (High-Order Methods Modeling Environment, HOMME) of the Energy Exascale Earth System Model (E3SM). The original Fortran implementation is highly performant and scalable on conventional architectures using the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) programming models. We rewrite the model in C++ and use the Kokkos library to express on-node parallelism in a largely architecture-independent implementation. Kokkos provides an abstraction of a compute node or device, layout-polymorphic multidimensional arrays, and parallel execution constructs. The new implementation achieves the same or better performance on conventional multicore computers and is portable to GPUs. We present performance data for the original and new implementations on multiple platforms, on up to 5400 compute nodes, and study several aspects of the single- and multi-node performance characteristics of the new implementation on conventional CPUs (e.g., Intel Xeon), many-core CPUs (e.g., Intel Xeon Phi Knights Landing), and NVIDIA V100 GPUs.

Highlights

  • We present the results of an effort to rewrite the High-Order Methods Modeling Environment (HOMME), a Fortran-based code for global atmosphere dynamics and transport, as a performance-portable implementation in C++, using the Kokkos library and programming model (Edwards et al., 2014) for on-node parallelism.

  • HOMME is a critical part of E3SM, a globally coupled climate model funded by the Department of Energy (DOE), and this work was part of the effort to prepare E3SM for future exascale computing resources.

  • We present performance results for the new end-to-end implementation, HOMMEXX, over a range of simulation regimes and, where possible, compare them against the original Fortran code.

Summary

Introduction

We present the results of an effort to rewrite the High-Order Methods Modeling Environment (HOMME), a Fortran-based code for global atmosphere dynamics and transport, as a performance-portable implementation in C++ (which we will call HOMMEXX), using the Kokkos library and programming model (Edwards et al., 2014) for on-node parallelism. Numerical results are presented for a single code implementation running on three different multi-/many-core architectures: GPU, Intel Xeon Phi Knights Landing (KNL), and Intel Haswell (HSW). Another performance-portability effort in climate modeling is the acceleration of the implicit–explicit (IMEX) Non-hydrostatic Unified Model of the Atmosphere (NUMA) on many-core processors such as GPUs and KNLs (Abdi et al., 2017).
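
The idea of layout-polymorphic arrays, which the abstract lists among the features Kokkos provides, can be illustrated with a minimal sketch in plain standard C++. This is not the Kokkos API and not code from HOMMEXX: the names `LayoutRight`, `LayoutLeft`, `Array2D`, and `axpy` are hypothetical and chosen only for illustration (the layout tags loosely echo Kokkos terminology). The sketch shows how a single kernel written against `a(i, j)` indexing can run unchanged on row-major or column-major storage, which is one ingredient of keeping a single source across CPUs and GPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout tags: each maps a 2D index (i, j) to a flat offset.
struct LayoutRight {  // row-major: the last index is contiguous in memory
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;
    }
};

struct LayoutLeft {  // column-major: the first index is contiguous in memory
    static std::size_t offset(std::size_t i, std::size_t j,
                              std::size_t n0, std::size_t /*n1*/) {
        return i + j * n0;
    }
};

// Hypothetical 2D array whose memory layout is a compile-time parameter.
template <typename T, typename Layout>
class Array2D {
public:
    Array2D(std::size_t n0, std::size_t n1)
        : n0_(n0), n1_(n1), data_(n0 * n1) {}

    T& operator()(std::size_t i, std::size_t j) {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }
    const T& operator()(std::size_t i, std::size_t j) const {
        assert(i < n0_ && j < n1_);
        return data_[Layout::offset(i, j, n0_, n1_)];
    }

    std::size_t extent0() const { return n0_; }
    std::size_t extent1() const { return n1_; }
    const T* data() const { return data_.data(); }

private:
    std::size_t n0_, n1_;
    std::vector<T> data_;
};

// Layout-agnostic kernel: y(i, j) += a * x(i, j).
// The body never refers to the storage order, so the same source works
// for any array type that provides operator()(i, j) and extents.
template <typename Array>
void axpy(double a, const Array& x, Array& y) {
    for (std::size_t i = 0; i < x.extent0(); ++i) {
        for (std::size_t j = 0; j < x.extent1(); ++j) {
            y(i, j) += a * x(i, j);
        }
    }
}
```

Calling `axpy` on `Array2D<double, LayoutRight>` and on `Array2D<double, LayoutLeft>` gives the same logical result while the underlying memory order differs. In a real performance-portable code the layout would typically be chosen per architecture, and the loops would be replaced by the library's parallel execution constructs.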

The HOMME dycore
The Kokkos library
Performance and optimization choices
Implementation details
Differential operators on the sphere
Performance results
Strong scaling
Single node or device performance
GPU kernel performance
Power consumption
Conclusions